Author-Disambiguated ISI

The author clusters for each author on each ISI article are distributed as a gzipped SQLite database. To read the data, decompress the database and query it with SQLite3. The database schema is very simple: it consists of a single table with the following columns:

  • article: the ISI article id
  • block: the block name (last name and first initial in "last_f" format)
  • rules: the index of the automatically assigned cluster, as assigned by the rule-based classifier (higher precision, lower recall)
  • bootstrap: the index of the automatically assigned cluster, as assigned by the bootstrapped classifier (a more even balance of precision and recall)
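
As a rough illustration of the steps above, here is a minimal Python sketch that decompresses the file and looks up the clusters for one article. The table name is not stated here, so the sketch reads it from SQLite's built-in sqlite_master catalog instead of guessing it. The function names are hypothetical, and the file paths you pass in are your own.

```python
import gzip
import shutil
import sqlite3


def decompress_database(gz_path, db_path):
    """Decompress the gzipped SQLite file at gz_path into db_path."""
    with gzip.open(gz_path, "rb") as src, open(db_path, "wb") as dst:
        shutil.copyfileobj(src, dst)


def find_table_name(conn):
    """Return the name of the single table in the database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    if len(rows) != 1:
        raise ValueError(f"expected exactly one table, found {len(rows)}")
    return rows[0][0]


def clusters_for_article(conn, article_id):
    """Return (block, rules, bootstrap) rows for one ISI article id."""
    table = find_table_name(conn)
    # Table names cannot be bound as parameters, so quote it as an identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    query = f"SELECT block, rules, bootstrap FROM {quoted} WHERE article = ?"
    return conn.execute(query, (article_id,)).fetchall()
```

To use it, call decompress_database with the path of the downloaded file and a destination path, open the result with sqlite3.connect, and pass the connection and an article id to clusters_for_article. Each returned row holds the block name followed by the two cluster indexes.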

Note that the cluster indexes are block specific, so each author block has its own cluster 1, cluster 2, etc. If you want a unique author (cluster) id, you should combine the block name and the cluster label, e.g. block "wei_w" and cluster label "1" could be combined into the unique author id "wei_w_1".
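
A minimal Python sketch of that combination follows. The underscore separator simply mirrors the "wei_w_1" example; the function name is hypothetical, and any other unambiguous scheme would work equally well.

```python
def make_author_id(block, cluster):
    """Join a block name and a cluster index into one author id."""
    return f"{block}_{cluster}"
```

For example, make_author_id("wei_w", 1) gives "wei_w_1". The cluster index can come from either the rules column or the bootstrap column, but ids built from one column are not comparable with ids built from the other.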

Also note that authors of ISI articles who were singletons (i.e. the only author in their "last_f" block) are not listed in the database. If an author of a paper you're looking at is not in the database, the block label should be unique, so you can use the block label by itself as the unique author id, e.g. "borruel_a".
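
The lookup with this fallback could be sketched in Python as below. This is an illustration under assumptions, not a reference implementation: the function name is hypothetical, the table name must be supplied by the caller, and the sketch returns a list because nothing here rules out two co-authors of one article sharing a block.

```python
import sqlite3


def author_ids(conn, table, article, block, classifier="bootstrap"):
    """Return author ids for one block on one article.

    Falls back to the bare block label when no row is found, which is
    the singleton case where the author is absent from the database.
    """
    if classifier not in ("rules", "bootstrap"):
        raise ValueError("classifier must be 'rules' or 'bootstrap'")
    # Identifiers cannot be bound as parameters, so quote the table name.
    quoted = '"' + table.replace('"', '""') + '"'
    rows = conn.execute(
        f"SELECT {classifier} FROM {quoted} WHERE article = ? AND block = ?",
        (article, block),
    ).fetchall()
    if not rows:
        return [block]
    return [f"{block}_{cluster}" for (cluster,) in rows]
```

The classifier argument selects which of the two cluster columns to read, and it is checked against the two known column names before being placed in the query text.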