[DEPRECATED] Manual import of pubs using WoS ID with SCIENCEWIRE API

Peter Mangiafico edited this page Apr 13, 2018 · 1 revision

Periodically, we'll get an email or HelpSU ticket saying that we need to "catch up" a faculty member or that the harvest isn't finding a particular person's old citations. Typical use cases include:

  • a brand new faculty member whose previous citations mostly come from other institutions (our harvest logic limits the organization affiliation to "Stanford", so if a faculty member published at Princeton for a few years, then Yale, then Harvard, and then came to Stanford, we would not harvest those records)
  • there's an issue with the harvester and we need to get a good, clean set of citations for the faculty member.

We'll do a specific targeted retrieval of known/manually determined citations for this faculty member and use a rake task to harvest those citations from ScienceWire. Once we've made the authorship association for this faculty member, we can either wait for overnight processing to pick them up or send a note to CAP support to ask them to do a directed harvest for those faculty.

Step-by-step guide

Support staff (typically Peter or Grace) generates a spreadsheet that lists the CAP ProfileID and the list of WoSIDs for the faculty member. Grace typically goes into Web of Knowledge (another Clarivate service) manually to run a search and build the result set for the faculty member.

Export the spreadsheet as a tab-delimited file (NOT comma-delimited!), with the WOSID in the first column and the cap_profile_id in the second column. Spaces are also fine as a column separator. You do not have to worry about duplicate citations; the script will only import new citations. You do not need a header row. The file **should use carriage returns and line feeds (CRLF)** to separate lines, so use a text editor capable of saving files that way (i.e. with both CR and LF).

Example file:

WOS:000408326300020 9554
WOS:000411299200030 9554
WOS:000407495400035 9554
WOS:000406465000026 9554
WOS:000402473900013 9554
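
A quick way to sanity-check the format before uploading is sketched below (this is not part of the runbook; the filename diehn.txt and the two sample WOS IDs just mirror the example above). `printf` with an explicit `\r\n` guarantees the CRLF line endings the script expects:

```shell
# Build a two-line example file with explicit CRLF line endings.
printf 'WOS:000408326300020\t9554\r\n' >  diehn.txt
printf 'WOS:000411299200030\t9554\r\n' >> diehn.txt

# Check the line endings and the two-column layout.
grep -q $'\r' diehn.txt && echo "CRLF line endings present"
awk 'NF != 2 { print "bad column count on line " NR ": " $0 }' diehn.txt
if ! grep -vqE '^WOS:[0-9]+[[:space:]]+[0-9]+' diehn.txt; then
  echo "all lines look well formed"
fi
```

The `awk` check treats both tabs and runs of spaces as column separators, matching the "spaces are also fine" rule above.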

Place this file on the server

sftp pub@sul-pub-prod
cd /tmp
put diehn.txt

Run the sw:wos_profile_id_report[path_to_report] rake task from the root of the project:

ssh pub@sul-pub-prod
cd sul-pub/current
RAILS_ENV=production bundle exec rake sw:wos_profile_id_report['/tmp/diehn.txt']

To view progress, tail the log (from a different shell session, of course, or from the same one if you started the task with nohup):

tail -99f log/sciencewire.log
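
For a long report file the harvest can take a while, so one option (a sketch, not part of the original runbook) is to start the rake task under nohup and background it, then follow the log from the same session; the output file name wos_harvest.out is an arbitrary choice:

```shell
# Sketch: run the harvest under nohup so it survives a dropped SSH
# session, then follow the log from the same shell.
cd sul-pub/current
RAILS_ENV=production nohup bundle exec rake sw:wos_profile_id_report['/tmp/diehn.txt'] \
  > wos_harvest.out 2>&1 &
tail -f log/sciencewire.log
```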

Other interesting rake tasks

rake sw:faculty_harvest[path_to_ids]                              # Harvest high priority faculty using the name-only query
rake sw:fortnightly_harvest[starting_author_id,ending_author_id]  # harvest from sciencewire by email or known sciencewire pub ids
rake sw:parallel_harvest                                          # harvest (with multiple processes) from sciencewire by email or known sciencewire p...
rake sw:wos_harvest[path_to_bibtex]                               # Harvest using a directory full of Web Of Science bibtex query results
rake sw:wos_profile_id_report[path_to_report]                     # Harvest using a text report file of WOS ids and cap_profile_ids
rake sw:wos_sunetid_json[sunetid,path_to_json]                    # Harvest for a given sunetid and a json-file with an array of WoS ids