Manual import of pubs using CSV of WoSIDs

Peter Mangiafico edited this page Dec 6, 2018 · 1 revision

Periodically, we'll get an email or HelpSU ticket saying that we need to "catch up" a faculty member or that the harvest isn't finding a particular person's old citations. Typical use cases include:

  • a brand new faculty member whose previous citations mostly come from other institutions (our harvest logic limits results to "Stanford" as the organization affiliation, so if a faculty member published at Princeton for a few years, then Yale, then Harvard, and then came to Stanford, we would not harvest those earlier records)
  • there's an issue with the harvester and we need to get a good clean set of citations for the faculty member.

We'll do a specific targeted retrieval of known/manually determined citations for this faculty member and use a rake task to harvest those citations from ScienceWire. Once we've made the authorship association for this faculty member, we can either wait for overnight processing to pick them up or send a note to CAP support to ask them to do a directed harvest for those faculty.

Step-by-step guide

Support staff (typically Peter or Grace) generate a spreadsheet that lists the CAP ProfileID and the WoSIDs for the faculty member. Grace typically goes into Web of Knowledge (another Clarivate service) and runs a manual search to build a result set for the faculty member.

Possible query to use: AUTHOR: Last Name, First Initial OR Last Name, First Name Refined by: ORGANIZATIONS-ENHANCED: ( Confirmed Organizations from Bio ) AND DOCUMENT TYPES: ( ARTICLE OR PROCEEDINGS PAPER OR REVIEW ) Indexes=SCI-EXPANDED, SSCI, A&HCI, CPCI-S, CPCI-SSH, BKCI-S, BKCI-SSH, ESCI, CCR-EXPANDED, IC Timespan=All years

Export the spreadsheet as a simple text file with no header and one WoSID per line. You do not have to worry about duplicate citations; the script only imports new ones. Lines should be separated by a carriage return and a line feed (both CR and LF), so use a text editor that can save files this way.

Example file:

WOS:000408326300020
WOS:000411299200030
WOS:000407495400035
WOS:000406465000026
WOS:000402473900013
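
If the export is messy, the file format described above can be produced with a small shell helper. This is only a sketch: the function name prepare_wos_ids is hypothetical, and the pattern used to recognize a WoSID ("WOS:" followed by letters and digits) is an assumption based on the example file, not something the rake task is known to enforce.

```shell
# Hypothetical helper: clean an exported WoSID list before upload.
# Usage: prepare_wos_ids INPUT_FILE OUTPUT_FILE
prepare_wos_ids() {
  awk '
    # Remove stray carriage returns and surrounding blanks.
    { gsub(/\r/, ""); sub(/^[ \t]+/, ""); sub(/[ \t]+$/, "") }
    # Drop blank lines.
    /^$/ { next }
    # Keep lines that look like a WoSID (assumed pattern); end each with CR LF.
    /^WOS:[0-9A-Za-z]+$/ { printf "%s\r\n", $0; next }
    # Report anything else, such as a leftover header row, on stderr.
    { print "skipped: " $0 > "/dev/stderr" }
  ' "$1" > "$2"
}
```

For example, `prepare_wos_ids export.txt diehn.txt` would write the cleaned list to diehn.txt and print any skipped lines so they can be checked by hand.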

Place this file on the server:

sftp pub@sul-pub-prod
cd /tmp
put diehn.txt

Run the wos:assign_uids[cap_profile_id] rake task from the root of the project, passing in the cap_profile_id and the file:

ssh pub@sul-pub-prod
cd sul-pub/current
RAILS_ENV=production bundle exec rake wos:assign_uids[12345] < '/tmp/diehn.txt'
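
For a long list, the same task could be started under nohup so that it keeps running if the SSH session drops. The following is a sketch for sh or bash, not an established part of the procedure: the function name run_assign_uids_bg, the RAKE_CMD override, and the output file argument are all hypothetical.

```shell
# Sketch: start the rake task in the background so it survives logout.
# Run from the root of the project (sul-pub/current).
# RAKE_CMD is an illustrative override; it defaults to "bundle exec rake".
run_assign_uids_bg() {
  profile_id=$1   # CAP profile id, e.g. 12345
  id_file=$2      # WoSID list, e.g. /tmp/diehn.txt
  out_file=$3     # file that captures the task's own output
  # The task name is quoted so the shell does not treat [ ] as a glob pattern.
  RAILS_ENV=production nohup ${RAKE_CMD:-bundle exec rake} \
    "wos:assign_uids[$profile_id]" < "$id_file" > "$out_file" 2>&1 &
}
```

A call such as `run_assign_uids_bg 12345 /tmp/diehn.txt /tmp/assign_uids.out` returns immediately, leaving the session free.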

To view progress, either watch the output of the rake task or tail the log (from a separate session, or from the same one if you started the task with nohup):

tail -99f log/web_of_science.log