
XX[DEPRECATED] Useful commands for common tasks: SW

Peter Mangiafico edited this page Apr 13, 2018 · 1 revision

These are Rails console (Ruby) commands or rake tasks for various common tasks.

Find cap profile id given a sunet ID

sunetid='peter12345'
author=Author.find_by_sunetid(sunetid)
author.cap_profile_id # the cap profile id for this sunet ID
author.author_identities # gives the user's alternate names data

Show publications that would be harvested for a given cap profile id but not actually add them to the profile

Note this only shows new publications that would be harvested, not any current publications already in the profile.

cap_profile_id='12345'
author = Author.find_by_cap_profile_id(cap_profile_id)
author_pub_swids = author.publications.with_sciencewire_id.pluck(:sciencewire_id).uniq # this shows the current publications in their profile

harvester=ScienceWire::HarvestBroker.new(author, ScienceWireHarvester.new, alternate_name_query: true)  # use alternate_name_query: false to skip alt names query
ids_for_author = harvester.send(:ids_for_author) # just stanford (no alt names)
ids_for_alternate_names = harvester.send(:ids_for_alternate_names) # all alternate names

ids_to_harvest = harvester.generate_ids # gets all ids and removes the current ones (i.e. just the ones that would be harvested)
pubs = ids_to_harvest.collect { |swid| ScienceWireClient.new.get_sw_xml_source_for_sw_id(swid) };nil
titles = pubs.collect { |i| "'#{i.at_xpath('//PublicationItem/Title').text}' BY #{i.at_xpath('//PublicationItem/AuthorList').text}" };nil;puts titles
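
The comment on generate_ids above describes a set difference: every ID returned by the name queries, minus the IDs already on the profile. A minimal sketch of that idea in plain Ruby follows. The helper name `ids_to_harvest_from` and the IDs are made up for illustration; the application's actual `generate_ids` may do more than this.

```ruby
# Hypothetical helper illustrating the "all ids minus current ones" step.
# It is not the application's generate_ids; it only mirrors the comment above.
def ids_to_harvest_from(ids_for_author, ids_for_alternate_names, current_ids)
  candidates = (ids_for_author + ids_for_alternate_names).uniq
  candidates - current_ids
end

# Made-up ScienceWire IDs, for illustration only.
current = [101, 102]  # already on the profile
by_name = [101, 103]  # from the main name query
by_alt  = [103, 104]  # from the alternate names query
puts ids_to_harvest_from(by_name, by_alt, current).inspect # prints [103, 104]
```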

Show IDs that result from a "dumb query" (i.e. name search only) regardless of number of publications currently on profile

Note that this runs only the main identity (no alternate names/identities).

cap_profile_id='12345'
author = Author.find_by_cap_profile_id(cap_profile_id)
author_name = ScienceWire::AuthorName.new(author.last_name,author.first_name,author.middle_name)
harvester=ScienceWire::HarvestBroker.new(author, ScienceWireHarvester.new, alternate_name_query: true)  # use alternate_name_query: false to skip alt names query
institution = ScienceWire::AuthorInstitution.new(Settings.HARVESTER.INSTITUTION.name,ScienceWire::AuthorAddress.new(Settings.HARVESTER.INSTITUTION.address.to_hash))
author_attributes = ScienceWire::AuthorAttributes.new(author_name, '', [], institution)
ids_for_dumb_query = harvester.send(:ids_from_dumb_query,author_attributes)

# for another arbitrary institution from their profile
institution_name = author.author_identities[0].institution
institution = ScienceWire::AuthorInstitution.new(institution_name,ScienceWire::AuthorAddress.new({}))
author_attributes = ScienceWire::AuthorAttributes.new(author_name, '', [], institution)
ids_for_dumb_query = harvester.send(:ids_from_dumb_query,author_attributes)

Harvest all publications for a single author given a cap profile id on the rails console

cap_profile_id='12345'
AuthorHarvestJob.new.perform(cap_profile_id, harvest_alternate_names: true)  # with alternate names
AuthorHarvestJob.new.perform(cap_profile_id, harvest_alternate_names: false)  # without alternate names

Harvest all publications for a given cap profile id from a rake task

# note, you may want to pull the latest profile data from CAP before doing this; the command is shown under "Pull latest author data from SUL-CAP given a cap profile id"
RAILS_ENV=production bundle exec rake sw:cap_profile_harvest[41135]  # with default flag for alternate names data harvesting (currently false as of 12/1/2016) ... default is in config/settings.yml (USE_AUTHOR_IDENTITIES)

RAILS_ENV=production bundle exec rake sw:cap_profile_harvest_alt_names[41135]  # force alternate names data harvesting to be true

Show all publication titles and their identifiers for a given author

cap_profile_id='12345'
author = Author.find_by_cap_profile_id(cap_profile_id)
author.publications.each {|pub| puts pub.title + "|" + pub.publication_identifiers.map {|pub_id| "#{pub_id.identifier_type}:#{pub_id.identifier_value}"}.join('|')};nil

Rebuild the pubhash from the source record in the database

Useful if the methods that parse source records into pub hashes change

pub=Publication.find_by_pmid('12345') # OR 
pub=Publication.find_by_sciencewire_id('12345')
pub.rebuild_pub_hash
pub.save

Rebuild all publication pubhash from source records for a given sunetID or capprofileID

This is useful if the parsing or other algorithms used to build the pub hash from a source record change. Note that it doesn't re-pull the source record itself from PubMed or ScienceWire; it just reparses the data we already have and rebuilds the pub hash.

author=Author.find_by_sunetid('sunet') # or
author=Author.find_by_cap_profile_id(12345)

author.publications.each do |pub|
  pub.rebuild_pub_hash
  pub.save
end
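
For an author with many publications, it can help to know which rebuilds failed instead of stopping at the first error. The sketch below is one possible way to do that; `rebuild_all` is a hypothetical helper, not part of the application. It assumes only that each object responds to `rebuild_pub_hash` and `save`, as in the loop above, and that `save` returns false on failure.

```ruby
# Hypothetical helper: rebuilds each publication and returns the ones that failed.
# Assumes only that each object responds to rebuild_pub_hash and save.
def rebuild_all(publications)
  failures = []
  publications.each do |pub|
    begin
      pub.rebuild_pub_hash
      failures << pub unless pub.save # a falsy save is treated as a failure
    rescue StandardError => e
      warn "rebuild failed: #{e.message}"
      failures << pub
    end
  end
  failures
end
```

On the console this could be called as `failed = rebuild_all(author.publications)`, then inspected.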

Pull latest author data from SUL-CAP given a cap profile id

RAILS_ENV=production bundle exec rake cap:poll_for_cap_profile_id[12345]  # pulls in latest author identities as well as main author info

or on console

cap_profile_id='12345'
cap_http_client = CapHttpClient.new
record = cap_http_client.get_auth_profile(cap_profile_id)
poller = CapAuthorsPoller.new
puts JSON.pretty_generate(record) # display it
poller.process_record(record) # actually process it

Create a new publication from a PMID and associate with an author

author = Author.find_by_cap_profile_id(84517)
pub=Publication.find_or_create_by_pmid('20337330')  # this will pull from PubMed and create a pub record if it doesn't exist
ScienceWireHarvester.new.add_contribution_for_harvest_suggestion(author,pub) # add this publication to this author
# should be taken care of by "add_all_db_contributions_to_my_pub_hash"?
#authorship_hash = {:featured=>'false',:visibility=>'public',:status=>'new',:cap_profile_id=>author.cap_profile_id}
#pub.contributions.build_or_update(author, authorship_hash)
pub.add_all_db_contributions_to_my_pub_hash
pub.save

To add multiple PMIDs to an author:

pmids=%w{19752419 19752420 19674727 21526923 22555112 23337555 24835760 24727261 24652518 25307130 26143666 26086920 26681392}
author = Author.find_by_cap_profile_id(84517)
harvester=ScienceWireHarvester.new
pmids.each do |pmid| 
   pub=Publication.find_or_create_by_pmid(pmid)
   harvester.add_contribution_for_harvest_suggestion(author,pub)
   pub.add_all_db_contributions_to_my_pub_hash
   pub.save
end

or

# import the provided list of PMIDs from a plain text file, no header row, one line per PMID, into the cap profile ID 12345
RAILS_ENV=production bundle exec rake pubmed:pmid_profile_id_import['/tmp/list_of_pmids','12345']
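
Before running the import, it may be worth checking the input file on its own. The following is a minimal sketch under the file format described above (no header row, one PMID per line). `read_pmids` is a hypothetical helper; how the rake task itself parses the file is not shown here.

```ruby
# Hypothetical helper: reads one PMID per line, trimming whitespace,
# skipping blank or non-numeric lines, and dropping duplicates.
def read_pmids(path)
  File.readlines(path)
      .map(&:strip)
      .select { |line| line =~ /\A\d+\z/ }
      .uniq
end
```

For example, `read_pmids('/tmp/list_of_pmids').length` gives the number of distinct PMIDs the file would supply.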

Show all publications for a given cap profile id

author = Author.find_by_cap_profile_id(12345)
author.publications
author.approved_publications

Search ScienceWire for a specific sciencewire_id or WoSID or DOI or PMID

client=ScienceWireClient.new

sciencewire_id='42708914'
result = client.get_full_sciencewire_pubs_for_sciencewire_ids([sciencewire_id]) 

wos_id='000323660000020'
result = client.get_full_sciencewire_pubs_for_wos_ids([wos_id]) 

doi='10.1038/nature11397'
result = client.get_pub_by_doi(doi, 1)

pmid='10000166'
result = client.pull_records_from_sciencewire_for_pmids([pmid])

Search PubMed for a specific pmid

pmid='25277988'
pmclient=PubmedClient.new
pmclient.fetch_records_for_pmid_list([pmid])

Search all sources by pmid; order is local database, then sciencewire, then pubmed

pmid='25277988'
result=PubmedHarvester.search_all_sources_by_pmid([pmid])
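
The lookup order described above (local database, then ScienceWire, then PubMed) is an ordered fallback. The sketch below illustrates that pattern in plain Ruby with stand-in lambdas. `first_hit` and the lambdas are hypothetical, and the assumption that the search stops at the first source with a result is mine, not confirmed from the application's code.

```ruby
# Hypothetical sketch of an ordered fallback: try each source in turn and
# return the name and result of the first one that finds something.
def first_hit(pmid, sources)
  sources.each do |name, lookup|
    result = lookup.call(pmid)
    return [name, result] unless result.nil? || result.empty?
  end
  nil
end

# Stand-ins for the real lookups; a Hash keeps its insertion order.
sources = {
  local_database: ->(_pmid) { [] },
  sciencewire: ->(pmid) { [{ pmid: pmid, title: 'Example title' }] },
  pubmed: ->(_pmid) { raise 'not reached in this example' }
}
name, records = first_hit('25277988', sources)
puts "#{name}: #{records.length} record(s)"
```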

Update the pub-hash for a publication from the ScienceWire source record manually (as cached in our database, no new pull from ScienceWire)

This is not a great idea and is only useful as a quick patch to an existing publication that is wrong in ScienceWire (e.g. a bad DOI in their system)

sciencewire_id='42708914'
publication=Publication.find_by(:sciencewire_id=>sciencewire_id)

sw_source=SciencewireSourceRecord.find_by(:sciencewire_id=>sciencewire_id)
puts sw_source.source_data
sw_source.source_data.gsub!('some bad string','some fixed string')  # edit the record if needed
sw_source.save

pub_hash=SciencewireSourceRecord.get_sciencewire_hash_for_sw_id(sciencewire_id)
publication.build_from_sciencewire_hash(pub_hash)
publication.save
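
One thing to watch when patching with `gsub!`: it returns nil when nothing was replaced, so a mistyped search string fails silently. A minimal sketch of checking for that is below. `patch_source` is a hypothetical helper and the XML fragment is made up.

```ruby
# Hypothetical helper: returns the patched copy, or nil if the bad string
# was not found (String#gsub! returns nil when no substitution happened).
def patch_source(source_data, bad, fixed)
  patched = source_data.dup
  patched.gsub!(bad, fixed) ? patched : nil
end

result = patch_source('<DOI>10.1000/bad-doi</DOI>', 'bad-doi', 'good-doi')
puts result.nil? ? 'no match, nothing to save' : "patched: #{result}"
```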

Get a Nokogiri XML Element with the publication data for Sciencewire given a sciencewire id

sciencewire_id='42708914'
result=ScienceWireClient.new.get_sw_xml_source_for_sw_id(sciencewire_id)

Re-pull an existing ScienceWire or PubMed record into its SciencewireSourceRecord/PubmedSourceRecord and then update the pub hash

This is useful if an existing ScienceWire and/or Pubmed record is updated to correct data and we need to resync with them and then update the local publication with the new data

sciencewire_id='42708914'
publication=Publication.find_by(:sciencewire_id=>sciencewire_id)
sw_source = SciencewireSourceRecord.find_by_sciencewire_id(sciencewire_id)
sw_source.sciencewire_update
publication.build_from_sciencewire_hash(sw_source.source_as_hash)
publication.sync_publication_hash_and_db
publication.save

# or, for a PubMed record
pmid='42708914'
publication=Publication.find_by(:pmid=>pmid)
pm_source_record = PubmedSourceRecord.find_by_pmid(pmid)
pm_source_record.pubmed_update
publication.pub_hash = pm_source_record.source_as_hash
publication.save

Manually associate a publication with an author (useful for testing in UAT)

pmid='42708914'
cap_profile_id='12345'
publication=Publication.find_by_pmid(pmid) # find an existing pub somehow
author = Author.find_by_cap_profile_id(cap_profile_id) # author that we need to add this pub to

sw=ScienceWireHarvester.new
sw.add_contribution_for_harvest_suggestion(author,publication)

Merge duplicate author contributions from one record into another

Use case: a single author has two author rows with publications associated with each. You want to merge one author row into the other, carrying over any existing publications but not duplicating them. This happens when two profiles are created initially because CAP was not able to match the physician information to the faculty information until after both profiles existed. They were "merged" on the CAP side, but the publications were not merged on the SUL-PUB side. This manifests itself as unexpected behavior (missing pubs, etc.). The rake task takes two cap_profile_ids and merges all of the publications from DUPE_CAP_PROFILED_ID's profile into PRIMARY_CAP_PROFILE_ID's profile. It then deactivates DUPE_CAP_PROFILED_ID's profile (which should now have no publications associated with it) to prevent harvesting into it.

RAILS_ENV=production bundle exec rake cleanup:merge_profiles[123,456] # will merge all publications from cap_profile_id 456 into 123, without duplication
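
The effect described above (union of publications without duplicates, then deactivating the duplicate profile) can be pictured with a small in-memory model. This is only an illustration with plain hashes and made-up IDs. `merge_profiles` here is a hypothetical function, not the rake task's implementation.

```ruby
# Hypothetical in-memory stand-in for the merge: profiles are plain hashes,
# not the application's Author records.
def merge_profiles(primary, dupe)
  primary[:publication_ids] = (primary[:publication_ids] + dupe[:publication_ids]).uniq
  dupe[:publication_ids] = [] # the duplicate ends up with no publications
  dupe[:active] = false       # deactivated so nothing is harvested into it
  primary
end

primary = { cap_profile_id: 123, active: true, publication_ids: [1, 2] }
dupe    = { cap_profile_id: 456, active: true, publication_ids: [2, 3] }
merge_profiles(primary, dupe)
puts primary[:publication_ids].inspect # prints [1, 2, 3]
puts dupe[:active]                     # prints false
```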
