
Useful console commands (ORCID, MaIS)

Peter Mangiafico edited this page Mar 3, 2022 · 41 revisions

Authorization endpoints for users to connect their Sunet with ORCID

See if services are running

Note that production hits the production ORCID and MaIS services, while QA/UAT hit the sandbox versions of each. The data returned therefore differs between environments, so beware.

Using the Rails console you can see which one is being hit:

# production API URLs
Settings.ORCID.BASE_URL
=> "https://api.orcid.org"
Settings.MAIS.BASE_URL
=> "https://aswsweb.stanford.edu"

# sandbox API URLs
Settings.ORCID.BASE_URL
=> "https://api.sandbox.orcid.org"
Settings.MAIS.BASE_URL
=> "https://aswsuat.stanford.edu"

Mais.working?
=> true
Orcid.working?
=> true

Fetch a single user from MaIS

This lets you fetch a single Stanford user who has connected with ORCID, including the scopes they have authorized and their tokens. It returns nil for a user who has not done this.

user = Mais.client.fetch_orcid_user(sunetid:'<SUNETID>');

Fetch all users who have integrated with ORCID

Note that the MaIS API will return all users who have integrated, and includes a history of users who have changed their scope or removed the link between their SUNET and ORCID. This means you may see the same user in the response more than once, and you need to use the last entry as the most current. However, these entries also "time out": if a given user has not changed their scope or link in some period of time (uncertain, but ~1-2 months), their previous history entries will be dropped from the response and you will see the user once, with their current scope and a timestamp of the last time it was changed.
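
The "use the last entry" rule can be sketched in plain Ruby. This is a minimal illustration, not the sul_pub implementation: the `HistoryEntry` struct and its `scope` and `last_updated` fields are hypothetical stand-ins for whatever the MaIS client actually returns, and it assumes entries arrive in chronological order, as described above.

```ruby
# Hypothetical stand-in for one entry in the MaIS history response.
HistoryEntry = Struct.new(:sunetid, :scope, :last_updated, keyword_init: true)

# Returns a hash of sunetid => most recent entry. Later entries overwrite
# earlier ones, so this relies on the response being in chronological order.
def latest_entries(entries)
  entries.each_with_object({}) { |entry, latest| latest[entry.sunetid] = entry }
end

# Made-up data: user1 appears twice because they changed scope.
history = [
  HistoryEntry.new(sunetid: 'user1', scope: 'read', last_updated: '2021-05-01'),
  HistoryEntry.new(sunetid: 'user2', scope: 'read', last_updated: '2021-05-03'),
  HistoryEntry.new(sunetid: 'user1', scope: 'write', last_updated: '2021-05-10')
]

current = latest_entries(history)
current.each { |sunetid, entry| puts "#{sunetid}: #{entry.scope} (changed #{entry.last_updated})" }
```

With this made-up data, `user1` is reported once with `write` scope, because the later entry replaces the earlier one.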

As a rake task, to see users who have integrated:

RAILS_ENV=production bundle exec rake mais:fetch

From the Rails console:

response = Mais.client.fetch_orcid_users;

To update our local database with ORCID ids for these users (this is a scheduled task):

RAILS_ENV=production bundle exec rake mais:update_authors

Fetch all the works from an ORCID profile for a given ORCID ID

(note this is a sandbox ORCIDID)

Orcid.client.fetch_works('https://sandbox.orcid.org/0000-0002-7262-6251')

Rake task for pushing works to ORCID from Profiles

All users who have integrated (this is a scheduled task):

RAILS_ENV=production bundle exec rake orcid:add_all_works

A single user:

RAILS_ENV=production bundle exec rake orcid:add_author_works['<SUNETID>']

Delete works at ORCID for a single user and reset their put codes in our database. This should rarely be done, only for debugging, testing, etc:

RAILS_ENV=production bundle exec rake orcid:delete_author_works['<SUNETID>']

Rake tasks for pulling works from ORCID and adding to Profiles

All users who have integrated:

RAILS_ENV=production bundle exec rake orcid:harvest_authors

A single user:

RAILS_ENV=production bundle exec rake orcid:harvest_author['petucket']

Examining our local database

We store ORCID iDs in our local database, but not auth tokens, which are always pulled from the MaIS API as needed. We also store "put codes" in the contribution table for any publications we have already sent to ORCID, so we know not to send them again. You can see this in our database:

sunetid='petucket';
author = Author.find_by(sunetid: sunetid); 
author.orcidid
=> "https://orcid.org/0000-0002-2230-4756"
Contribution.where(author_id: author.id).where.not(orcid_put_code: nil).count # shows the number of publications that have been sent to ORCID
=> 0
Contribution.where(author_id: author.id).where(orcid_put_code: nil).count # shows the number of publications that have NOT been sent to ORCID
=> 6
Contribution.where(author_id: author.id, status: 'approved', visibility: 'public', orcid_put_code: nil).count # shows the number of publications that can potentially be sent to ORCID
=> 4

Why are publications not being pushed to ORCID?

Check to see why a particular publication may not be getting pushed even though the author has authorized write scope, has a public profile and the publication is approved and public:

sunetid='petucket';
author = Author.find_by(sunetid: sunetid); 
contribs_to_push = Contribution.where(author_id: author.id, status: 'approved', visibility: 'public', orcid_put_code: nil);

pub = contribs_to_push[0].publication;
work = Orcid::PubMapper.map(pub.pub_hash)
==> Orcid::PubMapper::PubMapperError: A self identifier is required # complaining about an identifier not available

# Examine the provenance and the mapping for each of them:
contribs_to_push.each do |contrib|
  puts "#{contrib.publication_id} : #{contrib.publication.pub_hash[:provenance]}"
  begin
    puts Orcid::PubMapper.map(contrib.publication.pub_hash)
  rescue StandardError => e
    puts e.message
  end
end;nil

# look at a specific publication identifier mapping:
Orcid::PubIdentifierMapper.map(pub.pub_hash)
id_mapper = Orcid::PubIdentifierMapper.new(pub.pub_hash);
puts pub.pub_hash[:identifier]
id_mapper.send(:map_identifiers,pub.pub_hash[:identifier], 'self') # use ids from pub_hash
Orcid::IdentifierTypeMapper.to_orcid_id_type('wosuid') # this identifier would be good
=> "wosuid"
Orcid::IdentifierTypeMapper.to_orcid_id_type('WoSItemID') # this identifier type is not found
=> nil

# if none are found in the pub hash, then the publication will not be pushed
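
The check above can be illustrated with a small self-contained sketch. Everything here is an assumption for illustration: the `supported_types` list is a hypothetical subset (the real list lives in `Orcid::IdentifierTypeMapper`), the two helper methods are stand-ins written for this example, and the `{ type:, id: }` shape of each identifier is assumed rather than taken from the sul_pub code.

```ruby
# Hypothetical stand-in for the identifier type lookup: returns the type when
# it is in the supported list, otherwise nil.
def to_orcid_id_type(type, supported_types)
  supported_types.include?(type) ? type : nil
end

# Returns only the identifiers whose type can be mapped to an ORCID type.
def mappable_identifiers(pub_hash, supported_types)
  Array(pub_hash[:identifier]).select do |identifier|
    to_orcid_id_type(identifier[:type], supported_types)
  end
end

supported_types = %w[doi wosuid] # assumption, for illustration only

# Made-up pub hashes: one has a mappable identifier, the other does not.
pushable = { identifier: [{ type: 'WoSItemID', id: 'A1' }, { type: 'wosuid', id: 'WOS:1' }] }
blocked = { identifier: [{ type: 'WoSItemID', id: 'A2' }] }

[pushable, blocked].each do |pub_hash|
  found = mappable_identifiers(pub_hash, supported_types)
  if found.empty?
    puts 'no mappable identifier: would not be pushed'
  else
    puts "mappable identifier types: #{found.map { |identifier| identifier[:type] }.join(', ')}"
  end
end
```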

Fetch all users from MaIS who have integrated their SUNET with ORCID

The commands below let you determine how many Stanford users have connected with ORCID, which scopes they have authorized, and their tokens. The same statistics are wrapped up in an easy-to-run rake task:

RAILS_ENV=production bundle exec rake sul:orcid_integration_stats

From the Rails console:

orcid_users = Mais.client.fetch_orcid_users;
# note that the `fetch_orcid_users` command will return some duplicated sunets, because it returns a history
# of all changes for that sunet, with the last entry being the current scope (i.e. if they change scope, they will be returned twice)
sunets = orcid_users.map {|user| user.sunetid}.uniq;

# determine how many have authorized write vs read; assumes that a connection implies read scope
# note that the `fetch_orcid_user` for a single user will return the latest scope for that user (no dupes)
scopes = {read: sunets.size, write: 0}
sunets.each { |sunetid| scopes[:write] += 1 if Mais::Client.new.fetch_orcid_user(sunetid:sunetid).update? };
scopes
=> {:read=>915, :write=>203}

# this doesn't give you scopes, but it does tell you who has authorized ORCID and has a public profile with harvest enabled;
# it assumes we have been regularly running the rake task in cron to discover new users from MaIS:
users = Author.where.not(orcidid: nil).where(cap_visibility:'public',cap_import_enabled:true).size

Export metadata for all users who have authorized write scope

This exports all sunets for users who have authorized write scope and who have public profiles and harvest enabled. Useful for communicating to users who are likely to have publications pushed.

require 'csv'
outputfile = 'tmp/public_sunets.csv'
sunets = Mais.client.fetch_orcid_users.map {|user| user.sunetid if user.update?}.compact.uniq;

CSV.open(outputfile, "w") do |csv|
  csv << ["sunet","email","full_name","last_name","first_name","cap_profile_id","orcidid","num_orcid_pubs","num_profile_pubs","date"]  
  sunets.each do |sunetid|
    mais_user = Mais::Client.new.fetch_orcid_user(sunetid:sunetid)
    orcidid=mais_user.orcidid
    number_of_orcid_pubs = Orcid.client.fetch_works(orcidid)[:group].size 
    author = Author.find_by(sunetid: sunetid)
    if author && author.cap_import_enabled && author.cap_visibility == 'public'
       number_of_profile_pubs=author.contributions.where(status:'approved',visibility:'public').size
       full_name = "#{author.first_name} #{author.last_name}"
       csv << [sunetid,"#{sunetid}@stanford.edu",full_name,author.last_name,author.first_name,author.cap_profile_id,orcidid,number_of_orcid_pubs,number_of_profile_pubs,Time.now]
    end
  end
end;

Export metadata for all ORCID integration users

This exports all sunets for users who have gone through the integration. Useful for getting metadata about all of these users.

require 'csv'
outputfile = 'tmp/integration_users.csv'
sunets = Mais.client.fetch_orcid_users.map {|user| user.sunetid}.uniq;

CSV.open(outputfile, "w") do |csv|
  csv << ["sunet","email","full_name","last_name","first_name","cap_profile_id","orcidid","date"]  
  sunets.each do |sunetid|
    mais_user = Mais::Client.new.fetch_orcid_user(sunetid:sunetid)
    orcidid=mais_user.orcidid
    author = Author.find_by(sunetid: sunetid)
    if author
       full_name = "#{author.first_name} #{author.last_name}"
       csv << [sunetid,"#{sunetid}@stanford.edu",full_name,author.last_name,author.first_name,author.cap_profile_id,orcidid,Time.now]
    end
  end
end;

Run a generalized query against ORCID API

You can run a query against ORCID and get ORCIDiDs back as a result.

orcid_client = Orcid::Client.new
response = orcid_client.search('(ringgold-org-id:6429)OR(orgname="Stanford%20University")OR("grid.168010.e")OR(email=*.stanford.edu)')
puts response['num-found']
response['result'].each do |result|
  puts result['orcid-identifier']['uri']
end

Lookup a person's name using their ORCID

(note this is a production ORCIDID)

orcid_client = Orcid::Client.new
orcidid = 'https://orcid.org/0000-0003-1527-0030'
name = orcid_client.fetch_name(orcidid).join(' ')
puts name

Export Stanford Integrations Users along with Stanford Users from ORCID API

This will query the sul_pub database for all people with an ORCIDID (i.e. those who have gone through the Stanford integration and were returned via the MaIS API during our regular ORCID pushes). It also queries the ORCID API for potential Stanford people with ORCIDIDs. It then gets the difference between these two sets (which represents potential Stanford people with ORCIDs who are not known to sul-pub). Finally, it outputs all of the Stanford integration users, followed by the difference set. For Stanford users, we also return their SUNetID, cap_profile_id, ORCID API scope, and the number of publications we have pushed to ORCID. For users returned by the ORCID API, we look up their name via the ORCID API and return an ORCIDID and name.

Note that the difference set is likely to be larger than the number of ORCID API users minus the number of Stanford integration users, because the ORCID API will probably not return a perfect superset of Stanford people. Some integration users we know about are not returned by the ORCID API, either because they do not list Stanford in their ORCID profile or because their metadata visibility settings prevent them from being found.
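
A small worked example shows why the two numbers diverge. This is only an illustration of the set arithmetic with made-up identifiers; it is not the code behind the rake task.

```ruby
require 'set'

# Made-up identifiers for illustration only.
integration_users = Set['id-1', 'id-2', 'id-3']         # known to sul-pub via MaIS
orcid_api_users   = Set['id-2', 'id-3', 'id-4', 'id-5'] # returned by the ORCID API query

# People found by the ORCID API query that sul-pub does not know about.
difference = orcid_api_users - integration_users

# The naive estimate assumes every integration user also appears in the API results.
naive_estimate = orcid_api_users.size - integration_users.size

puts "difference set size: #{difference.size}"
puts "naive estimate:      #{naive_estimate}"
```

Here `id-1` is an integration user that the ORCID API query did not return, so the difference set has 2 members while the naive subtraction gives 1.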

RAILS_ENV=production bundle exec rake sul:stanford_orcid_users['tmp/results.csv']

Export ORCIDIds and Names For An ORCID API Query

This runs a query against the ORCID API and exports the results (name and ORCIDID) as a CSV file. If you are running on localhost, leave off the RAILS_ENV=production.

RAILS_ENV=production bundle exec rake sul:orcid_query['(ringgold-org-id:6429)OR(orgname="Stanford%20University")OR("grid.168010.e")OR(email=*.stanford.edu)','tmp/results.csv']

How many publications were added with the ORCID push from Profiles

Look at the log file, though note that the number of ORCID users is a total count of anyone with a linked SUNET, and will include people with read only scope. You can filter to get just user counts with write scope by using the queries shown above.

ssh pub@sul-pub-prod
cd pub/current
grep "Orcid::AddWorks Updated" log/orcid_add_works.log
I, [2021-06-21T11:51:58.540977 #4706]  INFO -- : Orcid::AddWorks Updated 8098 contributor records from 989 ORCID users.
I, [2021-06-22T06:01:16.395736 #2481]  INFO -- : Orcid::AddWorks Updated 5 contributor records from 991 ORCID users.
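
If you need the numbers rather than the raw lines, the log entries can be parsed with a regular expression. The sketch below is a minimal example that works on the two sample lines shown above; the pattern is an assumption based only on those lines, so adjust it if the log format differs.

```ruby
# Sample lines copied from the grep output above.
log_lines = [
  'I, [2021-06-21T11:51:58.540977 #4706]  INFO -- : Orcid::AddWorks Updated 8098 contributor records from 989 ORCID users.',
  'I, [2021-06-22T06:01:16.395736 #2481]  INFO -- : Orcid::AddWorks Updated 5 contributor records from 991 ORCID users.'
]

# Captures the date, the number of contributor records, and the number of users.
pattern = /\[(\d{4}-\d{2}-\d{2})T.*Orcid::AddWorks Updated (\d+) contributor records from (\d+) ORCID users/

# Build one hash per matching line; lines that do not match are skipped.
runs = log_lines.map do |line|
  match = pattern.match(line)
  next unless match

  { date: match[1], contributions: match[2].to_i, users: match[3].to_i }
end.compact

runs.each { |run| puts "#{run[:date]}: #{run[:contributions]} contributions, #{run[:users]} users" }
puts "total contributions: #{runs.sum { |run| run[:contributions] }}"
```

For the two sample lines this reports a total of 8103 contributions.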

You can also count the contributions with non-nil put codes between specific dates. Keep in mind that if a contribution is updated in some other way (e.g. its status or visibility changed), it will also fall in the date range and increase the count:

date1=Time.parse('June 22 2021')
date2=Time.parse('June 23 2021')
Contribution.where.not(orcid_put_code:nil).where('updated_at >= ? and updated_at <= ?',date1,date2).count
=> 7

# or all of them
Contribution.where.not(orcid_put_code: nil).size
=> 12769