
Automated processes

Peter Mangiafico edited this page Oct 10, 2019 · 6 revisions

We run two automated processes on the production and cap-qa machines. See config/schedule.rb for details.

  1. rake cap:poll[1]: Nightly, we query the CAP API to update authorship records, then run Web of Science harvesting for the authors that changed.
  2. rake harvest:all_authors_update: Every few days, we run harvesting for all authors in the database across all sources.

The actual schedule for these processes is defined with the "whenever" gem in https://github.com/sul-dlss/sul_pub/blob/master/config/schedule.rb. Only servers deployed with the harvester role run these processes.
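For readers unfamiliar with whenever, the fragment below is a hypothetical sketch of how a schedule like the one described above can be expressed in its DSL. The times of day and the three-day interval are assumptions chosen for illustration; the linked config/schedule.rb is the authoritative version.

```ruby
# Hypothetical config/schedule.rb sketch (whenever DSL), NOT the actual sul_pub file.
# Times and intervals are assumptions for illustration only.

# Nightly CAP poll, which also triggers Web of Science harvesting for changed authors.
every 1.day, at: '2:00 am', roles: [:harvester] do
  rake 'cap:poll[1]'
end

# Periodic harvest of all authors across all sources ("every few days" assumed to be 3).
every 3.days, at: '4:00 am', roles: [:harvester] do
  rake 'harvest:all_authors_update'
end
```

The `roles:` option is how whenever, through its Capistrano integration, limits a job to servers deployed with a given role, which matches the harvester-only behavior described above.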

It is somewhat difficult to tell whether these processes are working. To see whether cron is firing them, look at log/cron.log: the file is generally empty, but you can check its timestamp.
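Checking that timestamp can be scripted. The sketch below uses hypothetical helpers that are not part of sul_pub: they report how many hours ago a file was last modified and flag it as stale past an assumed threshold of 26 hours (roughly one nightly cycle plus some slack). A stale result is a prompt to investigate, not proof that a job failed.

```ruby
# Hypothetical helper (not part of sul_pub): hours since a file was last modified.
def hours_since_modified(path, now: Time.now)
  (now - File.mtime(path)) / 3600.0
end

# Assumed threshold: a nightly job should leave a trace within about a day.
STALE_AFTER_HOURS = 26

# Returns true if the log is missing or has not been modified within the threshold.
def cron_log_stale?(path = 'log/cron.log', now: Time.now, threshold: STALE_AFTER_HOURS)
  return true unless File.exist?(path) # a missing log counts as stale
  hours_since_modified(path, now: now) > threshold
end

if __FILE__ == $PROGRAM_NAME
  path = ARGV.fetch(0, 'log/cron.log')
  if File.exist?(path)
    puts format('%s last modified %.1f hours ago', path, hours_since_modified(path))
  end
  puts cron_log_stale?(path) ? "#{path} looks stale" : "#{path} was modified recently"
end
```

Run from the application root (for example `ruby check_cron_log.rb`, a hypothetical file name), it checks log/cron.log by default or any path passed as the first argument.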

For the CAP authors poller, you can check in the Rails console whether author profiles are being updated:

Author.order('updated_at DESC').limit(10).select(:updated_at) # look at most recent 10 timestamps
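To turn that eyeball check into a yes/no answer, a hypothetical helper like the one below compares the newest timestamp against a time window. It is plain Ruby so it runs outside Rails; in the console you would pass it timestamps from a query like the one above, e.g. `Author.order('updated_at DESC').limit(10).pluck(:updated_at)`. The helper name and the 26-hour window are assumptions, not part of sul_pub.

```ruby
# Hypothetical helper (not part of sul_pub): did anything change within the last `window_hours`?
# `timestamps` is any collection of Time-like values, e.g. updated_at values from the console.
def updated_within?(timestamps, window_hours:, now: Time.now)
  newest = timestamps.compact.max
  return false if newest.nil? # no timestamps means no evidence of recent activity
  (now - newest) <= window_hours * 3600
end
```

For example, `updated_within?(Author.order('updated_at DESC').limit(10).pluck(:updated_at), window_hours: 26)` returning false would suggest the nightly poller has not updated any profiles recently.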

For harvesting, detailed notes on debugging the harvesting process are available here. A quick way to see whether publications are being added or updated is to look at the most recently updated rows in the publications table:

Publication.order('updated_at DESC').limit(10).select(:updated_at) # look at most recent 10 timestamps
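The last ten rows only show the most recent activity. To see whether harvesting is adding or updating publications steadily, it can help to bucket timestamps by calendar day. The sketch below is a hypothetical plain-Ruby helper, not part of sul_pub; in the console you might feed it something like `Publication.where('updated_at > ?', 7.days.ago).pluck(:updated_at)` (an assumed query, not one documented on this page).

```ruby
require 'date' # provides Time#to_date

# Hypothetical helper (not part of sul_pub): count timestamps per calendar day.
# Returns a Hash of Date => count, ordered by date; nil entries are ignored.
def counts_per_day(timestamps)
  timestamps.compact
            .group_by(&:to_date)
            .transform_values(&:size)
            .sort
            .to_h
end

if __FILE__ == $PROGRAM_NAME
  t = Time.new(2019, 10, 10, 12, 0, 0)
  counts_per_day([t, t + 3600, t + 86_400]).each { |day, n| puts "#{day}: #{n}" }
  # prints:
  # 2019-10-10: 2
  # 2019-10-11: 1
end
```

Days with no updates simply do not appear in the result, so a missing day after a scheduled harvest run is worth a closer look.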