Running and Checking Harvesting Processes

Peter Mangiafico edited this page Apr 4, 2019 · 3 revisions

Running and Checking Harvesting Scripts

When testing the harvesting scripts on remote hosts, it helps to run them inside the GNU Screen utility. These are long-running processes, and GNU Screen keeps them running after your connection to the remote host drops.

See the common commands page for how to run individual harvests. The automated cron jobs, including the regular harvest, are defined in the config/schedule.rb file.

Running the CAP Authorship Harvester manually

# Use a `screen` session for long-running processes.
# When run on a STAGE or QA system, use RAILS_ENV=production.
$ cd sul-pub/current
$ DAYS_TO_UPDATE=1
# Quote the task so the shell does not treat the brackets as a glob pattern.
$ RAILS_ENV=production bundle exec rake "cap:poll[$DAYS_TO_UPDATE]"
# In a separate `screen` window, tail the log:
$ tail -n100 -f log/cap_authors_poller.log # value from `Settings.CAP.AUTHORS_POLL_LOG`

Checking Harvesting Results

Check the logs for any errors. Then inspect some of the database records in the Rails console, e.g. (note that these queries are slow):

Contribution.order("updated_at desc").limit(5)
Publication.order("updated_at desc").limit(5)

Similarly, use information from the logs to look up specific records in the database.
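
The log check can be sketched as a small shell helper. This is only a sketch: `recent_log_errors` is a hypothetical name, and the search words (error, exception, fatal) are an assumption about what problem lines look like in the poller log, so adjust them to what the log actually contains.

```shell
# Sketch: print the most recent error-like lines from a harvest log.
# The default path is the poller log named above; the search words are
# an assumption, not a documented log format.
recent_log_errors() {
  local log_file="${1:-log/cap_authors_poller.log}"
  local max_lines="${2:-20}"
  # -i: ignore case, -n: prefix each match with its line number,
  # -E: extended regex so the alternation works without escapes.
  grep -inE 'error|exception|fatal' "$log_file" | tail -n "$max_lines"
}

# Usage (from sul-pub/current):
#   recent_log_errors                                   # default log, last 20 matches
#   recent_log_errors log/cap_authors_poller.log 50
```

The line numbers in the output make it easy to open the log at the right place and read the surrounding context.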

Notes on Using GNU Screen

When I check the harvester, it goes something like this:

ssh pub@sul-pub-cap-qa
> screen -ls # list open screen sessions
> screen -r  # re-attach
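  > # view the current screen window; use `alt-n` to switch between open windows
  > #   (the stock GNU Screen binding is `Ctrl-a n`)
  > # one window is running the harvester, the other is tailing the log
  > # use `alt-d` to detach from screen (the stock binding is `Ctrl-a d`)
> exit # log out of the VM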
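
If you would rather not keep a window open, the harvester can also be started in a detached screen session. A minimal sketch, assuming GNU Screen's `-dmS` options (start detached, with a session name); `start_harvest_session` and the session name `cap_harvest` are hypothetical, while the directory and rake task are the ones from the manual run above.

```shell
# Sketch: start the CAP harvester in a detached, named GNU Screen session.
# "cap_harvest" is a hypothetical session name chosen for this example.
# Run this from the same directory you would run `cd sul-pub/current` from.
start_harvest_session() {
  local days="${1:-1}"   # value passed to cap:poll, defaults to 1
  screen -dmS cap_harvest bash -c \
    "cd sul-pub/current && RAILS_ENV=production bundle exec rake 'cap:poll[${days}]'"
}

# Usage:
#   start_harvest_session 1
#   screen -ls               # confirm the session is listed
#   screen -r cap_harvest    # re-attach to watch it
```

The session ends when the rake task exits, so `screen -ls` will stop listing it once the harvest is done. Tail the log to follow progress in the meantime.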