Indexing EAD in ArcLight
Now that you have your ArcLight application up and running, we need to index data into it.
First we need to download or access our EADs. Let's create a directory within our application where we can store them.
$ mkdir eads
Now let's add some data there.
# This command will save one of our test datasets to the directory you just created
$ wget -P eads/ https://raw.githubusercontent.com/sul-dlss/arclight/master/spec/fixtures/ead/nlm/alphaomegaalpha.xml
Next we need to run our indexing task and tell it which "Repository" the EAD file belongs to. By default, your ArcLight application should have a generated config/repositories.yml file, which contains information about the repositories for your instance. For example, we want to link the EAD alphaomegaalpha.xml to the first repository in that file, nlm:
nlm:
  name: 'National Library of Medicine. History of Medicine Division'
  description: 'NLM’s History of Medicine Division collects, preserves, makes available, and interprets for diverse audiences one of the world’s richest collections of historical material related to human health and disease.'
  building: 'Building 38, Room 1E-21'
  address1: '8600 Rockville Pike'
  address2: ''
  city: 'Bethesda'
  state: 'MD'
  zip: '20894'
  country: 'USA'
  phone: ''
  contact_info: 'hmdref@nlm.nih.gov'
  thumbnail_url: "https://collections.nlm.nih.gov/pageturnerserver/ajaxp?theurl=http://localhost:8080/fedora/get/nlm:nlmuid-101421040-img/THUMB"
We recommend that your config/repositories.yml contain only the repositories for which you have EADs to index.
We can now use the arclight:index task in ArcLight to index our EAD.
$ FILE=./eads/alphaomegaalpha.xml REPOSITORY_ID=nlm bundle exec rake arclight:index
Loading ./eads/alphaomegaalpha.xml into index...
Indexed ./eads/alphaomegaalpha.xml (in 0.837 secs).
You can add new repositories to the config/repositories.yml file. The key that begins a repository entry is the same value you will use as the REPOSITORY_ID in the indexing rake task.
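Because a mistyped REPOSITORY_ID is easy to miss, it can help to confirm that the key exists in config/repositories.yml before indexing. The following is a minimal sketch and not part of ArcLight: the function name repo_configured is hypothetical, and it assumes that repository keys are unindented top-level keys in the file (as with nlm) and contain only letters, digits, hyphens, or underscores.

```shell
# Hypothetical helper (not provided by ArcLight): succeeds if the given ID
# appears as an unindented top-level key in a repositories YAML file.
# Usage: repo_configured REPOSITORY_ID [path/to/repositories.yml]
repo_configured() {
  repo_id="$1"
  config_file="${2:-config/repositories.yml}"
  # -q: no output, only an exit status; -s: stay quiet if the file is missing.
  # The pattern only matches keys that start in column 1, so nested keys
  # such as "  name:" are ignored.
  grep -q -s "^${repo_id}:" "$config_file"
}

# Example (commented out so that sourcing this file has no side effects):
# if repo_configured nlm; then
#   FILE=./eads/alphaomegaalpha.xml REPOSITORY_ID=nlm bundle exec rake arclight:index
# else
#   echo "Repository 'nlm' is not in config/repositories.yml" >&2
# fi
```

This is a plain text match rather than a real YAML parse, so it is only a quick sanity check.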
We recommend that you organize EADs by repository, putting all of a repository's EADs in a directory named after the repository's key. Then run the arclight:index_dir rake task with the DIR and REPOSITORY_ID environment variables to index all of those files into the same repository:
# this assumes there's a directory with EAD files called /tmp/sul-spec, and a repository configured with the ID "spec"
$ DIR=/tmp/sul-spec REPOSITORY_ID=spec bundle exec rake arclight:index_dir
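When there are several repositories, the same command can be repeated in a loop over one directory per repository key. This is a sketch under stated assumptions: the eads/<repository key>/ layout and the function name index_all_repositories are illustrative, and DRY_RUN is a hypothetical variable of this script (not of ArcLight) that prints each command instead of running it.

```shell
# Hypothetical helper: run arclight:index_dir once per subdirectory of a base
# directory, using the subdirectory name as the REPOSITORY_ID.
# Assumes a layout such as eads/nlm/*.xml, eads/spec/*.xml.
index_all_repositories() {
  base_dir="${1:-eads}"
  for dir in "$base_dir"/*/; do
    # If there are no subdirectories the glob stays unexpanded; skip it.
    [ -d "$dir" ] || continue
    dir="${dir%/}"                    # drop the trailing slash left by the glob
    repo_id="$(basename "$dir")"      # directory name doubles as repository key
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Print the command only, so the plan can be reviewed first.
      echo "DIR=$dir REPOSITORY_ID=$repo_id bundle exec rake arclight:index_dir"
    else
      DIR="$dir" REPOSITORY_ID="$repo_id" bundle exec rake arclight:index_dir
    fi
  done
}
```

Running `DRY_RUN=1 index_all_repositories eads` first shows which directories would be indexed and under which repository IDs; each ID still needs a matching key in config/repositories.yml.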
If you are using another Solr instance that is not at the default location on localhost, you can provide the SOLR_URL environment variable to index into that service:
$ SOLR_URL=http://solr.example.com/solr FILE=myead.xml REPOSITORY_ID=myid bundle exec rake arclight:index