=======================================================================
Moved!
=======================================================================
This repository has moved to a new home under the VuFind project.
Find the new repository here:
https://github.com/demiankatz/vufind-browse-handler
=======================================================================
Care and feeding of the NLA Solr browse request handler
Mark Triggs
The National Library of Australia, 2009
0. Background
This Solr plugin was developed to support the browse functionality of
the National Library of Australia's Catalogue
(http://catalogue.nla.gov.au). Please read the LICENSE file that
accompanies this file for details regarding the distribution of this
software.
1. Compiling it
You'll need Ant to get everything compiled:
ant jars -Dsolr.war=/path/to/my/webapps/solr.war
should give you the two required jar files:
browse-handler.jar
browse-indexing.jar
2. Creating your browse indexes
2.1. Index your authority data
This step creates a Lucene index of the "see also" and "use
instead" linkages from the authority data. Note that if you're using VuFind
you can skip this step, because VuFind has its own authority index that we
use instead.
java -cp browse-indexing.jar IndexAuth /path/to/a/dump/of/your/authority-data.mrc authority.index
2.2. Create lists of headings for browsing.
Now we produce a list of the headings we want to browse over. We want to browse on:
* Any term that appears in a particular index of our bib data (e.g. subject-browse)
* Any non-preferred term from our authority index whose preferred
form is linked to from our bib data (i.e. appears in the above index).
The PrintBrowseHeadings class does this: it grabs headings from these
sources, produces a sort key for each heading, and prints out a big
file with lines of the form:
<Sort key>^A<Heading>
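To make that line format concrete, here is a minimal sketch that splits one such line into its two parts. It assumes ^A means the Ctrl-A byte (octal 001), which matches the field separator handed to sort later in this section. The sample sort key and heading are made up, and `sample` is a hypothetical helper, not part of the plugin.

```shell
# Hypothetical sample line in the form <Sort key>^A<Heading>.
# \001 is the Ctrl-A byte (an assumption about what ^A stands for).
sample() { printf 'boats\001Boats\n'; }

# Field 1 is the sort key, field 2 is the heading as displayed.
sort_key=$(sample | awk -F '\001' '{ print $1 }')
heading=$(sample | awk -F '\001' '{ print $2 }')

printf 'sort key: %s\nheading:  %s\n' "$sort_key" "$heading"
```

The same awk one-liner is handy for eyeballing a real subjects.tmp or names.tmp, since Ctrl-A does not display well in most terminals.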
Running it:
java -cp browse-indexing.jar PrintBrowseHeadings /path/to/your/bib/data/index subject-browse authority.index subjects.tmp
java -cp browse-indexing.jar PrintBrowseHeadings /path/to/your/bib/data/index author-browse authority.index names.tmp
By default this assumes you're using my default field names in your authority index, which are:
* preferred (1xx)
* insteadOf (4xx)
If you're not, you can provide the field names using Java system properties
on the above command lines. For example, VuFind uses:
-Dfield.preferred=heading -Dfield.insteadof=use_for
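As a sketch of where those properties go: the java launcher only treats -D options as system properties when they appear before the class name, so they belong right after "java". The block below is a dry run that assembles and prints the command rather than executing it. The paths are the same placeholders used above; with VuFind, the authority.index argument would presumably be the path of VuFind's own authority index.

```shell
# Dry run: build the command line and print it instead of running it.
# -D options must come before the class name (PrintBrowseHeadings).
set -- java -Dfield.preferred=heading -Dfield.insteadof=use_for \
    -cp browse-indexing.jar PrintBrowseHeadings \
    /path/to/your/bib/data/index subject-browse authority.index subjects.tmp

cmd="$*"
echo "$cmd"
```

Once the printed command looks right, run it directly with real paths in place of the placeholders.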
Next we just need to remove any duplicates. I do this using the GNU
sort program from the command-line because it's amazingly fast even on
big files:
sort -T /var/tmp -u --field-separator=$'\1' -k1 subjects.tmp -o sorted-subjects.tmp
sort -T /var/tmp -u --field-separator=$'\1' -k1 names.tmp -o sorted-names.tmp
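To see what that does, here is a small sketch that runs the same kind of sort over three made-up lines in a scratch directory. It leaves out -T (which only picks the temp directory) and uses -t, the short form of --field-separator. Since -k1 starts the key at the first field and runs to the end of the line, two lines should count as duplicates only when both the sort key and the heading match.

```shell
sep=$(printf '\001')      # the Ctrl-A byte, same as $'\1' in bash
workdir=$(mktemp -d)      # scratch directory for this demo only

# Made-up headings; the first and third lines are identical.
printf 'boats\001Boats\naardvarks\001Aardvarks\nboats\001Boats\n' \
    > "$workdir/subjects.tmp"

# Sort on the whole line (field 1 to end) and drop exact duplicates.
sort -u -t "$sep" -k1 -o "$workdir/sorted-subjects.tmp" "$workdir/subjects.tmp"

line_count=$(awk 'END { print NR }' "$workdir/sorted-subjects.tmp")
first_key=$(awk -F '\001' 'NR == 1 { print $1 }' "$workdir/sorted-subjects.tmp")
echo "$line_count lines left, first sort key: $first_key"
```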
2.3. Creating the SQLite DB
The last step is to load all the headings into an SQLite database
(which acts as the browse index, effectively). CreateBrowseSQLite
does this:
java -cp browse-indexing.jar CreateBrowseSQLite sorted-names.tmp namesbrowse.db
java -cp browse-indexing.jar CreateBrowseSQLite sorted-subjects.tmp subjectsbrowse.db
And that's the indexing process. At the end of this you should have
one SQLite database per browse type, and an index of your authority
data. Everything else is disposable!
3. Configuring Solr
3.1. Jar files
Now that we've got our indexes built, we just need to configure the
Browse request handler to use them. Start by copying
browse-handler.jar to Solr's lib directory:
cp browse-handler.jar solr/WEB-INF/lib
3.2. Solr configuration
Then configure your browse types in solrconfig.xml:
<requestHandler name="/browse" class="au.gov.nla.solr.handler.BrowseRequestHandler">
  <str name="authIndexPath">/path/to/your/authority.index</str>
  <str name="bibIndexPath">/path/to/your/bib/data/index</str>
  <str name="sources">names,subjects</str>

  <!-- These definitions should match the field names used in the authority index. -->
  <str name="preferredHeadingField">preferred</str>
  <str name="useInsteadHeadingField">insteadOf</str>
  <str name="seeAlsoHeadingField">seeAlso</str>
  <str name="scopeNoteField">scopeNote</str>

  <lst name="names">
    <str name="DBpath">/path/to/your/namesbrowse.db</str>
    <str name="field">author-browse</str>
  </lst>

  <lst name="subjects">
    <str name="DBpath">/path/to/your/subjectsbrowse.db</str>
    <str name="field">subject-browse</str>
    <str name="dropChars">[]()',</str>
  </lst>
</requestHandler>
3.3. Testing
Finally, start up Solr and test that things are working:
http://yourhost.example.com:8080/solr/browse?source=subjects&from=boats&rows=20
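For repeated testing it can help to build that URL from its parts. The sketch below assumes that "source" names one of the entries listed under "sources" in solrconfig.xml, that "from" is the heading to start browsing at, and that "rows" is the number of headings to return; those readings are inferred from the example URL rather than documented here. `browse_url` is a hypothetical helper, and the host, port, and path are the placeholders from the example.

```shell
# Hypothetical helper: print a browse URL for a source, start point, and row count.
# No URL-encoding is done, so a "from" value containing spaces or
# punctuation would need encoding first.
browse_url() {
    printf 'http://yourhost.example.com:8080/solr/browse?source=%s&from=%s&rows=%s\n' \
        "$1" "$2" "$3"
}

url=$(browse_url subjects boats 20)
echo "$url"

# To send the request (needs a running Solr): curl -s "$url"
```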
4. Running updates
At the time of writing we are updating our authority and browse
indexes once per night at the same time we update our bib indexes.
The browse request handler has been designed to automatically detect
updates to these indexes and reload them as required. The steps are
simple:
mv mybrowse.db mybrowse.db.old; mv mybrowse.db.new mybrowse.db
mv authority.index authority.index.old; mv authority.index.new authority.index
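If you run this every night, one thing to watch is the leftover ".old" copy. The authority index is a Lucene index, which is a directory on disk, and mv moves its source inside a destination directory that already exists instead of replacing it. The sketch below wraps the swap in a hypothetical `swap_in` helper that clears the previous ".old" copy first. That clean-up step is my own addition, not something the handler requires. The demo runs against throwaway files in a scratch directory.

```shell
# Hypothetical helper: rotate "$1.new" into place, keeping the previous
# copy as "$1.old". Any earlier "$1.old" is deleted first (an assumption
# about what you want; drop the rm line to keep it).
swap_in() {
    target=${1:?usage: swap_in PATH}
    [ -e "$target.new" ] || { echo "missing $target.new" >&2; return 1; }
    rm -rf "$target.old"
    if [ -e "$target" ]; then
        mv "$target" "$target.old"
    fi
    mv "$target.new" "$target"
}

# Demo with stand-in files.
workdir=$(mktemp -d)
echo old > "$workdir/mybrowse.db"
echo new > "$workdir/mybrowse.db.new"
swap_in "$workdir/mybrowse.db"

current=$(cat "$workdir/mybrowse.db")
previous=$(cat "$workdir/mybrowse.db.old")
echo "current=$current previous=$previous"
```

In a nightly job this would be called once per browse database and once for the authority index, for example swap_in /path/to/your/namesbrowse.db.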