A common requirement for library search applications is support for synonyms from authority records. (The query "author:Blair, Eric" should also find the better-known pseudonym "George Orwell".)
In a traditional OPAC based on an SQL database, this can be solved generically with a join. Modern bibliographic search systems are mostly based on full-text retrieval systems such as Lucene, Solr, or Elasticsearch. Although some of these back ends can emulate a join, they are still key-value stores. In most cases it is therefore better to expand the authority records externally.
- Expanding the synonyms while searching is a straightforward strategy, but it is hard to handle multi-word synonyms such as "big apple" for "new york city". It may also increase the response time of the system.
- Expanding the synonyms while indexing is less flexible, but at index time the kind of authority record (topic term, personal name, ...) is known, so it is easy to handle complex synonyms.
- A static file is easy to handle, but for a large collection of authority records it may grow to several GB. This doesn't matter for a complete build of the index, but loading such a big file for every update of a bibliographic record is inefficient.
- A background service is slightly more complex, but it does not slow down the startup of the index or the update. On the other hand, a service may increase the time needed to build a new index. This disadvantage can be avoided with a cache.
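
The caching idea from the last point can be sketched as follows. This is a minimal illustration, not the code shipped with this project: the class `CachedSynonymLookup`, its method names, and the stand-in service in `main` are all hypothetical. It puts a small least-recently-used cache in front of an arbitrary lookup function, so that repeated authority ids do not hit the background service again during a full index build.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: an LRU cache in front of a slow synonym service. */
final class CachedSynonymLookup {

    // Assumption: the real service call (e.g. a request to the Solr index) is passed in as a function.
    private final Function<String, List<String>> service;
    private final Map<String, List<String>> cache;
    private int serviceCalls = 0;

    CachedSynonymLookup(Function<String, List<String>> service, final int maxEntries) {
        this.service = service;
        // Access-ordered LinkedHashMap that drops the least recently used entry when full.
        this.cache = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the synonyms for an authority id and asks the service only on a cache miss. */
    synchronized List<String> synonyms(String authorityId) {
        List<String> cached = cache.get(authorityId);
        if (cached != null) {
            return cached;
        }
        serviceCalls++;
        List<String> result = service.apply(authorityId);
        if (result == null) {
            result = Collections.emptyList();
        }
        cache.put(authorityId, result);
        return result;
    }

    /** Number of lookups that actually reached the service. */
    synchronized int serviceCalls() {
        return serviceCalls;
    }

    public static void main(String[] args) {
        // Stand-in for the background service; the id and the names are sample data only.
        Function<String, List<String>> fakeService =
                id -> Arrays.asList("Orwell, George", "Blair, Eric");
        CachedSynonymLookup lookup = new CachedSynonymLookup(fakeService, 10_000);
        System.out.println(lookup.synonyms("sample-id"));
        System.out.println(lookup.synonyms("sample-id")); // answered from the cache
        System.out.println("service calls: " + lookup.serviceCalls());
    }
}
```

A bounded cache is used here because the number of distinct authority records can be large. Whether an LRU policy and the chosen size fit a given collection is something to measure rather than assume.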
This project contains a complete service to expand the 'GND' (authority records provided by the German National Library).
The service has three parts:
- Code to parse the authority records (provided in MarcXML) and load them into a simple Solr index.
- A minimal configuration for the Solr index
- Example code to integrate the preprocessed synonyms into your own indexing process, e.g. SolrMarc.
The main skeleton is quite stable, but the processing of the data is still in progress.
The offline package of the GND is split into disjoint files:
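
To illustrate the first part, here is a rough sketch of how a preferred name and its variants could be read from a single MarcXML authority record with the XML parser of the JDK. It is not the parser used in this project. The class `MarcXmlNames` and the sample record are hypothetical, and the sketch assumes the usual MARC 21 authority layout for personal names (field 100 for the preferred name, field 400 for variant names, subfield `a` for the name itself).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: reads name fields from one MarcXML record. */
final class MarcXmlNames {

    static final String MARC_NS = "http://www.loc.gov/MARC21/slim";

    /** Parses a MarcXML string into a namespace-aware DOM document. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("Cannot parse MarcXML", e);
        }
    }

    /** Collects subfield $a of every data field with the given tag. */
    static List<String> subfieldA(Document doc, String tag) {
        List<String> values = new ArrayList<>();
        NodeList fields = doc.getElementsByTagNameNS(MARC_NS, "datafield");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            if (!tag.equals(field.getAttribute("tag"))) {
                continue;
            }
            NodeList subfields = field.getElementsByTagNameNS(MARC_NS, "subfield");
            for (int j = 0; j < subfields.getLength(); j++) {
                Element subfield = (Element) subfields.item(j);
                if ("a".equals(subfield.getAttribute("code"))) {
                    values.add(subfield.getTextContent().trim());
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Hand-written sample record, not copied from the GND.
        String xml = "<record xmlns=\"" + MARC_NS + "\">"
                + "<datafield tag=\"100\" ind1=\"1\" ind2=\" \">"
                + "<subfield code=\"a\">Orwell, George</subfield></datafield>"
                + "<datafield tag=\"400\" ind1=\"1\" ind2=\" \">"
                + "<subfield code=\"a\">Blair, Eric</subfield></datafield>"
                + "</record>";
        Document doc = parse(xml);
        System.out.println("preferred: " + subfieldA(doc, "100"));
        System.out.println("variants:  " + subfieldA(doc, "400"));
    }
}
```

A DOM parser is only practical for single records or small files. For the complete offline dumps, a streaming parser would be the more realistic choice.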
- T_umlenk_loesch1701.mrc.xml - Deletions and redirections (todo)
- Tbgesamt1701gnd.mrc.xml - Organisations
- Tfgesamt1701gnd.mrc.xml - Meetings
- Tggesamt1701gnd.mrc.xml - Geographic
- Tngesamt1701gnd.mrc.xml - Personal names (non individualized)
- Tpgesamt1701gnd.mrc.xml - Personal Names (individualized)
- Tsgesamt1701gnd.mrc.xml - Topic Terms
- Tugesamt1701gnd.mrc.xml - Work/Title
Changes in the GND are available via OAI
- OaiUpdates - All kinds (todo)
- Tw: Libraries (todo)
- Tk: RVK Notations (todo)
- Tr: other (todo)
- The code and the config for Solr contain some optional features besides the synonyms.
- The approach can easily be extended to authority records from additional or other sources.
- The source contains a URL to a local installation of Solr. This resource is not publicly available.
You can find the precompiled Javadoc under doc.
The code uses features of Java 8 and needs libraries from the following projects: