Python scripts to pull and convert data between Library of Congress vocabularies and other external vocabularies (GeoNames, VIAF, etc)

jhu-library-applications/vocab-apis

vocab-apis

API resources

Here's a quick summary of the endpoints I tend to use, and some of their documentation.

| vocabulary | endpoint | API Documentation |
| --- | --- | --- |
| AAT | http://vocab.getty.edu/sparql | Getty Vocabularies: SPARQL endpoint |
| Europeana | https://www.europeana.eu/api/ | Europeana Record API |
| FAST (read) | http://id.worldcat.org/fast | FAST Linked Data API |
| FAST (Autosuggest) | http://fast.oclc.org/searchfast/fastsuggest | FAST Linked Data API |
| FAST (SRUSearch) | http://id.worldcat.org/fast/search | FAST Linked Data API |
| FAST (search, actually the best results) | http://experimental.worldcat.org/fast/search | not documented as an official endpoint idk? |
| GeoNames | https://sws.geonames.org/ | GeoNames Web Services Documentation |
| Internet Archive | http://archive.org/metadata/ | Internet Archive Developer Portal |
| Library of Congress Authorities | http://id.loc.gov/authorities/ | LOC Linked Data Service: Technical Center |
| VIAF | http://www.viaf.org/viaf/ | VIAF Authority Cluster Resource |

Python resources

| python library | main purpose | docs |
| --- | --- | --- |
| bs4 | Parses XML | https://www.crummy.com/software/BeautifulSoup/bs4/doc/ |
| pandas | Everything tabular data | https://pandas.pydata.org/docs/user_guide/index.html |
| requests | Sends HTTP requests | https://docs.python-requests.org/en/master/ |
| rdflib | Parses RDF/XML, N3, NTriples, Turtle, etc. | https://rdflib.readthedocs.io/en/stable/index.html |
| xml.etree.ElementTree | Parses XML | https://docs.python.org/3/library/xml.etree.elementtree.html#module-xml.etree.ElementTree |

Scripts

searchForStringMatch

Starting data: A spreadsheet with string headings. The script searches each heading in LCNAF and FAST and produces a URI if there is an exact match.
APIs: Library of Congress Authorities, FAST (SRUSearch)

Confirms the heading is authorized by retrieving the URIs and label from the APIs.
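
As a rough illustration of the exact-match lookup against Library of Congress Authorities, the sketch below only builds a lookup URL for one heading. It assumes the id.loc.gov label-lookup URL pattern for name authorities; the helper name `lcnaf_label_url` and the constant are hypothetical and are not taken from the scripts in this repository.

```python
from urllib.parse import quote

# Assumption: id.loc.gov exposes a label lookup for name authorities at this path.
LCNAF_LABEL_BASE = "https://id.loc.gov/authorities/names/label/"


def lcnaf_label_url(heading: str) -> str:
    """Build a label-lookup URL for one heading (hypothetical helper)."""
    # Trim stray whitespace from the spreadsheet cell, then percent-encode
    # every reserved character, including commas and spaces.
    return LCNAF_LABEL_BASE + quote(heading.strip(), safe="")


print(lcnaf_label_url("Twain, Mark, 1835-1910"))
```

A request to the resulting URL would then be sent with a library such as requests; how the scripts interpret the response is not shown here.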

Starting data: A spreadsheet with strings of possible FAST headings.
APIs: FAST (Autosuggest), FAST (SRUSearch)

Finds exact and close matches to FAST subject headings.

getItemMetadata

Starting data: A Europeana item identifier, stored in the variable item.
APIs: Europeana

Downloads the item record in JSON-LD and saves it as the file "query.json".

Starting data: Internet Archive item identifier as variable internet_id.
APIs: Internet Archive

Downloads item record in JSON and saves metadata in CSV.
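
The JSON-to-CSV step might look roughly like the sketch below. It assumes the record's descriptive fields have already been read into a plain dictionary whose values are either a string or a list of strings; the function name `metadata_to_csv`, the two-column layout, and the sample record are illustrative assumptions, not the script's actual output format.

```python
import csv
import io


def metadata_to_csv(metadata: dict) -> str:
    """Write one row per metadata field (hypothetical layout: field, value)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["field", "value"])
    for field, value in metadata.items():
        # Repeated fields arrive as lists; join them so each field stays on one row.
        values = value if isinstance(value, list) else [value]
        writer.writerow([field, "|".join(str(v) for v in values)])
    return buffer.getvalue()


# Made-up sample record, for illustration only.
sample = {"identifier": "example-item", "subject": ["maps", "Baltimore"]}
print(metadata_to_csv(sample))
```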

Starting data: An entity ID from Wikidata.
APIs: Wikidata

Finds properties of entity and saves in CSV.

Convert

Starting data: A spreadsheet with FAST or VIAF identifiers.
APIs: none

Converts FAST and VIAF identifiers to URIs.
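
A minimal sketch of this conversion, which needs no API call. It assumes FAST identifiers are stored in a form like `fst01204155` and that the URI drops the prefix and leading zeros; the VIAF base follows the URI format shown elsewhere in this README. The function names are hypothetical.

```python
import re

# Assumed URI bases; check them against the URIs your data already uses.
FAST_BASE = "http://id.worldcat.org/fast/"
VIAF_BASE = "https://viaf.org/viaf/"


def _digits(identifier: str) -> str:
    """Keep only the numeric part of an identifier."""
    digits = re.sub(r"\D", "", identifier)
    if not digits:
        raise ValueError(f"no digits found in identifier: {identifier!r}")
    return digits


def fast_id_to_uri(identifier: str) -> str:
    # int() drops leading zeros, e.g. "01204155" becomes 1204155.
    return FAST_BASE + str(int(_digits(identifier)))


def viaf_id_to_uri(identifier: str) -> str:
    return VIAF_BASE + _digits(identifier)


print(fast_id_to_uri("fst01204155"))
print(viaf_id_to_uri("149920363"))
```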

Starting data: A spreadsheet with geographic headings (from FAST or LCNAF).
APIs: FAST (read), Library of Congress Authorities, GeoNames

Converts geographic names from LCNAF to GeoNames identifiers. Example: Baltimore County (Md.) n79018713 is converted to Baltimore County https://www.geonames.org/4347790. It also builds the full hierarchical name from GeoNames: Baltimore County, Maryland, United States.
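
The hierarchical name could be assembled along these lines. This sketch assumes a GeoNames JSON record with the fields `name`, `adminName1`, and `countryName`; the script itself may build the name differently, and `hierarchical_name` is a hypothetical helper.

```python
def hierarchical_name(place: dict) -> str:
    """Join place, first-level admin division, and country into one string."""
    parts = [place.get("name"), place.get("adminName1"), place.get("countryName")]
    unique = []
    for part in parts:
        # Skip missing levels and avoid repeats such as "Singapore, Singapore".
        if part and part not in unique:
            unique.append(part)
    return ", ".join(unique)


record = {
    "name": "Baltimore County",
    "adminName1": "Maryland",
    "countryName": "United States",
}
print(hierarchical_name(record))
```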

Starting data: A spreadsheet with Library of Congress Subject Headings.
APIs: FAST (read), Library of Congress Authorities

Converts LCSH to one or more FAST headings.

Starting data: A spreadsheet with years from the 1800s onwards.
APIs: none

Converts years into written-out decades as given in FAST.
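
One way to sketch the year-to-decade conversion is shown below. The decade wording ("Eighteen sixties", "Nineteen hundreds (Decade)") is an assumption about how FAST phrases these headings and should be checked against FAST itself; the sketch only covers 1800 to 1999, and the names are hypothetical.

```python
# Assumed wording for centuries and decades; verify against FAST before use.
CENTURIES = {18: "Eighteen", 19: "Nineteen"}
DECADES = [
    "hundreds (Decade)", "tens", "twenties", "thirties", "forties",
    "fifties", "sixties", "seventies", "eighties", "nineties",
]


def year_to_decade(year: int) -> str:
    """Return a written-out decade for a year between 1800 and 1999."""
    century, remainder = divmod(year, 100)
    if century not in CENTURIES:
        raise ValueError(f"year outside the range this sketch covers: {year}")
    return f"{CENTURIES[century]} {DECADES[remainder // 10]}"


print(year_to_decade(1865))
print(year_to_decade(1905))
```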

getAdditionalPropertiesFromIdentifiers

Starting data: A spreadsheet with FAST identifiers.
APIs: FAST (read)

Retrieves alternative identifiers from other authorities (VIAF, GeoNames, LCSH, etc.) given in FAST records.

Starting data: A spreadsheet with FAST identifiers.
APIs: FAST (read)

Converts the FAST identifier to a link, gets the rdf.xml record, and extracts the facet information (topical, geographical, corporate name, meeting or event, personal name, uniform title, form, period).

Starting data: A spreadsheet of VIAF URIs formatted like https://viaf.org/viaf/149920363. The script won't work if the URI has a trailing slash (ex: https://viaf.org/viaf/149920363/).
APIs: VIAF, Library of Congress Authorities

Takes a list of VIAF URIs from a spreadsheet, finds the LCNAF authority record, and extracts the facet information from the rdf.xml record.
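
Because the expected URI format is strict, it can help to clean the spreadsheet column before running the script. The helpers below are a hypothetical pre-processing sketch, not part of the script: they remove surrounding whitespace and any slash at the end of the URI, then pull out the numeric VIAF ID.

```python
def normalize_viaf_uri(uri: str) -> str:
    """Strip whitespace and any slash at the end of a VIAF URI."""
    return uri.strip().rstrip("/")


def viaf_id(uri: str) -> str:
    """Return the last path segment, which is the VIAF ID."""
    return normalize_viaf_uri(uri).rsplit("/", 1)[-1]


print(normalize_viaf_uri("https://viaf.org/viaf/149920363/"))
print(viaf_id("https://viaf.org/viaf/149920363/"))
```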

Starting data: A spreadsheet with URIs from the FAST, Library of Congress Authorities, GeoNames, VIAF, or AAT vocabularies.
APIs: FAST (read), Library of Congress Authorities, GeoNames, VIAF, AAT

Retrieves the authorized heading or label from the correct vocabulary using the URIs.
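
Choosing "the correct vocabulary" for each URI can be sketched as a lookup on the URI's host. The host names below are taken from the endpoint table and example URIs in this README; the mapping and the function name `vocabulary_for` are illustrative assumptions about how the dispatch might work.

```python
from typing import Optional
from urllib.parse import urlparse

# Hosts drawn from the endpoints and example URIs listed in this README.
HOST_TO_VOCAB = {
    "id.worldcat.org": "FAST",
    "id.loc.gov": "Library of Congress Authorities",
    "sws.geonames.org": "GeoNames",
    "www.geonames.org": "GeoNames",
    "viaf.org": "VIAF",
    "www.viaf.org": "VIAF",
    "vocab.getty.edu": "AAT",
}


def vocabulary_for(uri: str) -> Optional[str]:
    """Return the vocabulary a URI belongs to, or None if the host is unknown."""
    host = urlparse(uri.strip()).netloc.lower()
    return HOST_TO_VOCAB.get(host)


print(vocabulary_for("https://viaf.org/viaf/149920363"))
print(vocabulary_for("https://www.geonames.org/4347790"))
```

Rows whose host is not recognized come back as `None`, so they can be set aside for manual review rather than sent to the wrong API.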

Starting data: A spreadsheet of VIAF URIs formatted like https://viaf.org/viaf/149920363. The script won't work if the URI has a trailing slash (ex: https://viaf.org/viaf/149920363/).
APIs: VIAF, Library of Congress Authorities

Takes a list of VIAF URIs from a spreadsheet, finds the LCNAF authority record, and extracts the name components from the .marcxml.xml record.

Get URIs from authorized headings or labels

Starting data: A spreadsheet with Library of Congress Names.
APIs: Library of Congress Authorities

Retrieves the LCNAF URI for the searched name and grabs alternative identifiers from other authorities (FAST and LC Authorities (LCNAF, LCSH, LCGFT)) in the records.

Starting data: A string stored in the variable label_search.
APIs: AAT

Retrieves AAT URI by searching for the label in the API.

Starting data: A spreadsheet with FAST string headings.
APIs: FAST (read), FAST (search)

Retrieves FAST URIs by searching for the heading in the API.
