- Guiding question: Which of the available open-source projects may meet project needs?
- Considerations: Open-ONI, OpeNER, Apache, NERC-Fr, Palladio, Voyant Tools, D3
- Goal: Shortlist of open-source tools to adapt
- Discussants: Ludovic Moncla (lead), Mary Elings & Elena Azadbakht
- Listen:
  - Session 7 audio recording PART 1
  - Session 7 audio recording PART 2
- View: Session presentation slide deck
- Read: Session notes
- Briefing Documents:
  - Sampsel, Laurie J. 2018. "Voyant Tools." Music Reference Services Quarterly, 21:3, 153-157, DOI: 10.1080/10588167.2018.1496754
  - Agerri Gascón, Rodrigo, Cuadros Sean Gaines, Montse, and Rigau Claramunt, Germán. 2013. "OpeNER: Open Polarity Enhanced Named Entity Recognition." Sociedad Española para el Procesamiento del Lenguaje Natural. http://dialnet.unirioja.es/servlet/oaiart?codigo=4452510.
During this session, we reviewed several open-source projects, paying particular attention to the research activities and experience of our grant participants. In discussing the available tools, we focused on four main project needs: (1) browsing and sharing the document collection; (2) annotating the corpus; (3) processing the corpus using machine learning, geoparsing, and text mining techniques; and (4) visualizing and exploring the corpus.
We resolved to adapt tools for the following purposes in building our project:
- Browsing Corpus
  - OpenONI (The Online Newspaper Initiative): provides a function set for loading, modeling, and indexing data
- Annotating Corpus
  - BRAT: a server-based tool used to annotate the training and verification data for natural language processing
  - Pelagios and Recogito: a semantic annotation tool for texts and images that can identify and map places
  - PERDIDO Geoparser: a flexible geoparser that could be adapted to use a manually annotated gazetteer of historic place names
- Natural Language Processing
  - spaCy: a flexible Python library for natural language processing capable of performing most of the needed project tasks
- Visualization
  - D3 (Data-Driven Documents): a JavaScript library that will enable us to make custom, web-based visualizations that are interoperable across browsers
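
To make the gazetteer idea in the PERDIDO Geoparser entry more concrete, here is a minimal sketch of gazetteer-based place-name lookup. It uses only the Python standard library and is not PERDIDO's or spaCy's actual method. The `GAZETTEER` entries, their approximate coordinates, and the `find_places` helper are hypothetical and included for illustration only.

```python
import re

# Hypothetical gazetteer: historic place name -> approximate (latitude, longitude).
# These entries are illustrative placeholders, not project data.
GAZETTEER = {
    "Lutetia": (48.85, 2.35),
    "Lugdunum": (45.76, 4.83),
}


def find_places(text, gazetteer=GAZETTEER):
    """Return (name, start, end, coords) for each gazetteer name found in text."""
    # Try longer names first so multi-word names win over their substrings.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    return [
        (m.group(1), m.start(), m.end(), gazetteer[m.group(1)])
        for m in pattern.finditer(text)
    ]


matches = find_places("The road from Lugdunum to Lutetia was long.")
for name, start, end, coords in matches:
    print(name, start, end, coords)
```

Exact string matching of this kind misses spelling variants and ambiguous names, which is part of why a manually annotated gazetteer and a dedicated geoparser were considered for the project.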