ADAH-EviDENce/NewsReader

NewsReader

NewsReader is a natural language processing pipeline. Among other things, it tags parts of speech, recognizes named entities, and annotates entities with predicates.

There are a number of implementations of the NewsReader pipeline:

  • POAS: pipeline-on-a-stick.
  • cltl/nlpp: contains a script that constructs the pipeline (EN+NL) from components.
  • vmc-from-scratch: creates a VM with the Dutch version of NewsReader.
  • newsreader-docker: a Docker image for setting up a NewsReader server.

At the moment, none of these implementations successfully builds the whole pipeline for Dutch (see the issue tracker). We have therefore decided to build the pipeline from individual modules.

Modules

We have imported all modules from NewsReader under the heading "Dutch modules":

These modules depend on the following software packages:

Build

The goal is to construct a lightweight, portable pipeline, which we achieve through a Docker image. This image is available from Docker Hub and can be obtained by pulling:

docker pull evidence/newsreaderdutch

If you would like to make changes and build the image yourself, run:

docker image build -t newsreaderdutch NewsReaderDutch/

from within the root of the repository.

Usage

The Docker container can be run directly on your text files by calling:

docker run -v /workspace/:/work/ newsreaderdutch /work/file.txt

where /workspace/ is the local directory containing the files to be processed and file.txt is the document you would like to have annotated. The output has the same filename, but with a .naf extension. Currently, the pipeline also writes the output of each module separately. If you pulled the image from Docker Hub instead of building it locally, use evidence/newsreaderdutch as the image name.
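
To annotate several documents in one go, the single-file command above can be wrapped in a loop. The following is only a sketch: the function name annotate_all, its arguments, and the DOCKER override for dry runs are assumptions made for illustration and are not part of the repository. Only the `docker run -v ... /work/file.txt` form is taken from the usage shown above.

```shell
#!/bin/sh
# annotate_all DIR [IMAGE]   (hypothetical helper, not shipped with the repository)
#
# Runs the container once for every .txt file found directly in DIR.
# DIR should be an absolute path, because `docker run -v` treats a
# relative name as a named volume rather than a host directory.
# IMAGE defaults to the locally built tag, newsreaderdutch.
# Set DOCKER=echo to print the commands instead of running them.
annotate_all() {
    workspace=${1:?usage: annotate_all DIR [IMAGE]}
    image=${2:-newsreaderdutch}

    for path in "$workspace"/*.txt; do
        # If the glob matched nothing, the pattern itself is returned; skip it.
        [ -e "$path" ] || continue

        # Same invocation as the single-file example, one file at a time.
        ${DOCKER:-docker} run -v "$workspace/:/work/" "$image" "/work/$(basename "$path")"
    done
}
```

With the function defined, `annotate_all /workspace` would process every .txt file in /workspace, and `DOCKER=echo annotate_all /workspace` would only list the commands it would run. Running one container per file keeps the example close to the documented usage; whether the image accepts several files in a single run is not stated here, so the sketch does not rely on it.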

Contact

Questions, comments, and bug reports can be submitted to the issue tracker.