
Tutorial

Simon Gray edited this page Oct 13, 2017 · 3 revisions

How to use a CoreNLP pipeline

The standard way to use CoreNLP is to set up a pipeline of annotators that provides the functionality you require. The namespace corenlp-clj.core provides pipeline, a higher-order function that takes a map of options and returns an annotating function. The returned function annotates text using the pipeline configured by those options.

Note that prerequisites simply builds the value for the "annotators" option: it returns the annotators you specify, together with the annotators they depend on, as a single string.
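This page does not show how prerequisites is implemented. As a hypothetical sketch of the idea, the function below (prerequisites-sketch, a name invented here) expands a list of annotators into a comma-separated string in which every annotator appears after the annotators it depends on. The dependency table is a simplified, assumed subset of CoreNLP's requirements, not the one corenlp-clj actually uses.

```clojure
(require '[clojure.string :as str])

;; Hypothetical sketch, NOT corenlp-clj's actual `prerequisites`.
;; Simplified, assumed dependency table for a few CoreNLP annotators.
(def annotator-deps
  {"tokenize" []
   "ssplit"   ["tokenize"]
   "pos"      ["tokenize" "ssplit"]
   "lemma"    ["tokenize" "ssplit" "pos"]
   "ner"      ["tokenize" "ssplit" "pos" "lemma"]
   "depparse" ["tokenize" "ssplit" "pos"]})

(defn add-with-deps
  "Appends `annotator` to `acc` after its dependencies (depth-first),
  skipping anything already present."
  [acc annotator]
  (if (some #{annotator} acc)
    acc
    (conj (reduce add-with-deps acc (annotator-deps annotator []))
          annotator)))

(defn prerequisites-sketch
  "Returns a comma-separated annotators string, dependencies first."
  [annotators]
  (str/join "," (reduce add-with-deps [] annotators)))

(prerequisites-sketch ["depparse" "lemma" "ner"])
;; => "tokenize,ssplit,pos,depparse,lemma,ner"
```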

;; options for setting up a new pipeline
(def opts {"annotators" (prerequisites ["depparse" "lemma" "ner"])})

;; initialising the pipeline, creating a function for annotating
(def nlp (pipeline opts))
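To illustrate what "a higher-order function that returns an annotating function" means, here is a toy stand-in that performs no NLP at all (toy-pipeline, toy-nlp and setup-count are hypothetical names). The setup work runs once, when the pipeline is created, and the returned closure can then be called on any number of texts. Presumably pipeline is designed this way because constructing a real CoreNLP pipeline loads its models, which is expensive, so it should happen once rather than once per text.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for corenlp-clj's `pipeline`; it performs no NLP.
(def setup-count (atom 0)) ; counts how often the setup step runs

(defn toy-pipeline
  "Runs a (pretend) expensive setup once, then returns an annotating fn."
  [opts]
  (swap! setup-count inc)
  (let [annotators (str/split (get opts "annotators") #",")]
    (fn annotate [text]
      {:text text, :annotators annotators})))

(def toy-nlp (toy-pipeline {"annotators" "tokenize,ssplit,pos"}))

(toy-nlp "This is an example sentence.")
(toy-nlp "That is another.")
;; Both calls reuse the same setup: @setup-count is still 1.
```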

The annotating function (here named nlp) forms the first step in a chain of functions. It is followed by calls to annotation or to the convenience functions in corenlp-clj.annotations. These functions extract information from the hierarchy of annotations created by the pipeline, and they compose naturally with Clojure's thread-last macro, ->>.

(->> "This is an example sentence. That is another."
     nlp
     sentences
     tokens
     pos)
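The thread-last macro ->> passes each result as the last argument of the next form, so the chain above is just nested function calls. Expanding it with quoted placeholder symbols (nothing is evaluated, so CoreNLP is not needed here) shows the nesting, and also shows that ordinary seq functions such as take can be appended to the chain:

```clojure
;; `->>` inserts each result as the LAST argument of the next form.
(macroexpand '(->> "text" nlp sentences tokens pos))
;; => (pos (tokens (sentences (nlp "text"))))

;; A form with arguments gets the threaded value appended after them.
(macroexpand '(->> "text" nlp sentences tokens pos (take 1)))
;; => (take 1 (pos (tokens (sentences (nlp "text")))))
```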

This example returns the part-of-speech (POS) tags for every token in both sentences: (("DT" "VBZ" "DT" "NN" "NN" ".") ("DT" "VBZ" "DT" ".")). The tags are grouped into separate seqs, one per sentence. Omitting sentences from the chain instead returns a single flat seq containing all of the POS tags. Removing pos from the end returns the word tokens (represented as CoreLabels), still divided into sentences. Removing both returns a single seq of word tokens.
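The shape of these results can be reproduced without CoreNLP using plain Clojure data. In the sketch below, mock-doc, mock-sentences, mock-tokens and mock-pos are hypothetical stand-ins for an annotated document and for sentences, tokens and pos; real tokens are CoreLabels, modelled here as maps. The stand-ins only mimic the behaviour described above: tokens returns one seq per sentence when given sentences, and one flat seq when given the whole document.

```clojure
;; Hypothetical stand-ins on plain data; NOT corenlp-clj's implementation.
;; Tokens (CoreLabels in the real library) are modelled as maps.
(defn tok [word tag] {:word word :pos tag})

(def mock-doc
  {:sentences
   [{:tokens (map tok ["This" "is" "an" "example" "sentence" "."]
                      ["DT" "VBZ" "DT" "NN" "NN" "."])}
    {:tokens (map tok ["That" "is" "another" "."]
                      ["DT" "VBZ" "DT" "."])}]})

(defn mock-sentences
  "Returns the sentences of a document."
  [doc]
  (:sentences doc))

(defn mock-tokens
  "Given a whole document, returns one flat seq of tokens;
  given a seq of sentences, returns one token seq per sentence."
  [x]
  (if (map? x)
    (mapcat :tokens (:sentences x))
    (map :tokens x)))

(defn mock-pos
  "Maps tokens to their POS tags, preserving any per-sentence nesting."
  [toks]
  (if (map? (first toks))
    (map :pos toks)
    (map mock-pos toks)))

(->> mock-doc mock-sentences mock-tokens mock-pos)
;; => (("DT" "VBZ" "DT" "NN" "NN" ".") ("DT" "VBZ" "DT" "."))

(->> mock-doc mock-tokens mock-pos)
;; => ("DT" "VBZ" "DT" "NN" "NN" "." "DT" "VBZ" "DT" ".")
```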
