The NLP Interchange Format (NIF) is an RDF/OWL-based format that aims to achieve interoperability between Natural Language Processing (NLP) tools, language resources and annotations. It offers a standard representation of annotated texts for tasks such as Named Entity Recognition or Entity Linking. It is used by GERBIL to run reproducible evaluations of annotators.
This Python library can be used to serialize and deserialized annotated corpora in NIF.
NIF 2.1, serialized in any of the formats supported by rdflib
This library is revolves around three core classes: * a NIFContext
is a document (a string); * a NIFPhrase
is the annotation of a
snippet of text (usually a phrase) in a document; * a NIFCollection
is a set of documents, which constitutes a collection. In NIF, each of
these objects is identified by a URI, and their attributes and relations
are encoded by RDF triples between these URIs. This library abstracts
away the encoding by letting you manipulate collections, contexts and
phrases as plain Python objects.
- Import and create a collection
from pynif import NIFCollection
collection = NIFCollection(uri="http://freme-project.eu")
- Create a context
context = collection.add_context(
uri="http://freme-project.eu/doc32",
mention="Diego Maradona is from Argentina.")
- Create entries for the entities
context.add_phrase(
beginIndex=0,
endIndex=14,
taClassRef=['http://dbpedia.org/ontology/SportsManager', 'http://dbpedia.org/ontology/Person', 'http://nerd.eurecom.fr/ontology#Person'],
score=0.9869992701528016,
annotator='http://freme-project.eu/tools/freme-ner',
taIdentRef='http://dbpedia.org/resource/Diego_Maradona',
taMsClassRef='http://dbpedia.org/ontology/SoccerManager')
context.add_phrase(
beginIndex=23,
endIndex=32,
taClassRef=['http://dbpedia.org/ontology/PopulatedPlace', 'http://nerd.eurecom.fr/ontology#Location',
'http://dbpedia.org/ontology/Place'],
score=0.9804963628413852,
annotator='http://freme-project.eu/tools/freme-ner',
taMsClassRef='http://dbpedia.org/resource/Argentina')
- Finally, get the output with the format that you need
generated_nif = collection.dumps(format='turtle')
print(generated_nif)
You will obtain the NIF representation as a string:
<http://freme-project.eu> a nif:ContextCollection ;
nif:hasContext <http://freme-project.eu/doc32> ;
ns1:conformsTo <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core/2.1> .
<http://freme-project.eu/doc32> a nif:Context,
nif:OffsetBasedString ;
nif:beginIndex "0"^^xsd:nonNegativeInteger ;
nif:endIndex "33"^^xsd:nonNegativeInteger ;
nif:isString "Diego Maradona is from Argentina." .
<http://freme-project.eu/doc32#offset_0_14> a nif:OffsetBasedString,
nif:Phrase ;
nif:anchorOf "Diego Maradona" ;
nif:beginIndex "0"^^xsd:nonNegativeInteger ;
nif:endIndex "14"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://freme-project.eu/doc32> ;
nif:taMsClassRef <http://dbpedia.org/ontology/SoccerManager> ;
itsrdf:taAnnotatorsRef <http://freme-project.eu/tools/freme-ner> ;
itsrdf:taClassRef <http://dbpedia.org/ontology/Person>,
<http://dbpedia.org/ontology/SportsManager>,
<http://nerd.eurecom.fr/ontology#Person> ;
itsrdf:taConfidence 9.869993e-01 ;
itsrdf:taIdentRef <http://dbpedia.org/resource/Diego_Maradona> .
<http://freme-project.eu/doc32#offset_23_32> a nif:OffsetBasedString,
nif:Phrase ;
nif:anchorOf "Argentina" ;
nif:beginIndex "23"^^xsd:nonNegativeInteger ;
nif:endIndex "32"^^xsd:nonNegativeInteger ;
nif:referenceContext <http://freme-project.eu/doc32> ;
nif:taMsClassRef <http://dbpedia.org/resource/Argentina> ;
itsrdf:taAnnotatorsRef <http://freme-project.eu/tools/freme-ner> ;
itsrdf:taClassRef <http://dbpedia.org/ontology/Place>,
<http://dbpedia.org/ontology/PopulatedPlace>,
<http://nerd.eurecom.fr/ontology#Location> ;
itsrdf:taConfidence 9.804964e-01 .
- You can then parse it back:
parsed_collection = NIFCollection.loads(generated_nif, format='turtle')
for context in parsed_collection.contexts:
for phrase in context.phrases:
print(phrase)
If you have any problems with or questions about this library, please contact us through a GitHub issue.