Commit

Add index and emptyIndex as optional fields to the Token protobuf. This will be especially useful when passing around dependency graphs with empty indices
AngledLuffa committed Oct 13, 2023
1 parent 56c78af commit 20c36f2
Showing 3 changed files with 549 additions and 238 deletions.
13 changes: 12 additions & 1 deletion src/edu/stanford/nlp/pipeline/CoreNLP.proto
@@ -241,9 +241,20 @@ message Token {
// string text @see Document#text + character offsets
// uint32 sentenceIndex @see Sentence#sentenceIndex
// string docID @see Document#docID
// uint32 index @see implicit in Sentence
// uint32 paragraph @see Sentence#paragraph

// Most serialized annotations will not have this
// Some code paths may not correctly process this if serialized,
// since many places will read the index off the position in a sentence
// In particular, deserializing a Document using ProtobufAnnotationSerializer
// will clobber any index value
// But Semgrex and Ssurgeon in particular need a way
// to pass around nodes where the node's index is not strictly 1, 2, 3, ...
// thanks to the empty nodes in UD treebanks such as
// English EWT or Estonian EWT (not related to each other)
optional uint32 index = 79;
optional uint32 emptyIndex = 80;

extensions 100 to 255;
}
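
To illustrate why two separate fields help, here is a minimal sketch of one way a node ID could be split into the two values. It assumes the CoNLL-U convention used by UD treebanks, where an empty node inserted after word 2 is written as "2.1". The commit does not say how emptyIndex is populated, so the mapping below is an assumption, and TokenId, parse_conllu_id, and format_conllu_id are hypothetical names that are not part of CoreNLP.

```python
from typing import NamedTuple, Optional


class TokenId(NamedTuple):
    """Hypothetical stand-in for the (index, emptyIndex) pair on a Token."""
    index: int
    empty_index: Optional[int] = None  # None means "not an empty node"


def parse_conllu_id(raw: str) -> TokenId:
    """Split a CoNLL-U style ID such as "2" or "2.1" into its two parts.

    Assumption: the part before the dot maps to index and the part
    after the dot maps to emptyIndex.
    """
    if "-" in raw:
        # Multiword token ranges like "3-4" have no single index.
        raise ValueError(f"multiword token range not supported: {raw}")
    head, sep, tail = raw.partition(".")
    return TokenId(int(head), int(tail) if sep else None)


def format_conllu_id(token_id: TokenId) -> str:
    """Inverse of parse_conllu_id."""
    if token_id.empty_index is None:
        return str(token_id.index)
    return f"{token_id.index}.{token_id.empty_index}"


# Node IDs for a sentence with one empty node after the second word.
ids = [parse_conllu_id(s) for s in ["1", "2", "2.1", "3"]]
```

With only the position in the sentence to go on, the four nodes above would be numbered 1 through 4, and the fact that the third one is an empty node attached to word 2 would be lost.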

