The full S2ORC dataset releases include an "annotations" field along with the paper data. This field records the indices that mark where various parts of the paper (e.g. title, abstract, author names, individual paragraphs) appear in the paper's plaintext.
Here's an illustration of the S2ORC schema:
We used the indices listed under the "bibref" annotations to isolate the positions of inline citations. These annotations also usually included a "matched_paper_id" field that we could use to match an inline citation from a source paper to a cited target paper within the S2ORC dataset.
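As a rough sketch of that step (not our exact code), the function below slices the citation spans out of a paper's plaintext. It assumes each "bibref" annotation is a dict with "start" and "end" offsets and an optional "attributes" dict holding "matched_paper_id", and that the annotation list may arrive as a JSON-encoded string; check those assumptions against the release you are using. The function name `inline_citations` and the toy record are made up for illustration.

```python
import json


def inline_citations(text, bibref):
    """Return (start, end, citation_text, matched_paper_id) for each bibref span."""
    # Assumption: the annotations may be a JSON-encoded string or an already parsed list.
    if isinstance(bibref, str):
        bibref = json.loads(bibref)
    spans = []
    for ann in bibref or []:
        start, end = int(ann["start"]), int(ann["end"])
        # Assumption: matched_paper_id lives under "attributes" and may be missing.
        matched = (ann.get("attributes") or {}).get("matched_paper_id")
        spans.append((start, end, text[start:end], matched))
    return spans


# Toy record invented for illustration; this is not real S2ORC data.
text = "Transformers [1] improved on recurrent models [2]."
bibref = json.dumps([
    {"start": 13, "end": 16, "attributes": {"matched_paper_id": 111}},
    {"start": 46, "end": 49},  # no matched target paper
])
citations = inline_citations(text, bibref)
```

Spans whose matched id comes back as `None` have no target paper to link to in the dataset, so they can be skipped when building source-to-target citation pairs.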
I hope this answers your question. Let us know if you have any more!
How did the authors get the inline citation information from the S2ORC dataset?