Inline-citation questions #1

wyzhhhh · 2024-09-02T14:59:37Z

How do the authors extract the inline-citation information from the S2ORC dataset?

anirudhajith · 2024-09-05T08:43:12Z

The full S2ORC dataset releases include an "annotations" field alongside the paper data. This field lists the character offsets (start and end indices) of the various parts of the paper's plaintext, e.g., the title, abstract, author names, and individual paragraphs.
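To illustrate how such offsets can be used, here is a minimal sketch that slices one annotated part out of a paper's plaintext. It assumes each record has a "text" string and an "annotations" dict whose values are lists of `{"start": ..., "end": ...}` spans, possibly stored as JSON-encoded strings. The record layout and the helper names `load_spans` and `extract_parts` are hypothetical, so check them against the release you actually download.

```python
import json
from typing import Any


def load_spans(annotations: dict[str, Any], key: str) -> list[dict[str, Any]]:
    """Return the span list for one annotation type (e.g. "title", "paragraph").

    Assumption: values may be stored either as Python lists or as
    JSON-encoded strings; a missing or null entry yields an empty list.
    """
    raw = annotations.get(key)
    if raw is None:
        return []
    if isinstance(raw, str):
        return json.loads(raw)
    return list(raw)


def extract_parts(record: dict[str, Any], key: str) -> list[str]:
    """Slice the paper's plaintext using the character offsets of one annotation type."""
    text = record["text"]
    spans = load_spans(record.get("annotations") or {}, key)
    return [text[int(s["start"]):int(s["end"])] for s in spans]


# Hypothetical toy record, shaped the way this sketch assumes.
record = {
    "text": "Deep Nets\nWe build on prior work [1].",
    "annotations": {
        "title": '[{"start": 0, "end": 9}]',          # JSON-encoded string
        "paragraph": [{"start": 10, "end": 37}],      # already-parsed list
    },
}
print(extract_parts(record, "title"))      # ['Deep Nets']
print(extract_parts(record, "paragraph"))  # ['We build on prior work [1].']
```

Handling both strings and lists keeps the sketch usable whether or not the annotations were already decoded upstream.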

[Image: illustration of the S2ORC schema]

We used the indices listed under the "bibref" annotations to isolate the positions of inline citations. These annotations also usually included a "matched_paper_id" field that we could use to match an inline citation from a source paper to a cited target paper within the S2ORC dataset.
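The two steps described above (locating inline citations via "bibref" offsets, then linking them to cited papers through "matched_paper_id") could be sketched roughly as follows. This is not the authors' code. It assumes each "bibref" span is shaped like `{"start": ..., "end": ..., "attributes": {"matched_paper_id": ...}}`, that the id may be missing for unmatched citations, and that the caller supplies the source paper's id and the set of paper ids present in the corpus. The names `InlineCitation`, `inline_citations`, and `citation_edges` are hypothetical.

```python
import json
from typing import Any, NamedTuple, Optional


class InlineCitation(NamedTuple):
    start: int                 # character offset where the citation marker begins
    end: int                   # character offset just past the marker
    marker: str                # marker text, e.g. "[3]" or "Smith et al., 2020"
    target_id: Optional[str]   # matched_paper_id as a string, or None if unmatched


def inline_citations(record: dict[str, Any]) -> list[InlineCitation]:
    """Collect inline citations from the (assumed) "bibref" annotation spans."""
    text = record["text"]
    raw = (record.get("annotations") or {}).get("bibref")
    spans = json.loads(raw) if isinstance(raw, str) else (raw or [])
    out = []
    for span in spans:
        start, end = int(span["start"]), int(span["end"])
        attrs = span.get("attributes") or {}
        target = attrs.get("matched_paper_id")
        out.append(InlineCitation(start, end, text[start:end],
                                  None if target is None else str(target)))
    return out


def citation_edges(source_id: str, record: dict[str, Any],
                   corpus_ids: set[str]) -> list[tuple[str, str, str]]:
    """Keep (source, target, marker) only when the cited paper is in the corpus."""
    return [(source_id, c.target_id, c.marker)
            for c in inline_citations(record)
            if c.target_id is not None and c.target_id in corpus_ids]


# Hypothetical toy record: one matched citation and one unmatched citation.
record = {
    "text": "Transformers [1] outperform RNNs [2].",
    "annotations": {
        "bibref": [
            {"start": 13, "end": 16, "attributes": {"ref_id": "b0", "matched_paper_id": 111}},
            {"start": 33, "end": 36, "attributes": {"ref_id": "b1"}},
        ]
    },
}
print(citation_edges("999", record, {"111", "222"}))  # [('999', '111', '[1]')]
```

Unmatched citations are kept by `inline_citations` but dropped by `citation_edges`, since there is no target paper within the dataset to link them to.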

I hope this answers your question. Let us know if you have any more!
