Dataset for WWW '24 paper "A Tale of Two Communities: Exploring Academic References on Stack Overflow"

aceatusc/sciso-www

A Tale of Two Communities: Exploring Academic References on Stack Overflow

Authors: Run Huang (USC) and Souti Chattopadhyay (USC)

ACM Web Conference 2024, Short Paper

1. Included Academic Repositories

Each academic repository may have multiple web domains, e.g., ACL articles may be hosted on aclanthology.org or aclweb.org. For a full list of curated web domains, see pub_regs.json.

Publishers: ACM, BMC, Cambridge University Press, De Gruyter, Elsevier, Emerald, Frontiers, Hindawi, ICST, IEEE, IET, IGI Global, Inderscience, INFORMS, Ingenta, IOS Press, Liebert Open Access, MDPI, MIT Press, NOW Publishers, Old City Publishing, Oxford University Press, revues online, RonPub, SAGE Publications, SIAM, Springer, Taylor & Francis, Versita, Wiley, World Scientific

Academic Societies:

  • Association for Computational Linguistics (including: ACL, NAACL, EMNLP, EACL, etc.)
  • Association for the Advancement of Artificial Intelligence (including: AAAI, IAI, ICWSM, etc.)
  • International Machine Learning Society (including: JMLR, ICML, MLR, TMLR, etc.)
  • International Association for Cryptologic Research (including: Crypto, AsiaCrypt, Fast Software Encryption)
  • Computer Vision Foundation (including: CVPR, ICCV, ECCV, WACV)
  • USENIX (including: OSDI, NSDI, USENIX Security, ATC)
  • American Mathematical Society
  • ACM Special Interest Groups (SIGs) (note: some SIGs may host programs on their individual domains, e.g., SIGCHI.org)
  • IEEE Computer Society
  • Individual Conferences (including: ICLR, IJCAI, NeurIPS, NDSS, VLDB, WWW, EMSOFT)

Academic Databases: arXiv, OpenReview, paperswithcode, Semantic Scholar, ResearchGate, Nature, PloS, PNAS, Cell Press, NIH (PubMed, PubChem), HAL, NBN Resolver, CEUR Workshop Proceedings
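
Because one repository can span several web domains, attributing a URL to a repository comes down to comparing its hostname against each curated domain. The sketch below illustrates the idea in Python. The `EXAMPLE_DOMAINS` mapping and the `match_repository` helper are hypothetical names introduced for illustration; the actual layout of pub_regs.json is not described here and may differ (it may, for instance, store regular expressions rather than plain domains).

```python
from urllib.parse import urlparse

# Hypothetical mapping for illustration only; the real pub_regs.json
# may be structured differently.
EXAMPLE_DOMAINS = {
    "ACL": ["aclanthology.org", "aclweb.org"],
    "arXiv": ["arxiv.org"],
}


def match_repository(url, domains=EXAMPLE_DOMAINS):
    """Return the repository whose domain matches the URL's host, or None."""
    host = (urlparse(url).hostname or "").lower()
    for repo, repo_domains in domains.items():
        for domain in repo_domains:
            # Accept the domain itself and any of its subdomains
            # (e.g. www.aclweb.org).
            if host == domain or host.endswith("." + domain):
                return repo
    return None
```

Matching on the parsed hostname, rather than searching the whole URL string, avoids false positives when a domain name happens to appear in a path or query string.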

2. Dataset of Academic References

This dataset contains 15,009 academic references cited in Stack Overflow posts (including questions, answers, community wikis, etc.) as of December 8, 2023. It is intended for researchers and practitioners studying how academic knowledge enters discussions of practical problems on one of the largest online technical forums.

Access the dataset here

2.1 Dataset Format

The data is stored in line-delimited JSON (JSONL) format. Each line contains the metadata of one academic reference (see meta_example.json for an example). The fields are described below.
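
Since every line is an independent JSON object, the file can be read one record at a time without loading it all into memory. A minimal Python sketch follows. The helper name `parse_jsonl` and the file name in the usage comment are assumptions made for illustration, not part of the dataset.

```python
import json


def parse_jsonl(lines):
    """Yield one parsed record per non-empty line of JSONL input."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines, e.g. a trailing newline at end of file
            yield json.loads(line)


# Usage with a file on disk (the path is hypothetical):
# with open("references.jsonl", encoding="utf-8") as f:
#     for record in parse_jsonl(f):
#         print(record["PostId"], record["Url"])
```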

2.2 Field Description

  • PostId
    ID of the post containing this academic reference. Corresponds to the Id field in the StackOverflow-posts table of the official Stack Exchange data dump.
    e.g., 74109833 (click to see the original post on Stack Overflow)

  • Url
    URL of the academic reference.
    e.g., https://aclanthology.org/C18-1054.pdf

  • metadata

    • title
    • authors
    • venue
      Normalized to the full official name via Semantic Scholar
      (e.g., ACL -> Annual Meeting of the Association for Computational Linguistics)
    • open_access
      Whether the referenced article is publicly accessible
    • citation_count
    • abstract
    • type
    • external_ids
    • year
    • concepts
  • Topic
    Topic label assigned by BERTopic

  • RevisionId
    We collected URLs from every historical version of a post. RevisionId is the ID of the revision in which we found this academic reference. Corresponds to the Id field in the StackOverflow-PostHistory table of the official Stack Exchange data dump.
    e.g., 280345137

  • History
    Type of edit (e.g., initial, edit title, edit body). Corresponds to the PostHistoryTypeId field in the StackOverflow-PostHistory table of the official Stack Exchange data dump. See here for details.

  • AnswerCount

  • CommentCount

  • FavoriteCount

  • PostTypeId

  • Score

  • ViewCount
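
As an illustration of how these fields might be used once the records are loaded, the sketch below counts references per venue and per post. It assumes that title, venue, year, and the other lowercase fields are nested under the metadata key, as the list above suggests; meta_example.json is the place to confirm the exact layout. The sample records are made-up placeholders, not entries from the dataset, and the function names are hypothetical.

```python
from collections import Counter

# Made-up placeholder records for illustration; not taken from the dataset.
sample_records = [
    {"PostId": 1, "Url": "https://example.org/a.pdf", "Score": 3,
     "metadata": {"title": "Paper A", "venue": "Venue X", "year": 2018}},
    {"PostId": 2, "Url": "https://example.org/b.pdf", "Score": 0,
     "metadata": {"title": "Paper B", "venue": "Venue X", "year": 2020}},
    {"PostId": 2, "Url": "https://example.org/c.pdf", "Score": 0,
     "metadata": {"title": "Paper C", "venue": None, "year": 2021}},
]


def count_by_venue(records):
    """Count references per venue; missing venues are grouped as 'unknown'."""
    counts = Counter()
    for record in records:
        meta = record.get("metadata") or {}
        counts[meta.get("venue") or "unknown"] += 1
    return counts


def references_per_post(records):
    """Count how many academic references each post contains."""
    return Counter(record["PostId"] for record in records)
```

Treating a missing or empty venue as "unknown" is a defensive choice for this sketch; whether the dataset actually contains such gaps is not stated here.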

3. Interactive Figures

https://sciso.vercel.app/
