    Repositories list

    • LimitGen

      Public
      Data and Code for ACL 2025 Paper "Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers"
      Jupyter Notebook
      0700 Updated Jul 24, 2025
    • AbGen

      Public
      Data and code for the ACL 2025 paper "AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research"
      Python
      2400 Updated Jul 24, 2025
    • Python
      0100 Updated Jul 23, 2025
    • SciArena

      Public
      Analysis code for paper "SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks"
      Python
      84520 Updated Jul 1, 2025
    • MCTS-RAG

      Public
      Python
      85860 Updated Jun 28, 2025
    • SciSketch

      Public
      Python
      0000 Updated Jun 20, 2025
    • MetaFaith

      Public
      MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs
      Python
      1300 Updated Jun 3, 2025
    • SCSS
      0001 Updated May 29, 2025
    • cpsc477

      Public
      Course website for CPSC 477/577 Natural Language Processing Spring 2025 at Yale University
      SCSS
      0100 Updated Apr 10, 2025
    • Physics

      Public
      Python
      01210 Updated Apr 1, 2025
    • MMVU

      Public
      Data and Code for CVPR 2025 paper "MMVU: Measuring Expert-Level Multi-Discipline Video Understanding"
      Python
      16800 Updated Feb 28, 2025
    • SciDQA

      Public
      Python
      0300 Updated Feb 26, 2025
    • M3SciQA

      Public
      Python
      11110 Updated Jan 13, 2025
    • Data and Code for ACL 2024 paper "DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents"
      Python
      02320 Updated Dec 21, 2024
    • Jupyter Notebook
      0300 Updated Nov 18, 2024
    • TAIL

      Public
      A Toolkit for Automatic and Realistic Long-Context Large Language Model Evaluation
      Python
      0600 Updated Nov 14, 2024
    • TOMATO

      Public
      Python
      02830 Updated Nov 8, 2024
    • MDCure

      Public
      MDCure: A Scalable Pipeline for Multi-Document Instruction-Following
      Python
      2901 Updated Nov 2, 2024
    • COMAL

      Public
      Python
      0100 Updated Oct 31, 2024
    • ReIFE

      Public
      Python
      0200 Updated Oct 10, 2024
    • ODSum

      Public
      Data and code for paper "ODSum: New Benchmarks for Open Domain Multi-Document Summarization"
      Python
      21010 Updated Sep 20, 2024
    • MRoSE

      Public
      Python
      0000 Updated Sep 19, 2024
    • Data and Code for the paper "FinanceMath: Knowledge-Intensive Math Reasoning in Finance Domains"
      Python
      42010 Updated Aug 10, 2024
    • refdpo

      Public
      Python
      11610 Updated Jul 23, 2024
    • Repository for the ACL 2024 Findings paper "Unveiling the Spectrum of Data Contamination in Language Models: A Survey from Detection to Remediation"
      0900 Updated Jun 27, 2024
    • Python
      21200 Updated May 16, 2024
    • QTSumm

      Public
      Data and Code for EMNLP 2023 paper "QTSumm: Query-Focused Summarization over Tabular Data"
      Python
      42100 Updated Mar 29, 2024
    • Consolidated experiments for simplification projects
      Jupyter Notebook
      0300 Updated Mar 6, 2024
    • InstruSum

      Public
      Jupyter Notebook
      02210 Updated Feb 26, 2024
    • cpsc488

      Public
      SCSS
      1000 Updated Feb 11, 2024