This repository contains the datasets and source code for "BIRD: A Trustworthy Bayesian Inference Framework for Large Language Models".
Set up the environment by first downloading this repository and then running:
pip install -r requirements.txt

The datasets evaluated in this paper are available in the data/ directory:
- Probabilistic estimation: common2sense_human_annotation.csv (for evaluation) and common2sense_human_annotation.json (provided in the same format as the decision-making datasets to make inference easier).
- Decision making: common2sense.json, plasma.json, and today.json. Each JSON dataset contains the fields scenario, statement, opposite_statement, and additional_sentence_label (which indicates the statement each additional condition supports).
  - In common2sense.json, the additional conditions are provided as added_information and oppo_added_information.
  - In plasma.json and today.json, the additional conditions are listed under additional_sentences.
Configuration scripts for running the pipeline are in the scripts/ directory:
- To run the entire BIRD pipeline:
  bash scripts/run_bird.sh
- To run the baselines:
  bash scripts/baseline.sh
- To run the evaluation:
  bash scripts/eval.sh

If you find the project helpful, please cite:
@inproceedings{
feng2025bird,
title={{BIRD}: A Trustworthy Bayesian Inference Framework for Large Language Models},
author={Yu Feng and Ben Zhou and Weidong Lin and Dan Roth},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=fAAaT826Vv}
}