ACL 2025 Paper (Do not Abstain! Identify and Solve the Uncertainty) Repository

alimama-creative/ConfuseBench

Data

The benchmark dataset is in data/. Each record has the following fields:

  • question: the question posed to the model
  • gold doc: the gold documents that directly help answer the question
  • doc: the documents actually provided to the model
  • answer: the ground-truth answer
  • original question: the original query if the question is ambiguous, otherwise null
  • type: the source of uncertainty: "doc" for missing documents, "ambig" for an ambiguous query, "ability" for insufficient model capability
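As a minimal sketch of how these fields might be consumed, the snippet below checks that each record carries the six fields listed above and counts records per uncertainty type. The file names and on-disk format under data/ are not described here, so loading is left out. `summarize` is a hypothetical helper and the sample records are invented for illustration.

```python
from collections import Counter

# Field names and type labels as listed in the field description above.
FIELDS = {"question", "gold doc", "doc", "answer", "original question", "type"}
TYPES = {"doc", "ambig", "ability"}


def summarize(records):
    """Validate records against the documented fields and count them by type."""
    counts = Counter()
    for i, rec in enumerate(records):
        missing = FIELDS - rec.keys()
        if missing:
            raise ValueError(f"record {i} is missing fields: {sorted(missing)}")
        if rec["type"] not in TYPES:
            raise ValueError(f"record {i} has unknown type: {rec['type']!r}")
        counts[rec["type"]] += 1
    return dict(counts)


# Invented sample records, for illustration only (not taken from the benchmark).
sample = [
    {
        "question": "When was it founded?",
        "gold doc": ["Example Corp was founded in 1999."],
        "doc": ["Example Corp was founded in 1999."],
        "answer": "1999",
        "original question": "When was Example Corp founded?",
        "type": "ambig",
    },
    {
        "question": "Who founded Example Corp?",
        "gold doc": ["Example Corp was founded by A. Person."],
        "doc": [],
        "answer": "A. Person",
        "original question": None,
        "type": "doc",
    },
]

print(summarize(sample))
```

Failing fast on a missing field or an unknown type keeps a malformed record from being silently counted under the wrong category.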

Run

Get the corpus for the datasets.

Prepare the retriever server by downloading and starting Elasticsearch:

$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.2-linux-x86_64.tar.gz
$ wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.2-linux-x86_64.tar.gz.sha512
$ shasum -a 512 -c elasticsearch-7.10.2-linux-x86_64.tar.gz.sha512
$ tar -xzf elasticsearch-7.10.2-linux-x86_64.tar.gz
$ cd elasticsearch-7.10.2/
$ ./bin/elasticsearch # start the server
# pkill -f elasticsearch # to stop the server

With Elasticsearch running, start the retriever server with:

uvicorn serve:app --port 8000 --app-dir retriever_server
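Before moving on, it can help to confirm that both processes are accepting connections. The following is a minimal sketch using only the Python standard library. `is_listening` is a hypothetical helper, not part of this repository. It assumes both servers run on localhost, with Elasticsearch on its default HTTP port 9200 and the retriever server on port 8000 as in the command above.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumed ports: 9200 is the Elasticsearch default; 8000 matches --port above.
    for name, port in [("elasticsearch", 9200), ("retriever server", 8000)]:
        state = "up" if is_listening("localhost", port) else "down"
        print(f"{name} on port {port}: {state}")
```

This only checks that a port is open. It does not verify that the index has been built or that queries return results.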

Put the ToolBench corpus, index, and retriever model in toolbench_retriever.

Start the ToolBench retriever server with python toolbench_retriever/toolbench_retriever_server.py --port {}

Set the ToolBench retriever port in utils/es_retrieve.

Set the LLM calling API in utils/llm_proxy.

Evaluation

Run the following commands to judge the source of uncertainty from the prompt, from the inquiry, and from the answer to the inquiry, respectively:

python adaptive_answer/prompt_judge.py --model {}
python adaptive_answer/inquiry_judge.py --model {}
python adaptive_answer/answer_judge.py --model {}
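To run all three judges for one model in sequence, a small wrapper along these lines could be used. This is a sketch, not part of the repository: `build_commands` and `run_all` are hypothetical helpers, "my-model" is a placeholder model name, and the script is assumed to be run from the repository root so the relative script paths resolve.

```python
import subprocess
import sys

# Script paths as given in the commands above.
JUDGE_SCRIPTS = [
    "adaptive_answer/prompt_judge.py",
    "adaptive_answer/inquiry_judge.py",
    "adaptive_answer/answer_judge.py",
]


def build_commands(model):
    """Return one argv list per judge script for the given model name."""
    return [[sys.executable, script, "--model", model] for script in JUDGE_SCRIPTS]


def run_all(model):
    """Run the judges one after another, stopping at the first failure."""
    for cmd in build_commands(model):
        subprocess.run(cmd, check=True)


# Preview the commands for a placeholder model name without executing them.
for cmd in build_commands("my-model"):
    print(" ".join(cmd[1:]))
```

Calling `run_all("my-model")` would then execute the three scripts in order. `check=True` makes a failing judge raise `subprocess.CalledProcessError` instead of continuing silently.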
