RAG Annotation Tool

Very brief readme. Ping Eugene for any questions.

Get Started

pip install -r requirements.txt

Run the app on port 9988:

streamlit run entry.py --server.port 9988 -- --user_db_path=user_db.db --task_configs ./mini-test_config.json

For the Streamlit runtime configuration, please refer to https://docs.streamlit.io/develop/concepts/configuration/options.

Flags after -- are app-specific configurations. Currently, you can pass multiple files to --task_configs to load multiple config files.

The --user_db_path flag points to a SQLite database that contains user login information, with passwords stored salted.
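For readers unfamiliar with salted password storage, the sketch below shows one common way to do it with the Python standard library. It is an illustration only: the `users` table, its columns, the helper names, and the choice of PBKDF2 are all assumptions, not the tool's actual schema or hashing scheme.

```python
import hashlib
import hmac
import os
import sqlite3


def hash_password(password: str, salt: bytes) -> bytes:
    # PBKDF2-HMAC-SHA256 is an assumed choice; the tool may use another scheme.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def add_user(conn: sqlite3.Connection, username: str, password: str) -> None:
    salt = os.urandom(16)  # fresh random salt for each user
    conn.execute(
        "INSERT INTO users (username, salt, password_hash) VALUES (?, ?, ?)",
        (username, salt, hash_password(password, salt)),
    )


def check_password(conn: sqlite3.Connection, username: str, password: str) -> bool:
    row = conn.execute(
        "SELECT salt, password_hash FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return False
    salt, stored = row
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(stored, hash_password(password, salt))


# In-memory database so the sketch does not touch a real user_db.db.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (username TEXT PRIMARY KEY, salt BLOB, password_hash BLOB)"
)
add_user(conn, "root", "a-new-strong-password")
```

Because each user gets a separate random salt, two users with the same password end up with different stored hashes.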

The default password for root is yourdefaultpassword. Please change that before you make the tool publicly accessible.

Config File

Config files are JSON files that define the task. Each dictionary is loaded into the Python class task_resources.TaskConfig.

There are several particularly important fields:

name: the unique name of the task, serving as its identifier.
task_assignment: a dictionary that defines the assignment of topics to assessors.
topic_file: a JSONL file where the topics are stored, with the fields to use defined by topic_id_field and topic_fields.
cited_sentences_path: a JSON file containing all report sentences that cite some document in the collection. It is meant to be used for judging whether the cited reference supports the sentence and for constructing or revising the nuggets. The file should have four levels: topic_id, doc_id, run_id, and sent_id.
report_runs_path: a JSON file containing all the reports. The file should contain three levels: topic_id, run_id, and sent_id.
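The fields listed above can be checked before launching the app. The following is a minimal sketch, assuming the config is a flat JSON object whose keys match those field names. Every value in the example is a placeholder, and the shape shown for task_assignment (assessor name to list of topic IDs) and topic_fields (a list) is a guess; mini-test_config.json remains the authoritative example.

```python
import json

# Field names taken from the list above; treating them all as required is an assumption.
REQUIRED_FIELDS = [
    "name",
    "task_assignment",
    "topic_file",
    "cited_sentences_path",
    "report_runs_path",
]


def missing_fields(config: dict) -> list:
    """Return the required field names that are absent from the config."""
    return [field for field in REQUIRED_FIELDS if field not in config]


# Hypothetical config; all values below are placeholders.
example_config = json.loads("""
{
  "name": "example-task",
  "task_assignment": {"assessor_a": ["topic-1"]},
  "topic_file": "./topics.jsonl",
  "topic_id_field": "topic_id",
  "topic_fields": ["title"],
  "cited_sentences_path": "./citation-to-sentences.json",
  "report_runs_path": "./report-sentences.json"
}
""")

print(missing_fields(example_config))
```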

Other fields should be self-explanatory from their names. Please refer to mini-test_config.json as an example. mini-test.citation-to-sentences.json and mini-test.report-sentences.json are two example resource files referenced in mini-test_config.json. The two files are generated by the utility script prepare_utils.py.
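To make the nesting of the two resource files concrete, here is a small sketch that builds toy versions of both and checks their depth. The IDs are invented, and the leaf value (assumed here to be the sentence text) is not specified in this readme; the files produced by prepare_utils.py are the reference.

```python
def has_levels(obj, levels: int) -> bool:
    """True if every branch of obj passes through at least `levels` nested dicts."""
    if levels == 0:
        return True
    if not isinstance(obj, dict):
        return False
    return all(has_levels(child, levels - 1) for child in obj.values())


# cited_sentences_path layout: topic_id -> doc_id -> run_id -> sent_id -> leaf
cited_sentences = {
    "topic-1": {
        "doc-9": {
            "run-a": {"0": "A report sentence that cites doc-9."},
        },
    },
}

# report_runs_path layout: topic_id -> run_id -> sent_id -> leaf
report_runs = {
    "topic-1": {
        "run-a": {
            "0": "A report sentence that cites doc-9.",
            "1": "A second sentence of the same report.",
        },
    },
}

print(has_levels(cited_sentences, 4), has_levels(report_runs, 3))
```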

Preload Nuggets

To preload nuggets before the first-stage citation support assessment, put a JSON file named nuggets_{topic_id}.preload.json in the output directory defined in the config file.

The JSON file has the same format as the output nugget file, which contains two top-level fields: nugget_dict and group_assignment.

nugget_dict maps each nugget question to a dictionary that maps each nugget answer to a list of IDs of the documents supporting that question-answer pair. A preloaded nugget can have an empty document ID list, which is meant to be filled in during the nugget support stage.

group_assignment maps each nugget question to its assigned group. This dictionary can also be empty in the preload file.
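The preload format described above can be produced with a few lines of Python. This is a sketch under stated assumptions: the helper name `write_preload_nuggets`, the topic ID, and the question and answer text are hypothetical, and a temporary directory stands in for the output directory from the config file.

```python
import json
import tempfile
from pathlib import Path


def write_preload_nuggets(output_dir: Path, topic_id: str, questions: dict) -> Path:
    """Write nuggets_{topic_id}.preload.json with empty document ID lists."""
    # question -> answer -> list of supporting document IDs (left empty here)
    nugget_dict = {
        question: {answer: [] for answer in answers}
        for question, answers in questions.items()
    }
    # group_assignment is allowed to be empty in a preload file.
    payload = {"nugget_dict": nugget_dict, "group_assignment": {}}
    path = output_dir / f"nuggets_{topic_id}.preload.json"
    path.write_text(json.dumps(payload, indent=2, ensure_ascii=False), encoding="utf-8")
    return path


output_dir = Path(tempfile.mkdtemp())  # stand-in for the configured output directory
preload_path = write_preload_nuggets(
    output_dir,
    "topic-1",
    {"What year did the event happen?": ["2020"]},
)
```

The document ID lists are left empty on purpose, since they are meant to be assigned later in the nugget support stage.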
