Very brief readme. Ping Eugene for any questions.
pip install -r requirements.txt
To run the app on port 9988:
streamlit run entry.py --server.port 9988 -- --user_db_path=user_db.db --task_configs ./mini-test_config.json
For the Streamlit runtime configuration, please refer to https://docs.streamlit.io/develop/concepts/configuration/options.
Flags after -- are app-specific configurations. Right now, you can pass multiple files to --task_configs to load multiple config files.
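For illustration only, here is a minimal sketch of how flags like these could be parsed with argparse. This is not taken from entry.py; the parser layout, the nargs="+" choice, and the second config path ./another_config.json are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the real entry.py may define its flags differently.
    parser = argparse.ArgumentParser(description="App-specific flags (sketch)")
    parser.add_argument("--user_db_path", type=str, required=True,
                        help="Path to the sqlite user database")
    # nargs="+" accepts one or more config files after --task_configs.
    parser.add_argument("--task_configs", type=str, nargs="+", required=True,
                        help="One or more task config json files")
    return parser


# Same flag style as the command above, with a second (hypothetical) config file.
args = build_parser().parse_args([
    "--user_db_path=user_db.db",
    "--task_configs", "./mini-test_config.json", "./another_config.json",
])
print(args.user_db_path, args.task_configs)
```

With this layout, every path listed after --task_configs ends up in one list, so the app can load the config files one by one.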
The --user_db_path flag points to a SQLite database that holds user login information; passwords are stored with salts.
The default password for root is yourdefaultpassword. Please change that before you make the tool publicly accessible.
Config files are json files that define the task.
The dictionary is loaded into the Python class task_resources.TaskConfig.
There are several particularly important fields:
- name: the unique name of the task, serving as its unique identifier.
- task_assignment: a dictionary that defines the assignment of topics to assessors.
- topic_file: a JSONL file where the topics are stored, with the fields to use defined by topic_id_field and topic_fields.
- cited_sentences_path: a JSON file containing all report sentences that cite some document in the collection. It is meant to be used for judging how well each reference is supported and for constructing/revising the nuggets. The file should have four levels: topic_id, doc_id, run_id, and sent_id.
- report_runs_path: a JSON file containing all the reports. The file should contain three levels: topic_id, run_id, and sent_id.
Other fields should be self-explanatory from their names.
Please refer to mini-test_config.json for an example.
mini-test.citation-to-sentences.json and mini-test.report-sentences.json are two example resource files referenced in the mini-test_config.json config file.
The two files are generated by the utility script prepare_utils.py.
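As a rough sketch of the nesting described for cited_sentences_path (topic_id, doc_id, run_id, sent_id) and report_runs_path (topic_id, run_id, sent_id), the snippet below walks both structures. All IDs are made up, and the leaf values are assumed to be sentence text, which the README does not state; check the mini-test files for the actual content.

```python
# Hypothetical data following the level order given in the field descriptions.
cited_sentences = {
    "topic-1": {                      # topic_id
        "doc-A": {                    # doc_id
            "run-X": {                # run_id
                "0": "First sentence citing doc-A.",   # sent_id -> assumed text
                "3": "Another sentence citing doc-A.",
            }
        }
    }
}
report_runs = {
    "topic-1": {                      # topic_id
        "run-X": {                    # run_id
            "0": "First sentence citing doc-A.",       # sent_id -> assumed text
            "1": "A sentence without a citation.",
            "3": "Another sentence citing doc-A.",
        }
    }
}


def iter_cited(cited):
    """Yield (topic_id, doc_id, run_id, sent_id) for every cited sentence."""
    for topic_id, docs in cited.items():
        for doc_id, runs in docs.items():
            for run_id, sents in runs.items():
                for sent_id in sents:
                    yield topic_id, doc_id, run_id, sent_id


def missing_sentences(cited, reports):
    """Return (topic_id, run_id, sent_id) keys that are cited but absent from the reports."""
    return [
        (t, r, s)
        for t, _doc, r, s in iter_cited(cited)
        if s not in reports.get(t, {}).get(r, {})
    ]


print(list(iter_cited(cited_sentences)))
print(missing_sentences(cited_sentences, report_runs))
```

A consistency check like missing_sentences can be handy before loading the two resource files into the app, since every cited sentence is expected to come from some report.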
To preload nuggets before the first-stage citation support assessment, put a JSON file named nuggets_{topic_id}.preload.json in the output directory defined in the config file.
The JSON file has the same format as the output nugget file, which contains two top-level fields: nugget_dict and group_assignment.
nugget_dict maps each nugget question to a dictionary that maps each nugget answer to a list of IDs of the documents supporting the question-answer pair. Preloaded nuggets can have empty document ID lists, which are meant to be filled in during the nugget support stage.
group_assignment maps each nugget question to its assigned group. This dictionary can also be empty in the preload file.
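A minimal sketch of writing such a preload file is shown below. The helper write_preload_nuggets, the topic ID, and the question/answer strings are all hypothetical; a temporary directory stands in for the output directory from your config.

```python
import json
import tempfile
from pathlib import Path


def write_preload_nuggets(output_dir, topic_id, nugget_dict, group_assignment=None):
    """Write nuggets_{topic_id}.preload.json with the two top-level fields."""
    path = Path(output_dir) / f"nuggets_{topic_id}.preload.json"
    payload = {
        "nugget_dict": nugget_dict,
        # group_assignment may be left empty in the preload file.
        "group_assignment": group_assignment or {},
    }
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return path


# question -> answer -> list of supporting document IDs (empty until the support stage)
nugget_dict = {
    "Example nugget question?": {"example answer": []},
}

with tempfile.TemporaryDirectory() as out_dir:
    preload_path = write_preload_nuggets(out_dir, "topic-1", nugget_dict)
    preload_name = preload_path.name
    loaded = json.loads(preload_path.read_text(encoding="utf-8"))

print(preload_name)
print(loaded)
```

In real use, pass the output directory from the task config instead of the temporary directory, and one file per topic ID.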