Convenience
The repository includes a number of convenience scripts to illustrate and automate common usage.
The basic evaluation scripts automate the following workflow:
- convert the gold data to the evaluation tool format,
- convert each system run output to the evaluation tool format,
- evaluate each system run.
The following are written to the output directory:
- detailed evaluation report for each run (*.evaluation),
- summary evaluation report for comparing runs (00report.tab).
Usage for TAC14 output format:
./scripts/run_tac14_evaluation.sh \
/path/to/gold.xml \ # TAC14 gold standard queries/mentions
/path/to/gold.tab \ # TAC14 gold standard link and nil annotations
/system/output/directory \ # directory containing (only) TAC14 system output files
/script/output/directory \ # directory to which results are written
number_of_jobs # number of jobs for parallel mode
Usage for TAC13 output format:
./scripts/run_tac13_evaluation.sh \
/path/to/gold.xml \ # TAC13 gold standard queries/mentions
/path/to/gold.tab \ # TAC13 gold standard link and nil annotations
/system/output/directory \ # directory containing (only) TAC13 system output files
/script/output/directory \ # directory to which results are written
number_of_jobs # number of jobs for parallel mode
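For example, a complete TAC14 run over hypothetical paths (the gold files, run directory, results directory and job count below are placeholders) might look like:
./scripts/run_tac14_evaluation.sh \
  ~/data/tac14/gold.xml \
  ~/data/tac14/gold.tab \
  ~/runs/tac14 \
  ~/results/tac14 \
  8
cat ~/results/tac14/00report.tab # summary scores for all runs in the directory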
The analysis scripts automate the following workflow:
- run the basic evaluation,
- calculate confidence intervals for each system run,
- count errors for each system run (nil-as-link, link-as-nil, wrong-link counts).
The following are written to the output directory:
- detailed evaluation report for each run (*.evaluation),
- summary evaluation report for comparing runs (00report.tab),
- detailed confidence interval report for each run (*.confidence),
- summary confidence interval report for comparing runs (00report.*),
- error type distribution for each run (*.analysis).
Usage for TAC14 output format:
./scripts/run_tac14_all.sh \
/path/to/gold.xml \ # TAC14 gold standard queries/mentions
/path/to/gold.tab \ # TAC14 gold standard link and nil annotations
/system/output/directory \ # directory containing (only) TAC14 system output files
/script/output/directory # directory to which results are written
Usage for TAC13 output format:
./scripts/run_tac13_all.sh \
/path/to/gold.xml \ # TAC13 gold standard queries/mentions
/path/to/gold.tab \ # TAC13 gold standard link and nil annotations
/system/output/directory \ # directory containing (only) TAC13 system output files
/script/output/directory # directory to which results are written
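Once an analysis run completes, the reports can be skimmed from the shell. A minimal sketch, assuming the summary report is tab-separated (as its .tab extension suggests) and a hypothetical run file named systemA in a hypothetical results directory:
column -t -s$'\t' ~/results/tac14/00report.tab # align the summary table for reading
less ~/results/tac14/systemA.evaluation # detailed scores for the hypothetical run systemA
less ~/results/tac14/systemA.confidence # confidence intervals for the same run
less ~/results/tac14/systemA.analysis # nil-as-link, link-as-nil and wrong-link counts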
The filtered evaluation scripts automate the following workflow:
- filter gold data to include a specific subset of instances,
- filter each system run to include a specific subset of instances,
- run the basic evaluation over subset data.
The following are written to an output directory for each subset:
- detailed evaluation report for each run (*.evaluation),
- summary evaluation report for comparing runs (00report.tab).
The following subsets/directories are defined:
- PER - mentions with person entity type,
- ORG - mentions with organisation entity type,
- GPE - mentions with geo-political entity type,
- NW - mentions from newswire documents,
- WB - mentions from newsgroup and blog documents,
- DF - mentions from discussion forum documents,
- entity-document type combinations (PER_NW, PER_WB, PER_DF, ORG_NW, etc.).
Usage for TAC14 output format:
./scripts/run_tac14_filtered.sh \
/path/to/gold.xml \ # TAC14 gold standard queries/mentions
/path/to/gold.tab \ # TAC14 gold standard link and nil annotations
/system/output/directory \ # directory containing (only) TAC14 system output files
/script/output/directory # directory to which results are written
Usage for TAC13 output format:
./scripts/run_tac13_filtered.sh \
/path/to/gold.xml \ # TAC13 gold standard queries/mentions
/path/to/gold.tab \ # TAC13 gold standard link and nil annotations
/system/output/directory \ # directory containing (only) TAC13 system output files
/script/output/directory # directory to which results are written
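The per-subset reports can then be collected in one pass. A minimal sketch, assuming each subset's reports land in a subdirectory named after the subset under a hypothetical results directory ~/results/tac14-filtered:
for subset in PER ORG GPE NW WB DF; do
  echo "== $subset ==" # label each subset's summary
  cat ~/results/tac14-filtered/$subset/00report.tab
done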
The test evaluation script automates the following workflow:
- run the basic evaluation,
- compare evaluation output to official TAC13 results.
The following are written to the output directory:
- detailed evaluation report for each run (*.evaluation),
- summary evaluation report for comparing runs (00report.tab),
- copy of the official results sorted for comparison (00official.tab),
- a diff report if the test fails (00diff.txt).
Usage for TAC13 official results:
./scripts/test_tac13_evaluation.sh \
/path/to/gold.xml \ # TAC13 gold standard queries/mentions
/path/to/gold.tab \ # TAC13 gold standard link and nil annotations
/system/output/directory \ # directory containing (only) TAC13 system output files
/system/scores/directory \ # directory containing official score summary reports
/script/output/directory # directory to which results are written
The gold data from TAC13 is distributed by LDC. When running the test evaluation script, provide:
- LDC2013E90_TAC_2013_KBP_English_Entity_Linking_Evaluation_Queries_and_Knowledge_Base_Links_V1.1/data/tac_2013_kbp_english_entity_linking_evaluation_queries.xml,
- LDC2013E90_TAC_2013_KBP_English_Entity_Linking_Evaluation_Queries_and_Knowledge_Base_Links_V1.1/data/tac_2013_kbp_english_entity_linking_evaluation_KB_links.tab.
The system data from TAC13 is distributed by NIST. When running the test evaluation script, provide:
- KBP2013_English_Entity_Linking_Evaluation_Results/KBP2013_english_entity-linking_runs,
- KBP2013_English_Entity_Linking_Evaluation_Results/KBP2013_english_entity-linking_scores.
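Putting the pieces together, a test run might look like the following sketch. The ~/data prefix and the results directory are placeholders; the LDC and NIST package paths are as listed above. The test is taken to have passed when no 00diff.txt is written:
./scripts/test_tac13_evaluation.sh \
  ~/data/LDC2013E90_TAC_2013_KBP_English_Entity_Linking_Evaluation_Queries_and_Knowledge_Base_Links_V1.1/data/tac_2013_kbp_english_entity_linking_evaluation_queries.xml \
  ~/data/LDC2013E90_TAC_2013_KBP_English_Entity_Linking_Evaluation_Queries_and_Knowledge_Base_Links_V1.1/data/tac_2013_kbp_english_entity_linking_evaluation_KB_links.tab \
  ~/data/KBP2013_English_Entity_Linking_Evaluation_Results/KBP2013_english_entity-linking_runs \
  ~/data/KBP2013_English_Entity_Linking_Evaluation_Results/KBP2013_english_entity-linking_scores \
  ~/results/tac13-test
if [ -f ~/results/tac13-test/00diff.txt ]; then
  echo "evaluation output differs from the official scores; see 00diff.txt"
else
  echo "evaluation output matches the official TAC13 scores"
fi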