Analyzing Annotations

Once you have collected the annotations, factgenie can help you compute basic statistics over the annotation labels:

(Figure: Analysis scheme)

You can find the tools for the analysis at /analyze:

(Figure: Analysis page)

On the Analysis page, there are two main interfaces:

  • Individual statistics,
  • Inter-annotator agreement.

📈 Individual statistics

This interface provides statistics about a single annotation campaign.

(Figure: Analysis table)

In the table, we can find the following columns:

  • Dataset, split, setup: The source of the corresponding inputs (see terminology).
  • Category: The annotation span category label.
  • Ex. annotated: The number of examples annotated within the campaign.
  • Count: The total number of label occurrences within annotated examples.
  • Avg. per ex.: The average number of label occurrences within annotated examples (= Count / Ex. annotated).
  • Prevalence: The ratio of outputs containing the label (ranging from 0 to 1); see the sketch after this list.
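
The following minimal Python sketch (not factgenie's actual code) illustrates how these three values relate. The `annotated_examples` structure is hypothetical: one list of span category labels per annotated example.

```python
from collections import Counter

# Hypothetical input: one list of span category labels per annotated example.
annotated_examples = [
    ["Incorrect", "Incorrect", "Not checkable"],  # example 1
    [],                                           # example 2: no spans annotated
    ["Misleading"],                               # example 3
]

n_examples = len(annotated_examples)  # "Ex. annotated"
counts = Counter(label for ex in annotated_examples for label in ex)

for category, count in counts.items():
    # "Avg. per ex." = Count / Ex. annotated
    avg_per_ex = count / n_examples
    # "Prevalence" = share of examples that contain the label at least once
    prevalence = sum(category in ex for ex in annotated_examples) / n_examples
    print(f"{category}: count={count}, avg. per ex.={avg_per_ex:.2f}, prevalence={prevalence:.2f}")
```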

The statistics are provided in full detail and also grouped by various aspects (label categories, setups, datasets).

Note that the page with individual statistics for each campaign can also be opened using the "View statistics" button on the campaign detail page.

(Figure: Analysis view)

βš–οΈ Inter-annotator agreement

This interface provides a way to compute inter-annotator agreement on span labels between pairs of campaigns:

(Figure: Analysis inter-annotator agreement)

The agreement is computed only for annotated examples and compatible labels.

The following coefficients are computed:

  • Pearson r (micro) - A Pearson r coefficient computed over the concatenated results from all categories.
  • Pearson r (macro) - An average of Pearson r coefficients computed separately for each category (see the sketch after this list).
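
As an illustration, here is a small sketch of the micro vs. macro distinction (assumed data and names, not factgenie internals), with each campaign reduced to per-category error counts for the same set of examples:

```python
import numpy as np

# Hypothetical per-category error counts for the same three examples,
# one dict per campaign: {category: [error count for each example]}.
campaign_a = {"Incorrect": [2, 0, 1], "Misleading": [1, 1, 0]}
campaign_b = {"Incorrect": [1, 0, 1], "Misleading": [0, 1, 0]}

def pearson_r(x, y):
    return np.corrcoef(x, y)[0, 1]

categories = list(campaign_a)

# Micro: concatenate the counts from all categories, then compute one coefficient.
r_micro = pearson_r(
    np.concatenate([campaign_a[c] for c in categories]),
    np.concatenate([campaign_b[c] for c in categories]),
)

# Macro: compute one coefficient per category, then average them.
r_macro = np.mean([pearson_r(campaign_a[c], campaign_b[c]) for c in categories])

print(f"Pearson r (micro): {r_micro:.2f}, Pearson r (macro): {r_macro:.2f}")
```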

The coefficients are computed on the following levels:

  • Dataset-level - Computed over a list of average error counts, one number for each (dataset, split, setup_id) combination.
  • Example-level - Computed over a list of error counts, one number for each example (see the sketch below).
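
A rough sketch of the two levels, using hypothetical per-example records of the form (dataset, split, setup_id, error count); the resulting vectors would then be compared with the Pearson r computation shown above:

```python
from collections import defaultdict
import numpy as np

# Hypothetical per-example records for two campaigns (same examples, same order).
records_a = [("d1", "test", "gpt4", 2), ("d1", "test", "gpt4", 0), ("d2", "test", "llama", 1)]
records_b = [("d1", "test", "gpt4", 1), ("d1", "test", "gpt4", 0), ("d2", "test", "llama", 2)]

# Example-level: one error count per example.
example_level_a = [count for *_, count in records_a]
example_level_b = [count for *_, count in records_b]

# Dataset-level: one average error count per (dataset, split, setup_id) combination.
def dataset_level(records):
    groups = defaultdict(list)
    for dataset, split, setup_id, count in records:
        groups[(dataset, split, setup_id)].append(count)
    return [np.mean(groups[key]) for key in sorted(groups)]

dataset_level_a = dataset_level(records_a)
dataset_level_b = dataset_level(records_b)
```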