Simplify evaluate() #17

Merged
6 commits merged on Sep 21, 2024
1 change: 0 additions & 1 deletion .github/workflows/ci.yml
@@ -57,7 +57,6 @@ jobs:
cargo run --release --example from_json --features serde
cargo run --release --example from_trec
cargo run --release --example paired_bootstrap_test
cargo run --release --example simple

correctness-test:
name: Correctness test against trec_eval
86 changes: 40 additions & 46 deletions README.md
@@ -1,4 +1,4 @@
# Elinor: Evaluation Library in Information Retrieval
# Elinor: Evaluation Library in INfOrmation Retrieval

<p align="left">
<a href="https://github.com/kampersanda/elinor/actions/workflows/ci.yml?query=branch%3Amain"><img src="https://img.shields.io/github/actions/workflow/status/kampersanda/elinor/ci.yml?branch=main&style=flat-square" alt="actions status" /></a>
@@ -14,8 +14,7 @@ inspired by [ranx](https://github.com/AmenRa/ranx) and [Sakai's book](https://ww
## Features

- **IRer-friendly**:
  The library is designed to be easy to use for developers in information retrieval
  by providing TREC-like data structures, such as Qrels and Run.
  The library is designed to be easy to use for developers in information retrieval.
- **Flexible**:
  The library supports various evaluation metrics, such as Precision, MAP, MRR, and nDCG.
  The supported metrics are available in [Metric](https://docs.rs/elinor/latest/elinor/metrics/enum.Metric.html).
@@ -33,52 +33,47 @@ RUSTDOCFLAGS="--html-in-header katex.html" cargo doc --no-deps --open

## Getting Started

A simple routine to prepare Qrels and Run data structures
A simple routine to prepare gold and predicted relevance scores
and evaluate them using Precision@3, MAP, MRR, and nDCG@3:

```rust
use elinor::{QrelsBuilder, RunBuilder, Metric};

// Construct Qrels data structure.
let mut qb = QrelsBuilder::new();
qb.add_score("q_1", "d_1", 1)?;
qb.add_score("q_1", "d_2", 0)?;
qb.add_score("q_1", "d_3", 2)?;
qb.add_score("q_2", "d_2", 2)?;
qb.add_score("q_2", "d_4", 1)?;
let qrels = qb.build();

// Construct Run data structure.
let mut rb = RunBuilder::new();
rb.add_score("q_1", "d_1", 0.5.into())?;
rb.add_score("q_1", "d_2", 0.4.into())?;
rb.add_score("q_1", "d_3", 0.3.into())?;
rb.add_score("q_2", "d_4", 0.1.into())?;
rb.add_score("q_2", "d_1", 0.2.into())?;
rb.add_score("q_2", "d_3", 0.3.into())?;
let run = rb.build();

// The metrics to evaluate can be specified via Metric instances.
let metrics = vec![
    Metric::Precision { k: 3 },
    Metric::AP { k: 0 }, // k=0 means all documents.
    // The instances can also be specified via strings.
    "rr".parse()?,
    "ndcg@3".parse()?,
];

// Evaluate the qrels and run data.
let evaluated = elinor::evaluate(&qrels, &run, metrics.iter().cloned())?;

// Macro-averaged scores.
for metric in &metrics {
    let score = evaluated.mean_scores[metric];
    println!("{metric}: {score:.4}");
}
// => precision@3: 0.5000
// => ap: 0.5000
// => rr: 0.6667
// => ndcg@3: 0.4751
use elinor::{GoldRelStoreBuilder, PredRelStoreBuilder, Metric};
use approx::assert_abs_diff_eq;

// Prepare gold relevance scores.
let mut b = GoldRelStoreBuilder::new();
b.add_score("q_1", "d_1", 1)?;
b.add_score("q_1", "d_2", 0)?;
b.add_score("q_1", "d_3", 2)?;
b.add_score("q_2", "d_2", 2)?;
b.add_score("q_2", "d_4", 1)?;
let gold_rels = b.build();

// Prepare predicted relevance scores.
let mut b = PredRelStoreBuilder::new();
b.add_score("q_1", "d_1", 0.5.into())?;
b.add_score("q_1", "d_2", 0.4.into())?;
b.add_score("q_1", "d_3", 0.3.into())?;
b.add_score("q_2", "d_4", 0.1.into())?;
b.add_score("q_2", "d_1", 0.2.into())?;
b.add_score("q_2", "d_3", 0.3.into())?;
let pred_rels = b.build();

// Evaluate Precision@3.
let evaluated = elinor::evaluate(&gold_rels, &pred_rels, Metric::Precision { k: 3 })?;
assert_abs_diff_eq!(evaluated.mean_score(), 0.5000, epsilon = 1e-4);

// Evaluate MAP, where all documents are considered via k=0.
let evaluated = elinor::evaluate(&gold_rels, &pred_rels, Metric::AP { k: 0 })?;
assert_abs_diff_eq!(evaluated.mean_score(), 0.5000, epsilon = 1e-4);

// Evaluate MRR, where the metric is specified via a string representation.
let evaluated = elinor::evaluate(&gold_rels, &pred_rels, "rr".parse()?)?;
assert_abs_diff_eq!(evaluated.mean_score(), 0.6667, epsilon = 1e-4);

// Evaluate nDCG@3, where the metric is specified via a string representation.
let evaluated = elinor::evaluate(&gold_rels, &pred_rels, "ndcg@3".parse()?)?;
assert_abs_diff_eq!(evaluated.mean_score(), 0.4751, epsilon = 1e-4);
```
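
As a sanity check on the `ndcg@3` value asserted above, the numbers follow from the standard log2-discounted DCG, assumed here to be the formulation behind `Metric::NDCG` since it reproduces the asserted mean:

```latex
% DCG@k = \sum_{i=1}^{k} rel_i / \log_2(i + 1)
% q_1: predicted order d_1, d_2, d_3 with gold levels 1, 0, 2
\mathrm{DCG@3}  = \tfrac{1}{\log_2 2} + \tfrac{0}{\log_2 3} + \tfrac{2}{\log_2 4} = 2.0, \qquad
\mathrm{IDCG@3} = \tfrac{2}{\log_2 2} + \tfrac{1}{\log_2 3} \approx 2.631, \qquad
\mathrm{nDCG@3} \approx 0.760

% q_2: predicted order d_3, d_1, d_4 with gold levels 0, 0, 1
\mathrm{DCG@3}  = \tfrac{1}{\log_2 4} = 0.5, \qquad
\mathrm{IDCG@3} \approx 2.631, \qquad
\mathrm{nDCG@3} \approx 0.190

% Macro average: (0.760 + 0.190) / 2 \approx 0.475, matching the asserted 0.4751.
```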

Other examples are available in the [`examples`](https://github.com/kampersanda/elinor/tree/main/examples) directory.
7 changes: 3 additions & 4 deletions elinor-evaluate/src/main.rs
@@ -28,10 +28,9 @@ fn main() -> Result<(), Box<dyn std::error::Error>> {
    let pred_rels = trec::parse_pred_rels_in_trec(load_lines(&args.pred_file)?.into_iter())?;

    let metrics = all_metrics(&args.ks);
    let evaluated = elinor::evaluate(&gold_rels, &pred_rels, metrics.iter().cloned())?;

    for metric in &metrics {
        let score = evaluated.mean_scores[metric];
    for metric in metrics {
        let evaluated = elinor::evaluate(&gold_rels, &pred_rels, metric)?;
        let score = evaluated.mean_score();
        println!("{metric}\t{score:.4}");
    }

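Since `evaluate()` now takes a single `Metric`, the CLI above simply calls it once per metric. A minimal caller-side sketch of the same flow, assuming only what the diff shows (`Metric` parses from strings such as `"rr"` or `"ndcg@3"`, implements `Display` and `Clone`, and `evaluate()` returns an `Evaluated` with `mean_score()`); this is illustrative and not code from the PR:

```rust
use elinor::{GoldRelStore, Metric, PredRelStore};

// Illustrative sketch (not part of the PR): evaluate a list of metrics given
// by their string names, calling elinor::evaluate() once per metric.
fn report<K>(
    gold_rels: &GoldRelStore<K>,
    pred_rels: &PredRelStore<K>,
    metric_names: &[&str],
) -> Result<(), Box<dyn std::error::Error>>
where
    K: Clone + Eq + Ord + std::hash::Hash + std::fmt::Display,
{
    for name in metric_names {
        let metric: Metric = name.parse()?;
        let evaluated = elinor::evaluate(gold_rels, pred_rels, metric.clone())?;
        println!("{metric}\t{:.4}", evaluated.mean_score());
    }
    Ok(())
}
```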
18 changes: 3 additions & 15 deletions examples/from_json.rs
@@ -61,22 +61,10 @@ fn main() -> Result<()> {
        Metric::NDCG { k: 3 },
        Metric::NDCGBurges { k: 3 },
    ];
    let evaluated = elinor::evaluate(&gold_rels, &pred_rels, metrics.iter().cloned())?;

    println!("=== Mean scores ===");
    for metric in &metrics {
        let score = evaluated.mean_scores[metric];
        println!("{metric}: {score:.4}");
    }

    println!("\n=== Scores for each query ===");
    for metric in &metrics {
        println!("{metric}");
        let qid_to_score = &evaluated.all_scores[metric];
        for qid in ["q_1", "q_2"] {
            let score = qid_to_score[qid];
            println!("- {qid}: {score:.4}");
        }
    for metric in metrics {
        let evaluated = elinor::evaluate(&gold_rels, &pred_rels, metric)?;
        println!("{:?}: {:.4}", metric, evaluated.mean_score());
    }

    Ok(())
18 changes: 3 additions & 15 deletions examples/from_trec.rs
@@ -37,22 +37,10 @@ q_2 0 d_4 3 0.1 SAMPLE
        Metric::NDCG { k: 3 },
        Metric::NDCGBurges { k: 3 },
    ];
    let evaluated = elinor::evaluate(&gold_rels, &pred_rels, metrics.iter().cloned())?;

    println!("=== Mean scores ===");
    for metric in &metrics {
        let score = evaluated.mean_scores[metric];
        println!("{metric}: {score:.4}");
    }

    println!("\n=== Scores for each query ===");
    for metric in &metrics {
        println!("{metric}");
        let qid_to_score = &evaluated.all_scores[metric];
        for qid in ["q_1", "q_2"] {
            let score = qid_to_score[qid];
            println!("- {qid}: {score:.4}");
        }
    for metric in metrics {
        let evaluated = elinor::evaluate(&gold_rels, &pred_rels, metric)?;
        println!("{:?}: {:.4}", metric, evaluated.mean_score());
    }

    Ok(())
45 changes: 0 additions & 45 deletions examples/simple.rs

This file was deleted.

109 changes: 69 additions & 40 deletions src/lib.rs
@@ -19,6 +19,7 @@
//! ```
//! # fn main() -> Result<(), Box<dyn std::error::Error>> {
//! use elinor::{GoldRelStoreBuilder, PredRelStoreBuilder, Metric};
//! use approx::assert_abs_diff_eq;
//!
//! // Prepare gold relevance scores.
//! let mut b = GoldRelStoreBuilder::new();
@@ -39,27 +40,21 @@
//! b.add_score("q_2", "d_3", 0.3.into())?;
//! let pred_rels = b.build();
//!
//! // The metrics to evaluate can be specified via Metric instances.
//! let metrics = vec![
//!     Metric::Precision { k: 3 },
//!     Metric::AP { k: 0 }, // k=0 means all documents.
//!     // The instances can also be specified via strings.
//!     "rr".parse()?,
//!     "ndcg@3".parse()?,
//! ];
//! // Evaluate Precision@3.
//! let evaluated = elinor::evaluate(&gold_rels, &pred_rels, Metric::Precision { k: 3 })?;
//! assert_abs_diff_eq!(evaluated.mean_score(), 0.5000, epsilon = 1e-4);
//!
//! // Evaluate.
//! let evaluated = elinor::evaluate(&gold_rels, &pred_rels, metrics.iter().cloned())?;
//! // Evaluate MAP, where all documents are considered via k=0.
//! let evaluated = elinor::evaluate(&gold_rels, &pred_rels, Metric::AP { k: 0 })?;
//! assert_abs_diff_eq!(evaluated.mean_score(), 0.5000, epsilon = 1e-4);
//!
//! // Macro-averaged scores.
//! for metric in &metrics {
//!     let score = evaluated.mean_scores[metric];
//!     println!("{metric}: {score:.4}");
//! }
//! // => precision@3: 0.5000
//! // => ap: 0.5000
//! // => rr: 0.6667
//! // => ndcg@3: 0.4751
//! // Evaluate MRR, where the metric is specified via a string representation.
//! let evaluated = elinor::evaluate(&gold_rels, &pred_rels, "rr".parse()?)?;
//! assert_abs_diff_eq!(evaluated.mean_score(), 0.6667, epsilon = 1e-4);
//!
//! // Evaluate nDCG@3, where the metric is specified via a string representation.
//! let evaluated = elinor::evaluate(&gold_rels, &pred_rels, "ndcg@3".parse()?)?;
//! assert_abs_diff_eq!(evaluated.mean_score(), 0.4751, epsilon = 1e-4);
//! # Ok(())
//! # }
//! ```
@@ -73,9 +68,9 @@ pub mod relevance;
pub mod statistical_tests;
pub mod trec;

use ordered_float::OrderedFloat;
use std::collections::HashMap;
use std::collections::HashSet;

use ordered_float::OrderedFloat;

pub use metrics::Metric;
pub use relevance::Relevance;
@@ -102,34 +97,68 @@ pub type PredRelStoreBuilder<K> = relevance::RelevanceStoreBuilder<K, PredScore>

/// Data type to store evaluated scores.
pub struct Evaluated<K> {
    /// Metric to macro-averaged score.
    pub mean_scores: HashMap<Metric, f64>,
    scores: HashMap<K, f64>,
    mean_score: f64,
}

impl<K> Evaluated<K> {
    /// Returns a reference to the mapping from query IDs to scores.
    pub const fn scores(&self) -> &HashMap<K, f64> {
        &self.scores
    }

    /// Metric to mapping from query ID to the score.
    pub all_scores: HashMap<Metric, HashMap<K, f64>>,
    /// Returns the macro-averaged score.
    pub const fn mean_score(&self) -> f64 {
        self.mean_score
    }
}

/// Evaluates the given gold_rels and pred_rels data using the specified metrics.
pub fn evaluate<K, M>(
pub fn evaluate<K>(
    gold_rels: &GoldRelStore<K>,
    pred_rels: &PredRelStore<K>,
    metrics: M,
    metric: Metric,
) -> Result<Evaluated<K>, errors::ElinorError>
where
    K: Clone + Eq + Ord + std::hash::Hash + std::fmt::Display,
    M: IntoIterator<Item = Metric>,
{
    let metrics: HashSet<Metric> = metrics.into_iter().collect();
    let mut mean_scores = HashMap::new();
    let mut all_scores = HashMap::new();
    for metric in metrics {
        let result = metrics::compute_metric(gold_rels, pred_rels, metric)?;
        let mean_score = result.values().sum::<f64>() / result.len() as f64;
        mean_scores.insert(metric, mean_score);
        all_scores.insert(metric, result);
    let scores = metrics::compute_metric(gold_rels, pred_rels, metric)?;
    let mean_score = scores.values().sum::<f64>() / scores.len() as f64;
    Ok(Evaluated { scores, mean_score })
}

#[cfg(test)]
mod tests {
    use super::*;
    use approx::assert_relative_eq;

    #[test]
    fn test_evaluate() -> Result<(), errors::ElinorError> {
        let mut b = GoldRelStoreBuilder::new();
        b.add_score("q_1", "d_1", 1)?;
        b.add_score("q_1", "d_2", 0)?;
        b.add_score("q_1", "d_3", 2)?;
        b.add_score("q_2", "d_2", 2)?;
        b.add_score("q_2", "d_4", 1)?;
        let gold_rels = b.build();

        let mut b = PredRelStoreBuilder::new();
        b.add_score("q_1", "d_1", 0.5.into())?;
        b.add_score("q_1", "d_2", 0.4.into())?;
        b.add_score("q_1", "d_3", 0.3.into())?;
        b.add_score("q_2", "d_4", 0.1.into())?;
        b.add_score("q_2", "d_1", 0.2.into())?;
        b.add_score("q_2", "d_3", 0.3.into())?;
        let pred_rels = b.build();

        let evaluated = evaluate(&gold_rels, &pred_rels, Metric::Precision { k: 3 })?;
        assert_relative_eq!(evaluated.mean_score(), (2. / 3. + 1. / 3.) / 2.);

        let scores = evaluated.scores();
        assert_eq!(scores.len(), 2);
        assert_relative_eq!(scores["q_1"], 2. / 3.);
        assert_relative_eq!(scores["q_2"], 1. / 3.);

        Ok(())
    }
    Ok(Evaluated {
        mean_scores,
        all_scores,
    })
}
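
With the single-metric `evaluate()` above, the old "many metrics in one call" convenience is easy to recover on the caller side. A minimal sketch of a hypothetical helper (`evaluate_all` is illustrative and not part of the crate; it assumes only that `Metric` is `Clone + Eq + Hash`, which its former use as a `HashMap` key in the removed code implies):

```rust
use std::collections::HashMap;

use elinor::{Evaluated, GoldRelStore, Metric, PredRelStore};

// Hypothetical helper (not in elinor): evaluate several metrics by calling
// elinor::evaluate() once per metric and keying the results by Metric.
fn evaluate_all<K>(
    gold_rels: &GoldRelStore<K>,
    pred_rels: &PredRelStore<K>,
    metrics: &[Metric],
) -> Result<HashMap<Metric, Evaluated<K>>, Box<dyn std::error::Error>>
where
    K: Clone + Eq + Ord + std::hash::Hash + std::fmt::Display,
{
    let mut results = HashMap::new();
    for metric in metrics.iter().cloned() {
        let evaluated = elinor::evaluate(gold_rels, pred_rels, metric.clone())?;
        results.insert(metric, evaluated);
    }
    Ok(results)
}
```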
5 changes: 4 additions & 1 deletion src/metrics.rs
@@ -8,6 +8,7 @@ pub(crate) mod precision;
pub(crate) mod r_precision;
pub(crate) mod recall;
pub(crate) mod reciprocal_rank;
pub(crate) mod success;

use std::collections::HashMap;
use std::fmt::Display;
@@ -336,7 +337,9 @@
        let golds = gold_rels.get_map(query_id).unwrap();
        let score = match metric {
            Metric::Hits { k } => hits::compute_hits(golds, sorted_preds, k, RELEVANT_LEVEL),
            Metric::Success { k } => hits::compute_success(golds, sorted_preds, k, RELEVANT_LEVEL),
            Metric::Success { k } => {
                success::compute_success(golds, sorted_preds, k, RELEVANT_LEVEL)
            }
            Metric::Precision { k } => {
                precision::compute_precision(golds, sorted_preds, k, RELEVANT_LEVEL)
            }
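
For reference, Success@k is the binary "at least one relevant document in the top k" metric whose implementation the hunk above moves into the new `success` module. A minimal self-contained sketch of the computation with simplified, hypothetical types (the real `compute_success` in `src/metrics/success.rs` may use different signatures):

```rust
use std::collections::HashMap;

// Minimal sketch of Success@k with simplified types (hypothetical, not the
// crate's actual helper): 1.0 if any of the top-k predicted documents has a
// gold relevance level at or above `rel_lvl`, otherwise 0.0. Following the
// k=0 convention noted for AP in the README, k == 0 means "all documents".
fn success_at_k(
    golds: &HashMap<String, u32>,   // doc id -> gold relevance level
    sorted_preds: &[(String, f64)], // (doc id, score), sorted by descending score
    k: usize,
    rel_lvl: u32,
) -> f64 {
    let cutoff = if k == 0 { sorted_preds.len() } else { k };
    let hit = sorted_preds
        .iter()
        .take(cutoff)
        .any(|(doc_id, _)| golds.get(doc_id).map_or(false, |&gold| gold >= rel_lvl));
    if hit {
        1.0
    } else {
        0.0
    }
}
```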