Ekstra Bladet News Recommendation Dataset (EB-NeRD)

Contributor

About

This repository serves as a toolbox for working with the Ekstra Bladet News Recommendation Dataset (EB-NeRD)—a rich dataset designed to advance research and benchmarking in news recommendation systems.

EB-NeRD is based on user behavior logs from Ekstra Bladet, a classical Danish newspaper published by JP/Politikens Media Group in Copenhagen. The dataset was created as part of the 18th ACM Conference on Recommender Systems Challenge (RecSys'24 Challenge).

What You'll Find Here

This repository provides:

Starter notebooks for descriptive data analysis, data preprocessing, and baseline modeling.
Examples of established models to kickstart experimentation.
A step-by-step tutorial for running a CodaBench server locally, which is required to evaluate models on the hidden test set.

🔗 Useful Links

Resource	Description
recsys.eb.dk	Main dataset website with detailed documentation
CodaBench Setup Guide	Local evaluation server setup instructions
RecSys'24 Challenge	Original competition page

ℹ️ Dataset Information

📊 Size: Large-scale news recommendation dataset
🏢 Source: Ekstra Bladet (Danish newspaper)
📅 Period: User behavior logs from JP/Politikens Media Group
🎯 Focus: Balancing accuracy and editorial values in news recommendations

📚 Key Papers / References

Below are important papers related to this repository:

Title	Authors	Venue / Year
RecSys Challenge 2024: Balancing Accuracy and Editorial Values in News Recommendation	J. Kruse, K. Lindskow, S. Kalloori, M. Polignano, C. Pomo, A. Srivastava, A. Uppal, M. R. Andersen, J. Frellsen	Proc. ACM RecSys ’24
EB-NeRD: A large-scale dataset for news recommendation	J. Kruse, K. Lindskow, S. Kalloori, M. Polignano, C. Pomo, A. Srivastava, A. Uppal, M. R. Andersen, J. Frellsen	Proc. RecSys Challenge ’24
Proceedings of the RecSys ’24: 18th ACM Conference on Recommender Systems	— (conference proceedings volume)	Proc. RecSys ’24
Why design choices matter in recommender systems	J. Kruse, K. Lindskow, M. R. Andersen, J. Frellsen	Nature Machine Intelligence vol. 7 (2025)

🚀 Quick Start

Want to jump right in? Here's a 5-minute setup:

# 1. Clone and install
git clone https://github.com/ebanalyse/ebnerd-benchmark.git
cd ebnerd-benchmark
pip install .

# 2. Run your first model
python examples/quick_start/nrms_dummy.py

# 3. Explore the data (optional)
# Open examples/datasets/ebnerd_overview.ipynb in Jupyter

🛠️ Installation

We recommend using conda for environment management.

Prerequisites

Python: 3.10+ (recommended: 3.11)
RAM: Minimum 8GB (16GB+ recommended for larger datasets)
Storage: ~2GB for repository + dataset storage space
OS: Linux, macOS, or Windows

Standard Installation

# 1. Create and activate a new conda environment
conda create -n <environment_name> python=3.11
conda activate <environment_name>

# 2. Clone this repo within VSCode or using command line:
git clone https://github.com/ebanalyse/ebnerd-benchmark.git
cd ebnerd-benchmark

# 3. Install the core ebrec package to the environment:
pip install .

# 4. Verify installation
python -c "import ebrec; print('✅ Installation successful!')"

🍎 M1 Mac Users

We have encountered issues installing TensorFlow on M1 MacBooks when using conda (sys_platform == 'darwin').

Recommended Workaround - Use venv:

python3 -m venv .venv
source .venv/bin/activate
pip install .

Alternative - Conda with local environment:

conda create -p .venv python=3.11.8
conda activate ./.venv
pip install .

GPU Support

To enable GPU support, install the appropriate TensorFlow package based on your platform:

# For Linux
pip install tensorflow-gpu

# For macOS
pip install tensorflow-macos

🔬 Algorithms

We have implemented several state-of-the-art news recommender systems to get you started quickly:

Model	Description	Notebook	Example
NRMS	Neural News Recommendation with Multi-Head Self-Attention	📓 Notebook	🔗 Code
LSTUR	Long- and Short-term User Representations for news recommendation	-	🔗 Code
NPA	Neural News Recommendation with Personalized Attention	-	🔗 Code
NAML	Neural News Recommendation with Attentive Multi-View Learning	-	🔗 Code
NRMSDocVec	NRMS variant using pre-trained document embeddings	-	🔗 Code

The implementations of NRMS, LSTUR, NPA, and NAML are adapted from the excellent recommenders repository, with all non-model-related code removed for simplicity. NRMSDocVec is our variation of NRMS where the NewsEncoder is initialized with document embeddings (i.e., article embeddings generated from a pretrained language model), rather than learning embeddings solely from scratch.

📊 Data Manipulation & Enrichment

To help you get started, we have created a set of introductory notebooks designed for quick experimentation, including:

ebnerd_descriptive_analysis: Basic descriptive analysis of EB-NeRD.
ebnerd_overview: Demonstrates how to join user histories and create binary labels.

Note: These notebooks were developed on macOS. Small adjustments may be required for other operating systems.

🔄 Reproduce EB-NeRD Experiments

Make sure you've installed the repository and dependencies. Then activate your environment:

Activate your environment:

conda activate <environment_name>

NRMSModel

python examples/reproducibility_scripts/ebnerd_nrms.py
  --datasplit ebnerd_small \
  --epochs 5 \
  --bs_train 32 \
  --bs_test 32 \
  --history_size 20 \
  --npratio 4 \
  --transformer_model_name FacebookAI/xlm-roberta-large \
  --max_title_length 30 \
  --head_num 20 \
  --head_dim 20 \
  --attention_hidden_dim 200 \
  --learning_rate 1e-4 \
  --dropout 0.20

Tensorboards:

tensorboard --logdir=ebnerd_predictions/runs

NRMSDocVec

python examples/reproducibility_scripts/ebnerd_nrms_docvec.py \
  --datasplit ebnerd_small \
  --epochs 5 \
  --bs_train 32 \
  --history_size 20 \
  --npratio 4 \
  --document_embeddings Ekstra_Bladet_contrastive_vector/contrastive_vector.parquet \
  --head_num 16 \
  --head_dim 16 \
  --attention_hidden_dim 200 \
  --newsencoder_units_per_layer 512 512 512 \
  --learning_rate 1e-4 \
  --dropout 0.2 \
  --newsencoder_l2_regularization 1e-4

Tensorboards:

tensorboard --logdir=ebnerd_predictions/runs

🛠️ Troubleshooting

Common Issues

ImportError: No module named 'ebrec'

# Make sure you're in the right environment and have installed the package
conda activate <environment_name>
pip install -e .

TensorFlow GPU not found

# Verify GPU installation
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"

Memory errors during training

Reduce batch size (--bs_train, --bs_test)
Reduce history size (--history_size)
Use gradient checkpointing

M1 Mac TensorFlow issues

Use the venv workaround mentioned in installation
Consider using tensorflow-macos instead

Getting Help

📫 Issues: GitHub Issues

📄 Cite Our Work

If you use this repository, our methods, or datasets in your research, please cite the following papers:

@inproceedings{kruse2024recsys_challenge,
  author    = {Kruse, Johannes and Lindskow, Kasper and Kalloori, Saikishore and Polignano, Marco and Pomo, Claudio and Srivastava, Abhishek and Uppal, Anshuk and Andersen, Michael Riis and Frellsen, Jes},
  title     = {RecSys Challenge 2024: Balancing Accuracy and Editorial Values in News Recommendations},
  booktitle = {Proceedings of the 18th ACM Conference on Recommender Systems},
  series    = {RecSys '24},
  year      = {2024},
  pages     = {1195--1199},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  doi       = {10.1145/3640457.3687164},
  url       = {https://doi.org/10.1145/3640457.3687164},
  keywords  = {Beyond-Accuracy, Competition, Dataset, Editorial Values, News Recommendations, Recommender Systems}
}

@inproceedings{kruse2024ebnerd,
  author    = {Kruse, Johannes and Lindskow, Kasper and Kalloori, Saikishore and Polignano, Marco and Pomo, Claudio and Srivastava, Abhishek and Uppal, Anshuk and Andersen, Michael Riis and Frellsen, Jes},
  title     = {EB-NeRD: A Large-scale Dataset for News Recommendation},
  booktitle = {Proceedings of the Recommender Systems Challenge 2024},
  series    = {RecSysChallenge '24},
  year      = {2024},
  pages     = {1--11},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  doi       = {10.1145/3687151.3687152},
  url       = {https://doi.org/10.1145/3687151.3687152},
  keywords  = {Beyond-Accuracy, Dataset, Editorial Values, News Recommendations, Recommender Systems}
}

@article{kruse2025design_choices,
  author    = {Kruse, Johannes and Lindskow, Kasper and Andersen, Michael Riis and Frellsen, Jes},
  title     = {Why Design Choices Matter in Recommender Systems},
  journal   = {Nature Machine Intelligence},
  year      = {2025},
  volume    = {7},
  number    = {6},
  pages     = {979--980},
  doi       = {10.1038/s42256-025-01043-5},
  url       = {https://doi.org/10.1038/s42256-025-01043-5},
  publisher = {Nature},
  note      = {In press}
}

Name		Name	Last commit message	Last commit date
Latest commit History 205 Commits
codabench		codabench
examples		examples
src		src
test		test
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Ekstra Bladet News Recommendation Dataset (EB-NeRD)

Contributor

About

What You'll Find Here

🔗 Useful Links

ℹ️ Dataset Information

📚 Key Papers / References

🚀 Quick Start

🛠️ Installation

Prerequisites

Standard Installation

🍎 M1 Mac Users

GPU Support

🔬 Algorithms

📊 Data Manipulation & Enrichment

🔄 Reproduce EB-NeRD Experiments

NRMSModel

Tensorboards:

NRMSDocVec

Tensorboards:

🛠️ Troubleshooting

Common Issues

Getting Help

📄 Cite Our Work

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

ebanalyse/ebnerd-benchmark

Folders and files

Latest commit

History

Repository files navigation

Ekstra Bladet News Recommendation Dataset (EB-NeRD)

Contributor

About

What You'll Find Here

🔗 Useful Links

ℹ️ Dataset Information

📚 Key Papers / References

🚀 Quick Start

🛠️ Installation

Prerequisites

Standard Installation

🍎 M1 Mac Users

GPU Support

🔬 Algorithms

📊 Data Manipulation & Enrichment

🔄 Reproduce EB-NeRD Experiments

NRMSModel

Tensorboards:

NRMSDocVec

Tensorboards:

🛠️ Troubleshooting

Common Issues

Getting Help

📄 Cite Our Work

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages