Resources for combining teaching and research in information retrieval coursework.
The resources are intended as a collection of supplementary materials for exercises in IR courses that let students experience the full research cycle in their course. We provide:
- A dashboard to support brainstorming
  - Explore datasets, approaches, and evaluation techniques
  - Easy to use online
  - Links to in-depth resources
- An ir_datasets browser to explore datasets and runs from TIREx
  - Reuse and explore strong baselines from TIREx
  - Deep links for referencing in research papers
  - Hosted on GitHub Pages and Zenodo
- A set of tutorials covering IR concepts
  - Showcases a single concept using small example data
  - Takes about 15 minutes per tutorial
  - Implemented as Jupyter notebooks in GitHub Codespaces
- A way to archive finished courses
  - Explore topics, documents, relevance judgments, and submitted runs
  - Leaderboards encourage competition between students
  - Course results are easily re-usable for research
Read more about the resources in our accompanying research paper:
Resources for Combining Teaching and Research in Information Retrieval Coursework (abstract)
Please watch our short screencast showing our resources on YouTube to get an overview of our teaching resources.
The easiest way to start with the tutorials is to open this repository in GitHub Codespaces.
This installs all the necessary software. Wait until the editor window has fully loaded (i.e., no progress bars are visible), which may take a while.
The other resources (i.e., the dashboard, the ir_datasets browser, and the archived courses) are static web apps that you can run in your web browser.
In the following, you will learn how to use each of the four main components of our resources: the dashboard, the ir_datasets browser, the tutorials, and the archived courses.
The sections roughly follow the order in which you would use the components in your course (either as a student or as a teacher).
Check out the dashboard web app at: https://tira-io.github.io/teaching-ir-with-shared-tasks
Explore existing datasets, retrieval components, and evaluation measures with deep links to implementations and papers. Components can be filtered to only include, e.g., components with code available or with a corresponding tutorial. To focus your search on a specific goal, e.g., precision-oriented components, select a research focus from the dropdown list.
The ir_datasets browser can be used to explore existing datasets: https://tira-io.github.io/ir-dataset-browser
Here are some examples that can be found using the browser:
A total of 13 datasets are already available to be explored online. (Some others could not be included due to their licenses.)
Our hands-on tutorials lower the barrier to entry for students who want to implement IR models and run IR experiments. You can run the tutorials online in GitHub Codespaces.
A full list of all covered tutorials and further information on how to run the tutorials on your local machine can be found in the tutorial readme.
We also include tools that ease uploading pooled documents and downloading relevance judgments to/from the Doccano annotation platform. To use these tools, follow these steps:
- Install Python 3.10 or later.
- Create and activate a virtual environment:

      python3.10 -m venv venv/
      source venv/bin/activate

- Install dependencies:

      pip install -e .

- Create top-k pools of documents retrieved by TIREx baselines (assuming a file data//topics.xml exists):

      teaching-ir pool-documents --pooling-depth 10 data/<YOUR-COURSE>/

- Prepare the relevance judgments in Doccano like so:

      teaching-ir prepare-relevance-judgments --doccano-url https://doccano.web.webis.de/ --doccano-username admin --doccano-password <PASSWORD> project-prefix data/<YOUR-COURSE>/

- All teams can now work on their relevance judgments.
- Export the relevance judgments as qrels from Doccano like so:

      teaching-ir export-relevance-judgments project-prefix /path/to/pool1.jsonl /path/to/pool2.jsonl ... /path/to/qrels.txt

- Once the semester is over and you have exported all data, clean up the projects and users on Doccano like so:

      teaching-ir clean-up project-prefix
Please refer to the teaching-ir command's help (i.e., run teaching-ir --help) for more detailed options.
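To make the pooling and export steps above more concrete, here is a minimal Python sketch of the two underlying ideas: depth-k pooling (the union of the top-k documents of every run, per topic) and the four-column TREC-style qrels line (topic ID, iteration, document ID, relevance label). It is not the implementation behind the teaching-ir command. The function names `pool_top_k` and `format_qrels`, the in-memory layout of the runs, and the assumption that the exported qrels use the TREC layout are all illustrative.

```python
from collections import defaultdict


def pool_top_k(runs, depth):
    """Depth-k pooling: union of the top-`depth` documents of every run, per topic.

    `runs` maps a run name to {topic_id: [doc_id, ...]}, where each list is
    already sorted by descending retrieval score (hypothetical layout).
    """
    pools = defaultdict(set)
    for ranking_per_topic in runs.values():
        for topic_id, ranked_doc_ids in ranking_per_topic.items():
            # Only the first `depth` documents of each ranking enter the pool.
            pools[topic_id].update(ranked_doc_ids[:depth])
    # Sort the document IDs so that the pool order is deterministic.
    return {topic_id: sorted(doc_ids) for topic_id, doc_ids in pools.items()}


def format_qrels(judgments):
    """Render (topic_id, doc_id, relevance) triples as TREC-style qrels lines.

    The second column ("iteration") is conventionally 0 and ignored by
    most evaluation tools.
    """
    return [
        f"{topic_id} 0 {doc_id} {relevance}"
        for topic_id, doc_id, relevance in sorted(judgments)
    ]


# Two toy runs (made-up document IDs) for two topics.
runs = {
    "bm25": {"1": ["d3", "d1", "d7"], "2": ["d2", "d5"]},
    "dense": {"1": ["d1", "d9", "d3"], "2": ["d8"]},
}
pools = pool_top_k(runs, depth=2)

# Made-up relevance labels, as annotators might assign them to pooled documents.
qrels_lines = format_qrels([("1", "d3", 1), ("1", "d1", 0), ("2", "d2", 2)])
```

With a pooling depth of 2, document d7 never reaches the annotators because no run ranks it among its top two results for topic 1. This is the trade-off that the --pooling-depth option controls: deeper pools mean more judging effort but fewer unjudged documents.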
The list below includes finished (✅), ongoing (⏳), and future (🔜) IR courses that use shared-task-oriented teaching. The finished courses have been archived on Zenodo and are accessible via GitHub Pages. To explore their topics and relevance judgments, click on the "browser" links. Get in touch to integrate your course too!
📅 | Semester | Course | University | Browser | Source |
---|---|---|---|---|---|
✅ | Summer semester 2023 | Information Retrieval | Leipzig University | 🔗 | 🔗 |
✅ | Summer semester 2023 | Advanced Information Retrieval | Friedrich-Schiller-Universität Jena | 🔗 | 🔗 |
✅ | Winter semester 2023/2024 | Advanced Information Retrieval | Leipzig University | 🔗 | 🔗 |
✅ | Winter semester 2023/2024 | Information Retrieval | Friedrich-Schiller-Universität Jena | 🔗 | 🔗 |
⏳ | Summer semester 2024 | Search Engines and Neural Information Retrieval | Augsburg University | 🔗 | 🔗 |
⏳ | Summer semester 2024 | Advanced Information Retrieval | Friedrich-Schiller-Universität Jena | 🔗 | 🔗 |
⏳ | Summer semester 2024 | Information Retrieval | TH Köln | 🔗 | 🔗 |
⏳ | Summer semester 2024 | Information Retrieval | Leipzig University | 🔗 | 🔗 |
🔜 | soon | your IR course | get in touch 💬 | 🔜 | 🔜 |
Our accompanying research paper includes a case study and describes our experiences of using our resources in the IR courses of two universities over two semesters.
We took inspiration from some great tutorials and resources out there. Of course, our resources should not replace but complement them:
With the plethora of new retrieval approaches emerging every year, it is hard for us alone to keep all resources up-to-date and to add new tutorials. We would be extremely happy if you (as an IR teacher) could take some time to improve an existing notebook or propose a new one!
Contributing to the resources is as easy as using them: Just open this repository in GitHub Codespaces (or clone it and open the repo in a Dev container with your favorite IDE).
We would be glad to support you in applying shared task style teaching for your information retrieval course! Do not hesitate to write us an email or file an issue:
- Maik Fröbe maik.froebe@uni-jena.de
- Harrisen Scells
- Theresa Elstner
- Christopher Akiki
- Lukas Gienapp
- Jan Heinrich Merker heinrich.merker@uni-jena.de
- Sean MacAvaney
- Benno Stein
- Matthias Hagen
- Martin Potthast
We're happy to help!
If you use our resources or tutorials in your research, please cite the following paper:
Maik Fröbe, Harrisen Scells, Theresa Elstner, Christopher Akiki, Lukas Gienapp, Jan Heinrich Reimer, Sean MacAvaney, Benno Stein, Matthias Hagen, and Martin Potthast. Resources for Combining Teaching and Research in Information Retrieval Courses. In 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024), July 2024. ACM.
You can use the following BibTeX entry for citation:
@InProceedings{froebe:2024a,
author = {Maik Fr{\"o}be and Harrisen Scells and Theresa Elstner and Christopher Akiki and Lukas Gienapp and Jan Heinrich Reimer and Sean MacAvaney and Benno Stein and Matthias Hagen and Martin Potthast},
booktitle = {47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2024)},
month = jul,
numpages = 11,
publisher = {ACM},
title = {{Resources for Combining Teaching and Research in Information Retrieval Courses}},
year = 2024
}
If you use the resources in your research, we'd be glad if you'd cite us.
A recent study has shown that students in IR courses are especially motivated and learn more effectively when they participate in shared tasks as part of their coursework. We thus present a range of tools and resources that support teachers in integrating research in their IR courses. Based on TIREx and ir_datasets, our Web IDE-based applications and tutorials cover the process of a typical shared task in IR and allow students to gain hands-on experience with experimental IR research—from creating test collections over developing retrieval systems to making relevance judgments and finally statistically analyzing the results. Using our tools, IR research coursework can be conducted on existing or new collections but can also be coupled with an upcoming shared task to which students can optionally submit their final approaches. We do not only present our tools and resources, but also report on our experiences in implementing the corresponding teaching concept in four IR courses for students at two universities. Our results confirm that students are very motivated to conduct research, and we also find that some of the resulting artifacts (e.g., students’ test collections and retrieval approaches) are of genuinely high quality.