
Commit 08293fa

Merge pull request #183 from AnastasiiaAdapt/patch-1
Update multisumm.md
2 parents 30c3c2e + d56d6bb commit 08293fa


_editions/2026/tasks/multisumm.md

Lines changed: 104 additions & 3 deletions
@@ -16,8 +16,109 @@ FSI activities in the city that satisfy specified criteria. Evaluation will expl
#### Task description
The goal of the MultiSumm task is to explore the creation of multimodal summaries from multiple multimodal content items. Specifically, at MediaEval 2026, MultiSumm will explore the multimodal summarization of multiple websites. The websites for summarization will be provided by the H2020 Cultivate project. Cultivate is exploring online resources relating to Food Sharing Initiatives (FSIs) in urban and peri-urban environments in cities around the world. A key element of the Cultivate project is the creation of the ShareCity200 database: an automatically crawled and curated database of FSIs in 200 cities, primarily European but also including cities beyond Europe. ShareCity200 is an extension of the ShareCity100 database created as part of an earlier project.
Participants in MultiSumm will be provided with the crawled FSI web content for a small number of selected cities and asked to create a multimodal summary of the FSIs present in each city. Participants will be provided with details of the requirements for the summaries and details of the summary evaluation methods to be used.
Since the ShareCity200 database will include details of FSIs in cities in many countries, and we are seeking to automate the evaluation process as much as possible, we will be open to including specific cities at the request of individual participants to expand the linguistic scope of the task.
We define two tasks: a main task and a subtask for additional investigations.
#### Main Task: Summarization of FSIs in English-Speaking Cities
Participants will be asked to build a large language model-based summarization system that produces high-quality, detailed summaries of FSIs in cities such as Dublin (Ireland) and Brighton & Hove (U.K.). The summarization output must reflect:

- Geographical distribution of FSIs by city districts
- Types of initiatives (e.g., food sharing, swapping, gifting)
- Operational level (government-funded, district-supported, community-led, etc.)
- Popularity (e.g., estimated reach, activity levels, or attendance)
- Public sentiment or feedback extracted from website content or user reviews
- Visual component, including representative photos of prominent FSIs in the city
- The attributes for each FSI (e.g. type of FSI initiative) are provided as part of the dataset.
- The final summaries must be generated in English and presented in a clear multimodal format (e.g., combining text with images or structured visuals).
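As a purely illustrative sketch of such a "clear multimodal format", a per-city summary might be represented as a structured record combining the textual summary, district-level attributes, and representative images. Every field name and value below is a hypothetical assumption; the official report template and schema will be provided with the dataset.

```python
# Hypothetical sketch of a structured, multimodal per-city summary.
# All field names and values are illustrative assumptions; the
# official report template and schema will be released with the dataset.

city_summary = {
    "city": "Dublin",
    "districts": {
        "Dublin 1": {"fsi_count": 12, "types": ["food sharing", "gifting"]},
        "Dublin 8": {"fsi_count": 5, "types": ["swapping"]},
    },
    "operational_levels": ["community-led", "government-funded"],
    "sentiment": "largely positive, based on user reviews",
    "images": ["market_fsi.jpg"],  # representative photos of prominent FSIs
    "text_summary": "FSIs in Dublin cluster in inner-city districts ...",
}

# A generated report would combine text_summary with the images and
# structured fields, e.g. totalling FSIs across districts:
total_fsis = sum(d["fsi_count"] for d in city_summary["districts"].values())
print(total_fsis)  # 17
```

Serializing such a record (for instance to JSON) would yield a lightweight structured component alongside the prose summary.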
#### Evaluation:
Performance will be assessed using quantitative metrics including content coverage, coherence, informativeness, visual relevance, and structural alignment with the specified summary schema.
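The precise metric definitions will be released with the evaluation guidelines. As a hedged illustration of what a simple content-coverage check could look like, the sketch below scores a summary by the fraction of required aspects it mentions; the keyword lists are invented for this example and are not the official metric.

```python
# Illustrative sketch only: approximate "content coverage" as the
# fraction of required aspects (from the task description) that a
# summary text mentions. Keyword lists are invented assumptions.

REQUIRED_ASPECTS = {
    "distribution": ["district", "borough", "distribution"],
    "types": ["sharing", "swapping", "gifting"],
    "operation": ["government", "community", "district-supported"],
    "popularity": ["popular", "reach", "attendance", "activity"],
    "sentiment": ["sentiment", "review", "feedback"],
}

def coverage(summary: str) -> float:
    """Fraction of required aspects mentioned in the summary text."""
    text = summary.lower()
    hits = sum(
        any(kw in text for kw in keywords)
        for keywords in REQUIRED_ASPECTS.values()
    )
    return hits / len(REQUIRED_ASPECTS)

s = "Community-led food sharing initiatives are popular in inner-city districts."
print(coverage(s))  # 0.8  (no sentiment/feedback aspect mentioned)
```

In practice the organizers may combine such automatic checks with LLM-based or human judgments of coherence and visual relevance.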
#### Subtask: Cross-Cultural and Geographically Grounded Summarization
This subtask extends the MultiSumm challenge along linguistic, cultural, and geographic dimensions, encouraging participants to explore how summarization models perform across diverse urban contexts.
The subtask includes the following cities:
- London as a large, complex English-speaking metropolis with highly diverse and decentralised food-sharing ecosystems.
- Barcelona and Milan as major non-English-speaking European cities, representing different cultural, linguistic, and organisational contexts for food-sharing initiatives.
In addition to multilingual and cross-cultural summarization, this subtask introduces an optional geographic grounding component.
#### Optional extension: District-level heatmap summarization
Participants may optionally generate summaries that reflect the spatial distribution of Food Sharing Initiatives (FSIs) across city districts or boroughs.
For this extension:
- FSIs are grouped by administrative districts or boroughs (where available).
- Participants are encouraged to identify and describe spatial density patterns, using the following qualitative categorisation:
  - Green – districts with a high concentration of FSIs
  - Yellow – districts with a medium concentration of FSIs
  - Red – districts with a low concentration of FSIs
The output does not require map generation. Instead, participants may:

- describe spatial patterns textually (e.g. “FSIs are concentrated in inner-city districts…”), or
- include a lightweight structured component indicating district-level density categories.
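The qualitative categorisation above can be sketched as a small helper that maps per-district FSI counts to density categories. The thresholds and the district counts below are purely hypothetical assumptions, not part of the task specification.

```python
# Hypothetical sketch: map each district's FSI count to the
# Green/Yellow/Red density category used in the task description.
# The thresholds (>= 10, >= 4) are illustrative assumptions.

def density_category(fsi_count: int) -> str:
    if fsi_count >= 10:
        return "green"   # high concentration of FSIs
    if fsi_count >= 4:
        return "yellow"  # medium concentration of FSIs
    return "red"         # low concentration of FSIs

# Made-up district counts for illustration
district_counts = {"Camden": 14, "Hackney": 6, "Bromley": 2}
heatmap = {name: density_category(n) for name, n in district_counts.items()}
print(heatmap)  # {'Camden': 'green', 'Hackney': 'yellow', 'Bromley': 'red'}
```

Such a mapping could feed either the textual description of spatial patterns or the lightweight structured component.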
This geographic extension is optional: it applies primarily to the subtask cities (London, Barcelona, Milan) and is not compulsory for participation or evaluation.
Participants who wish to apply the same district-level heatmap analysis to the main task cities (Dublin and Brighton) are also very welcome to do so, and such submissions will be considered positively in the qualitative analysis.
The summarization requirements are the same as for the main task, but the challenge here includes cross-lingual understanding and translation, alignment of heterogeneous data sources, and increased complexity in the urban FSI ecosystem.
Participants are encouraged to propose additional cities, especially in different linguistic contexts, to support the multilingual vision of the ShareCity200 dataset and broaden the impact of their models.
#### Motivation and background:
Multi-document summarization of text documents has been a longstanding area of investigation, for example in providing single summaries of multiple news articles on the same story. Traditionally, this process has been complex and inflexible in terms of the style and content of the summary text, requiring the use of a wide variety of natural language processing (NLP) tools and detailed specification of the summarization process. The emergence of large language model (LLM) technologies has revolutionized many NLP tasks, including summarization. The more recent arrival of multimodal LLMs is similarly impacting topics relating to multimedia content.
While the MultiSumm tasks could be tackled using traditional NLP and multimedia processing tools, the expectation is that participants will tackle them using multimodal LLM methods. To the best of our knowledge, this will be the first benchmark task focusing on this topic, providing a potentially valuable venue for exploring the potential and challenges of using multimodal LLMs in tasks of this sort.
#### Target group
Researchers exploring the use of multimodal LLMs, potentially drawn from both the NLP and multimedia research communities. One attractive feature of LLM methods is that they enable researchers to engage with tasks even when they are not experts in the methods and tools traditionally used to address those tasks.
##### Data
- A subset of the ShareCity200 database, including manually verified and labelled websites for each city
- Web-crawled content with accompanying metadata (district, type, tags, language, etc.)
- A reference schema for the summary format and example outputs: report template and visualisation
- A set of evaluation metrics and guidelines for both text and visual components
- Optionally: access to additional cities or languages upon request for experimentation
#### Quest for insight
Here are several research questions related to this challenge that participants can strive to answer in order to go beyond just looking at the evaluation metrics:
- What are the challenges of creating multi-source summaries of web content?
- What are the most effective approaches to applying LLM methods in multimodal summarization?
- What open research questions and challenges arise in applying LLM methods to multi-document summarization?
- How effective are LLM-based evaluation methods in multi-document summarization?
#### References and recommended reading
- Zhu et al., “Multimodal Summarization: A Survey” (2020). https://arxiv.org/abs/2006.08835
- Radev et al., “Centroid-based summarization of multiple documents” (2004). https://aclanthology.org/W04-1013/
- Nenkova & McKeown, “A Survey of Text Summarization Techniques” (2012). https://www.cs.columbia.edu/~smaskey/CS6998/SurveySummarization.pdf
- Zhang et al., “PEGASUS” (2020). https://arxiv.org/abs/1912.08777
- BART for multi-document summarization (Lewis et al., 2020). https://aclanthology.org/2020.acl-main.703/
#### Task organizers
Gareth J. F. Jones, Maynooth University, Ireland
Anastasia Potyagalova, DCU, Ireland
##### Task schedule
- Registration for task participation opens: January 2026
- Test data release: 1 March 2026
- Runs due: 1 May 2026
- Working notes papers due: 31 May 2026
- MediaEval 2026 Workshop: Sat.-Sun. 15-16 June 2026, Amsterdam, Netherlands and online, co-located with ACM ICMR 2026

