Merged
42 commits
cb03cf4
add rds reader def
lukavdplas Jan 12, 2026
4af228e
add sample file
lukavdplas Jan 12, 2026
4f95805
add fields to eupdcorpreader
lukavdplas Jan 12, 2026
21d69a9
fix name formatting, add source archive field
lukavdplas Jan 12, 2026
584b356
add original language field
lukavdplas Jan 12, 2026
fb6549e
correct xml parser
lukavdplas Jan 12, 2026
35636a0
filter speeches without speaker
lukavdplas Jan 13, 2026
323db8f
catch if no memberships
lukavdplas Jan 13, 2026
57a51b8
fix when no response for person metadata
lukavdplas Jan 13, 2026
32250e6
fix multi-paragraph text
lukavdplas Jan 13, 2026
1ffc09b
fix no <speech> tag
lukavdplas Jan 13, 2026
8fe848f
consistent party names
lukavdplas Jan 13, 2026
1176efa
cleaner code
lukavdplas Jan 13, 2026
947bca4
filter records with no speech content
lukavdplas Jan 13, 2026
4253526
add original language code
lukavdplas Jan 13, 2026
193a854
include original speech in APi corpus
lukavdplas Jan 13, 2026
319f1e3
add speaker metadata in api corpus
lukavdplas Jan 14, 2026
9f9e9f9
update test data
lukavdplas Jan 14, 2026
5e732fb
url utility function for eu api
lukavdplas Jan 14, 2026
1f8a317
remove url field
lukavdplas Jan 14, 2026
4e34080
remove rdf test files
lukavdplas Jan 14, 2026
900c077
update debugger conf
lukavdplas Jan 14, 2026
1778401
small fixes
lukavdplas Jan 14, 2026
a68f92e
Merge branch 'develop' into feature/eudpcorp
lukavdplas Jan 22, 2026
620b00c
make max date configurable
lukavdplas Jan 22, 2026
8c9fcae
update documentation
lukavdplas Jan 22, 2026
4cafa49
fix nan values
lukavdplas Jan 22, 2026
f0c03a9
fix operator order
lukavdplas Jan 22, 2026
5eec0a5
fix mapping for source archive field
lukavdplas Jan 22, 2026
7147f45
update subcorpus dates
lukavdplas Jan 22, 2026
cd855f5
field presentation
lukavdplas Jan 22, 2026
d4941d3
fix missing keys error
lukavdplas Jan 26, 2026
c615787
add log statement
lukavdplas Jan 27, 2026
d7fbea4
get sequence from data
lukavdplas Jan 28, 2026
a9c556a
updated speech loop
lukavdplas Jan 28, 2026
21d02fa
handle empty debate ids + add comments/docstrings
lukavdplas Jan 28, 2026
b26670a
code clarity
lukavdplas Jan 28, 2026
d110125
update documentation
lukavdplas Jan 28, 2026
b435765
remove unused test file
lukavdplas Jan 28, 2026
a10ad08
fix data splicing
lukavdplas Jan 28, 2026
53957a9
Merge branch 'develop' into feature/eudpcorp
lukavdplas Jan 29, 2026
42b9a45
Merge branch 'develop' into feature/eudpcorp
JeltevanBoheemen Feb 2, 2026
13 changes: 7 additions & 6 deletions .vscode/launch.json
@@ -6,7 +6,7 @@
 "configurations": [
 {
 "name": "django: runserver",
-"type": "python",
+"type": "debugpy",
 "request": "launch",
 "program": "${workspaceFolder}/backend/manage.py",
 "args": ["runserver"],
@@ -15,7 +15,7 @@
 },
 {
 "name": "django: shell",
-"type": "python",
+"type": "debugpy",
 "request": "launch",
 "program": "${workspaceFolder}/backend/manage.py",
 "args": ["shell"],
@@ -24,16 +24,17 @@
 },
 {
 "name": "django: index",
-"type": "python",
+"type": "debugpy",
 "request": "launch",
 "program": "${workspaceFolder}/backend/manage.py",
-"args": ["index", "${input:corpusName}"],
+"args": ["index", "${input:corpusName}", "--delete"],
 "django": true,
-"justMyCode": true
+"justMyCode": true,
+"console": "internalConsole"
 },
 {
 "name": "django: loadcorpora",
-"type": "python",
+"type": "debugpy",
 "request": "launch",
 "program": "${workspaceFolder}/backend/manage.py",
 "args": ["loadcorpora"],
75 changes: 45 additions & 30 deletions backend/corpora/parliament/conftest.py
@@ -34,7 +34,7 @@ def parliament_corpora_settings(settings):
 settings.PP_CANADA_DATA = os.path.join(here, 'tests', 'data', 'canada')
 settings.PP_DENMARK_DATA = os.path.join(here, 'tests', 'data', 'denmark')
 settings.PP_DENMARK_NEW_DATA = os.path.join(here, 'tests', 'data', 'denmark-new')
-settings.PP_EUPARL_DATA = os.path.join(here, 'tests', 'data', 'euparl', 'rdf')
+settings.PP_EUPARL_DATA = os.path.join(here, 'tests', 'data', 'euparl', 'rds')
 settings.PP_FINLAND_DATA = os.path.join(here, 'tests', 'data', 'finland')
 settings.PP_FINLAND_OLD_DATA = os.path.join(here, 'tests', 'data', 'finland-old')
 settings.PP_FR_DATA = os.path.join(here, 'tests', 'data', 'france')
@@ -685,48 +685,63 @@ def parliament_corpora_settings(settings):
 "name": "parliament-europe",
 "start": datetime(1999, 7, 20),
 "docs": [
+# EUPDCorp data
 {
-"id": "1999-07-21-Speech-3-063",
-"date": "1999-07-21",
-"debate_id": "1999-07-21_AgendaItem_5",
-"debate_title": "Statement by Mr Prodi, President-elect of the Commission",
-"party": "Group for the Technical Coordination and Defence of Indipendent Groups and Members (TGI)",
-"sequence": 15,
-"speaker": "Francesco Enrico Speroni",
-"speaker_country": "Italy",
-"speech": """Mr President, as a Member of the Italian national Parliament for the\n(The Northern League for the Independence of Padania), I did not vote for Professor Prodi in Rome as I considered he would be completely useless as head of government. I was then proved right as he lost the vote of confidence of the Italian Parliament. Reckoning also that a Roman idiot would still be that stupid wherever he was, which, incidently, is reflected in the symbol on the list which bears his name for the election of this Parliament, I cannot for consistency\"s sake express my faith in the President of the Commission. As a native of the Po valley who is Italian only by passport, I am fortunately immune from the national Christian Democrat type of opportunism which brings Berlusconi together with Mastella and De Mita and sees in Prodi not the impartial President of the Commissioners uninfluenced by the States, but the lavish dispenser of favours to a wide and varied assortment of Southern Italian profiteers. Although I hold some of the Commissioners in high esteem, I recall the old mafioso Neapolitan saying: ‘A fish rots from the head downwards’ and I therefore have to express my negative opinion of the Prodi Presidency.""",
-"source_language": "Italian",
-"url": "http://purl.org/linkedpolitics/eu/plenary/1999-07-21-Speech-3-063",
+'date': '1999-07-20',
+'debate_title': 'Genoptagelse af sessionen',
+'debate_id': 'CRE-5-1999-07-20-FNL',
+'speaker': 'Giorgio Napolitano',
+'party': 'IND',
+'party_full': 'Independent',
+'party_national': 'Democratici di Sinistra',
+'speaker_country': 'Italy',
+'speaker_gender': 'Male',
+'speaker_birth_year': 1925,
+'speaker_id': '1103',
+'speech_original': 'Dichiaro ripresa la sessione interrotta il 7 maggio 1999 e '
+"dichiaro aperta la seduta prevista all'articolo 10, paragrafo 3, "
+"dell'Atto recante elezione dei rappresentanti al Parlamento "
+"europeo a suffragio universale diretto nonché all'articolo 10, "
+"paragrafo 3, del Regolamento del Parlamento. L'onorevole Crowley "
+'ha chiesto, pregiudizialmente, la parola.',
+'speech': 'I declare resumed the session adjourned on 7 May 1999 and open '
+'the sitting provided for in Article 10 (3) of the Act electing '
+'the representatives of the European Parliament by direct '
+"universal suffrage and in Article 10 (3) of Parliament's Rules "
+'of Procedure. Mr Crowley has asked for the floor on a point of '
+'order.',
+'original_language': 'Italian',
+'sequence': 1,
+'source_archive': 'EUPDCorp',
 },
 {
-"id": "2017-07-06-Speech-4-146-000",
-"date": "2017-07-06",
-"debate_id": "2017-07-06_AgendaItem_13",
-"debate_title": "Composition of committees and delegations",
-"party": None,
-"sequence": 2,
-"source_language": "English",
-"speaker": "Ashley Fox",
-"speaker_country": "United Kingdom",
-"speech": """Mr President, yesterday afternoon we had a lively debate, under Rule 153, on the subject of a single seat for this Parliament. Unfortunately, under that rule, it was not possible to have a resolution, but it was the clear will of this House that we bring forward a report to propose a treaty change. So, as Mr Weber and Mr Pittella are in their seats, could they please take note of the view of this House and, when the matter comes to the Conference of Presidents, could they please authorise that report?""",
-"url": "http://www.europarl.europa.eu/plenary/EN/vod.html?mode=unit&vodLanguage=EN&startTime=20170706-12:02:01-324",
+'speaker': 'Brian Crowley',
+'speaker_country': 'Ireland',
+'sequence': 2,
 },
+{}, {}, {},
+# API data
 {
 "date": "2024-11-13",
 "debate_id": "MTG-PL-2024-11-13-PVCRE-ITM-17",
-"debate_title": "17. Fight against money laundering and terrorist financing: listing Russia as a high-risk third country in the EU (debate)",
+"debate_title": "Fight against money laundering and terrorist financing: listing Russia as a high-risk third country in the EU (debate)",
 "id": "MTG-PL-2024-11-13-OTH-2017005042457",
-"party": "European Conservatives and Reformists Group",
+"party": 'ECR',
+'party_full': 'European Conservatives and Reformists',
+'party_national': None,
 "party_id": "7037",
-"source_language": "English",
-"sequence": 1,
+'original_language': "English",
+"sequence": 321,
 "speaker": "Roberts Zīle",
 "speaker_country": "Latvia",
+'speaker_gender': 'Male',
+'speaker_birth_year': 1958,
 "speaker_id": "28615",
-"speech": "Thank you, Commissioner McGuinness, and I would also like to thank you for your work on the AML package and many other issues, also for today's issues. Thank you very much.",
-},
+"speech": "Thank you, Commissioner McGuinness, and I would also like to thank you for your work on the AML package and many other issues, also for today's issues. Thank you very much.\n\nThe concludes the item.",
+'source_archive': 'European Parliament Open Data API',
+}
 ],
-"n_documents": 3,
+"n_documents": 6,
 },
 {
 'name': 'parliament-sweden-swerik',
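In the fixture change above, the API's full group name `"European Conservatives and Reformists Group"` becomes `'ECR'` in `party`, with `'European Conservatives and Reformists'` in the new `party_full` field. A minimal sketch of that kind of normalisation, assuming a hand-maintained lookup table keyed on the name with its " Group" suffix removed. `GROUP_NAMES` and `normalise_group` are hypothetical names and are not taken from the PR; only the ECR entry is grounded in the fixture.

```python
from __future__ import annotations

# Hypothetical lookup table: group name (without the " Group" suffix)
# -> (abbreviation for `party`, full name for `party_full`).
# Only the ECR entry is grounded in the test fixture; it is illustrative.
GROUP_NAMES: dict[str, tuple[str, str]] = {
    "European Conservatives and Reformists": (
        "ECR",
        "European Conservatives and Reformists",
    ),
}

SUFFIX = " Group"


def normalise_group(name: str | None) -> tuple[str | None, str | None]:
    """Return (party, party_full) for a political-group name.

    A trailing " Group" (as in the API value "European Conservatives and
    Reformists Group") is removed before the lookup. Names missing from
    GROUP_NAMES are returned, minus the suffix, in both positions.
    Empty or missing names map to (None, None).
    """
    if not name or not name.strip():
        return None, None
    key = name.strip()
    if key.endswith(SUFFIX):
        key = key[: -len(SUFFIX)]
    return GROUP_NAMES.get(key, (key, key))


if __name__ == "__main__":
    print(normalise_group("European Conservatives and Reformists Group"))
    # ('ECR', 'European Conservatives and Reformists')
```

Keying the table on the suffix-stripped name lets values from both sources share one entry; how the actual reader resolves group names is not shown in this diff.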
26 changes: 25 additions & 1 deletion backend/corpora/parliament/description/euparl.md
@@ -1,4 +1,28 @@
-The debates from the European Parliament, in English (translation), as provided by the [Talk of Europe](https://ssh.datastations.nl/dataset.xhtml?persistentId=doi:10.17026/dans-x62-ew3m&version=1.0) dataset. The dataset covers debates from July 1999 to July 2017.
+Debates from the European Parliament (EP). As the parliament of the European Union, the EP has representatives from all member states. Members are organised into political groups, which are broad alliances of national parties.
+
+## Source data
+
+The European Parliament corpus in People & Parliament is based on two datasets: [EUPDCorp](https://doi.org/10.5281/zenodo.15056399) (CC-BY 4.0 International licence) is used for debates from 1999 to February 2024 (terms 5-9).
+
+Debates from February 2024 to January 2026 are sourced from the [European Parliament Open Data API](https://data.europarl.europa.eu/en/developer-corner/opendata-api) (CC-BY 4.0 International licence).
+
+**References:**
+
+- Mochtak, Michal (2025): Corpus of the EU Parliament Debates (EUPDCorp), 1999-2024, Zenodo, v1.0, https://doi.org/10.5281/zenodo.15056399
+
+## Notes
+
+### Language and translations
+
+Speakers in the European Parliament use a large number of different languages. The documents in this corpus include both the original speech and an English translation.
+
+Translations are taken from the source datasets (see above). Translations in the EUPDCorp dataset are machine-translated.
+
+### Parties
+
+The _party_ field specifies the political group of the speaker in the European Parliament. In addition, the _national party_ field specifies the speaker's party at the national level.
+
+Because the data is extracted from multiple datasets, the corpus may use a different name for the same political group before and after 2024.
 
 ### Image attribution
 
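The description assigns each debate to one of two sources by date: EUPDCorp up to February 2024, the Open Data API afterwards. A minimal sketch of that routing, assuming a single cutoff date supplied by the caller. The description does not name the exact day in February 2024, so the cutoff in the usage lines is a placeholder. `source_archive_for` is a hypothetical helper; the returned labels match the `source_archive` values in the test fixture.

```python
from datetime import date

# Labels as they appear in the `source_archive` field of the test fixture.
EUPDCORP = "EUPDCorp"
EP_API = "European Parliament Open Data API"


def source_archive_for(debate_date: date, cutoff: date) -> str:
    """Return which dataset supplies a debate held on `debate_date`.

    Debates strictly before `cutoff` come from EUPDCorp; debates on or
    after it come from the Open Data API. `cutoff` is an assumption:
    the description only says "February 2024".
    """
    return EUPDCORP if debate_date < cutoff else EP_API


if __name__ == "__main__":
    # Placeholder cutoff, NOT taken from the PR.
    cutoff = date(2024, 2, 1)
    print(source_archive_for(date(1999, 7, 20), cutoff))   # EUPDCorp
    print(source_archive_for(date(2024, 11, 13), cutoff))  # European Parliament Open Data API
```

Taking the cutoff as a parameter rather than hard-coding it keeps the boundary in one place if it needs adjusting.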