ParlaMint 5 #1975
Open
Meesch wants to merge 52 commits into develop from feature/parlamint-v5
52 commits
4f37a79 improve docs (typos and broken link) (Meesch)
76776ca update launch json to include loadcorpora (Meesch)
81463b2 preliminary corpus definition turkiye (Meesch)
ec5c678 add organisational metadata to parlamint turkiye (Meesch)
ffced2b add more speaker metadata to parlamint turkiye (Meesch)
e3fe322 add role info for parlamint turkiye (Meesch)
83554ea add speaker_constituency to parlamint-turkiye (Meesch)
0e931e4 cleanup (Meesch)
4169706 add description page for parlamint-turkiye (Meesch)
1c7844d cleanup (Meesch)
8c6afb7 additional documentation (Meesch)
4d8fad8 Merge branch 'develop' into feature/parlamint-turkey (Meesch)
fd85664 update format function to read TEI-XML for Parlamint (Meesch)
65b6c69 Merge branch 'develop' into feature/parlamint-turkey (Meesch)
ec19fc5 Merge branch 'feature/parlamint-turkey' of https://github.com/CentreF… (Meesch)
79704d6 change behavior parliamentary corpora - include all languages specified (Meesch)
5787f35 add translated_speech to parlamint corpora (Meesch)
2f8c91a update launch.json to delete the index when debugging (Meesch)
2af429d inital implementation of NER Parlamint (Meesch)
bff7a56 Merge branch 'develop' into feature/parlamint-turkey (Meesch)
a40d616 fix merge dependencies in parlamint (Meesch)
a083f35 update field default parliament to not make the keyword fields search… (Meesch)
b0434d6 create preliminary parlamint corpus for all corpora combined (Meesch)
9b5acf6 add additional extractor for parlamint date field (Meesch)
c763457 add filter for political leaning (Meesch)
6c26bbb include non-MP option for parliamentary role (Meesch)
f255a73 preliminary parlamint corpus for all countries (Meesch)
877bd94 use translated speech as the main content field for the full parlamin… (Meesch)
b9e10d7 activate all countries for the massive parlamint corpus (Meesch)
6a9e2aa add country to the visualisations (Meesch)
695fcc8 fix capital typo (Meesch)
e801d3f increase option count for country to include all countries (Meesch)
a5cc664 add preliminary language constants for each parlamint country (Meesch)
b6de6fc Merge branch 'develop' into feature/parlamint-v5 (Meesch)
2517201 start dividing subcorpora per country (Meesch)
44c18de implement supplementary stopword lists (Meesch)
c9fa0bd include stopword lists for several unsupported languages (Meesch)
6382303 include corpora for each parlamint country (Meesch)
8fa208a make speech and speech_translated new FieldDefinitions instead of def… (Meesch)
b725e95 change parlamint-all index name (Meesch)
5e3356d fix: wrong variable name (Meesch)
0f9d173 workaround for translated_speech for the UK (Meesch)
6e992c2 include chamber field in parlamint (Meesch)
8f54436 add government field to parlamint (Meesch)
5cf88e4 include workaround for translated_speech for the UK (Meesch)
8a0fa70 include names for each parliament (Meesch)
ee0aa43 documentation (Meesch)
2d05295 include ministerial role in parlamint corpora (Meesch)
8c8786f improve logic (Meesch)
1132e20 include markdown description for parlamint (Meesch)
fc96786 harmonise legacy and recent parlamint logic into two files (Meesch)
4e729ca restore legacy parlamint utils file for p&p finland corpus (Meesch)
12 changes: 12 additions & 0 deletions
backend/addcorpus/stopword_data/supplementary_data/README.md
Contributor comment: 👍 👍 👍
@@ -0,0 +1,12 @@
## Supplementary Data Sources
Source 1: For Bulgarian, Czech, Croatian, Galician, Latvian, and Ukrainian, stopword lists were downloaded from this [Github repository](https://github.com/negapedia/nltk/tree/master/corpora/stopwords), by [Marco Chilese](https://github.com/MarcoChilese). The stopword lists are a combination of nltk stopwords (where available) and stopwords from [ranks.nl](https://www.ranks.nl/stopwords/). They were downloaded on 2025-12-18.

Source 2: For Bosnian stopwords, the following publication was used: Sead Jahić, & Jernej Vičič. (2023). Lists of stopwords, polarity shifters and AnAwords of Bosnian language [Data set]. Zenodo. https://doi.org/10.5281/zenodo.10373141

Source 3: For Estonian stopwords, the following Github repository was used: https://github.com/stopwords-iso/stopwords-et?tab=readme-ov-file.

Source 4: For Icelandic stopwords, the following Github repository was used: https://github.com/ViktorMS/stoppord/blob/master/stoppord.csv

Source 5: For Serbian stopwords, the following Github repository was used: https://github.com/Xangis/extra-stopwords/blob/master/serbian

Source 6: For Slovenian stopwords, the following Github repository was used: https://github.com/stopwords-iso/stopwords-sl/blob/master/raw/gh-stopwords-json-sl.txt
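For readers unfamiliar with how supplementary lists like these are typically consumed, the following is a minimal sketch and not the code in this pull request. It assumes one stopword per line in a UTF-8 text file named `<language>.txt`; that naming scheme, and the helper names `load_stopwords` and `stopwords_for`, are hypothetical, since the diff above only shows the README. Sources distributed as CSV or JSON would need their own parsing step first.

```python
from pathlib import Path
import tempfile


def load_stopwords(path: Path) -> set[str]:
    """Read one stopword per line, skipping blank lines and normalising case."""
    words = set()
    for line in path.read_text(encoding="utf-8").splitlines():
        word = line.strip().lower()
        if word:
            words.add(word)
    return words


def stopwords_for(
    language: str, data_dir: Path, base: frozenset[str] = frozenset()
) -> set[str]:
    """Combine an optional base list with a supplementary file, if one exists."""
    # Hypothetical layout: <data_dir>/<language>.txt
    supplementary = data_dir / f"{language}.txt"
    extra = load_stopwords(supplementary) if supplementary.is_file() else set()
    return set(base) | extra


def demo() -> set[str]:
    """Write a tiny sample list to a temporary directory and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp)
        (data_dir / "icelandic.txt").write_text("og\nað\n\nOg\n", encoding="utf-8")
        return stopwords_for("icelandic", data_dir, base=frozenset({"en"}))
```

Returning a set keeps duplicates across sources (for example, a word present in both a base list and a supplementary file) from being counted twice, and a missing file simply yields the base list rather than raising.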
To avoid repetition: