This repository provides some data and scripts related to the paper:
- von Nordheim, G., Rieger, J. & Kleinen-von Königslöw, K. (2021). From the fringes to the core – An analysis of right-wing populists’ linking practices in seven EU parliaments and Switzerland. Digital Journalism.
For bug reports, comments and questions please use the issue tracker.
- tosca is used for managing the text data and transforming it into the structure required by ldaPrototype.
- ldaPrototype is used to determine a prototype from a number of runs of Latent Dirichlet Allocation.
- ineq is used to calculate Gini coefficients.
- longurl is used for expanding short URLs and urltools is useful for extracting URL cores from URLs.
- RCurl and RJSONIO are used for scraping.
- batchtools is used for calculating (prototypes of) LDAs on the High Performance Compute Cluster LiDO3.
- data.table is used for managing and storing tabular data, e.g. texts and meta information.
- tm, lubridate and utf8 are useful packages for preprocessing and managing text data.
- ggplot2, ggpubr, ggrepel and cividis are used to create the plots.
- beanplot is used to draw beanplots, an enhanced alternative to boxplots.
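As a rough illustration of what a Gini coefficient measures, the following sketch computes it for a made-up vector of link counts. The data, the variable name link_counts and the helper gini() are assumptions for illustration only and are not taken from the repository's code. The helper uses the standard formula without small-sample correction; if ineq is installed, the last lines check that ineq::Gini() gives the same value.

```r
# Hypothetical data: how often one party linked to each of five domains.
link_counts <- c(example.org = 40, example.com = 5, example.net = 3,
                 example.edu = 1, example.info = 1)

# Gini coefficient without small-sample correction, written out in base R.
# 0 means all values are equal; values close to 1 mean strong concentration.
gini <- function(x) {
  x <- sort(as.numeric(x))
  n <- length(x)
  2 * sum(x * seq_len(n)) / (n * sum(x)) - (n + 1) / n
}

gini(link_counts)

# Optional cross-check against the ineq package, if it is installed.
if (requireNamespace("ineq", quietly = TRUE)) {
  stopifnot(isTRUE(all.equal(gini(link_counts), ineq::Gini(link_counts))))
}
```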
Please note: For legal reasons the repository cannot provide all data. Please let us know if you feel that there is anything missing that we could add.
In the code folder you can view and trace the chronology of the R code used.
The folder countries contains the following structure for all examined countries:
- the subfolder docs contains the preprocessed texts used (bag of words with indices of the vocabulary, see vocab.txt/.rds),
- the subfolder proto contains the LDAPrototype models for all considered values K=20,25, ..., 75,
- the subfolder tables contains some descriptive statistics in tabular form,
- docs.rds contains the preprocessed texts in the form that can be used with the ldaPrototype package,
- onepercent.txt specifies the parties of the country that would pass an artificial 1% hurdle,
- parties.csv and parties_col.csv contain information about party abbreviations and the colors used for the parties,
- topwords30.csv contains the 50 topwords for all K=30 topics of the corresponding LDAPrototype model,
- vocab.txt contains the corresponding vocabulary,
- vocab.rds contains the same vocabulary as an RDS file (see docs.rds).
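The per-country files can be inspected with base R. The sketch below is a minimal example under two assumptions that are not confirmed by this README: that vocab.rds holds a character vector, and that each document in docs.rds is a two-row integer matrix whose first row contains zero-based vocabulary indices (the document format of the lda package). The function names read_country() and decode_doc() and the path placeholder are hypothetical.

```r
# Read a country's preprocessed documents and vocabulary.
# `country_dir` is a placeholder; point it at a real subfolder of countries/.
read_country <- function(country_dir) {
  list(
    docs  = readRDS(file.path(country_dir, "docs.rds")),
    vocab = readRDS(file.path(country_dir, "vocab.rds"))
  )
}

# Assumption: the first row of a document matrix holds zero-based indices
# into the vocabulary, so 1 is added before subsetting the vocab vector.
decode_doc <- function(doc, vocab) {
  vocab[doc[1, ] + 1L]
}

# Usage (not run here, since the folder name depends on the country):
# country <- read_country("countries/<country>")
# head(decode_doc(country$docs[[1]], country$vocab))
```

If the files follow a different layout, str(readRDS(...)) on each file shows the actual structure before any decoding is attempted.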
The folder countries/incl_UK contains the (differing) results obtained when the United Kingdom is included. The United Kingdom was omitted from the paper for interpretational reasons.
The misc folder contains various summary tables, e.g. party_names.csv lists all party abbreviations sorted by country, and party_names2.csv lists them sorted by abbreviation.
Finally, the pdf folder contains three PDFs:
- statistics_general.pdf gives an insight into descriptive statistics concerning the parliaments and the raw data set (see also code/3statistics_general.Rmd),
- create_textmeta.pdf documents the process and statistics of the preprocessing (see also code/4create_textmeta.Rmd),
- plots.pdf contains some descriptive plots and further analytical plots based on the LDA results.