A (computational) linguistic analysis of Russian-French code-switching (CS) in Lev Tolstoi's Война и мир (War and Peace).
To create a virtual environment and install all required packages, run bash install.sh.
Download the dataset from WikiSource and place the .txt files in a directory named corpus.
scripts:
preprocess.py: data preprocessing and computing CS types -> creates cs_*.csv
codeswitch.py: annotate intrasentential CS instances with PoS tags, lemmata, dependency and morphological information -> creates features.csv
analysis.ipynb: analyse outputs
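
As a rough illustration of what "computing CS types" can mean, the sketch below separates intersentential from intrasentential switches by script alone. It is not the code in preprocess.py: the function names (sentence_language, cs_types) are hypothetical, and it assumes that any Latin-script letter signals French and any Cyrillic letter signals Russian, which would mislabel Roman numerals or names quoted from other languages.

```python
import re

# Assumption: script stands in for language (Cyrillic = Russian, Latin = French).
CYRILLIC = re.compile(r"[\u0400-\u04FF]")
LATIN = re.compile(r"[A-Za-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF\u0152\u0153]")


def sentence_language(sentence):
    """Label a sentence 'ru', 'fr', 'mixed' or 'none' by the scripts it contains."""
    has_cyrillic = bool(CYRILLIC.search(sentence))
    has_latin = bool(LATIN.search(sentence))
    if has_cyrillic and has_latin:
        return "mixed"
    if has_cyrillic:
        return "ru"
    if has_latin:
        return "fr"
    return "none"


def cs_types(sentences):
    """Return (sentence index, CS type) pairs for every detected switch.

    intrasentential: both scripts occur within one sentence.
    intersentential: a single-script sentence differs in language from the
    previous single-script sentence.
    """
    switches = []
    previous = None  # language of the last single-script sentence seen
    for index, sentence in enumerate(sentences):
        language = sentence_language(sentence)
        if language == "mixed":
            switches.append((index, "intrasentential"))
        elif language in ("ru", "fr"):
            if previous is not None and language != previous:
                switches.append((index, "intersentential"))
            previous = language
    return switches
```

Calling cs_types on a list of already split sentences returns one pair per switch. Sentence splitting itself is left out here, and a mixed sentence does not update the running language, which is one possible design choice among several.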
outputs:
cs_*.csv: overview of the CS instances in each volume
features.csv: linguistic features of intrasentential CS instances