Here is the code I used for compiling a Wikipedia corpus, preprocessing it, and topic modeling.
wikiscrap contains 4 functions for building comparable corpora from Wikipedia (I used the Wikipedia-API library). The output is a set of .txt files.
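The idea behind wikiscrap can be illustrated with a minimal sketch. It is not the repository's actual code: the two functions `fetch_comparable_pair` and `save_pair`, the user-agent string and the `en/` and `ru/` folder layout are all hypothetical. The sketch takes an English article, follows its interlanguage link to the Russian version through Wikipedia-API's `langlinks`, and writes both texts to .txt files under the same document id. Recent Wikipedia-API versions require a user agent as the first constructor argument.

```python
import tempfile
from pathlib import Path


def fetch_comparable_pair(title_en, user_agent="comparable-corpus-sketch (you@example.com)"):
    """Return (english_text, russian_text) for an article and its Russian langlink, or None.

    Hypothetical helper; the user agent above is a placeholder you should replace.
    """
    # Imported lazily so the offline demo below runs without the package installed.
    import wikipediaapi  # pip install Wikipedia-API

    wiki_en = wikipediaapi.Wikipedia(user_agent, language="en")
    page_en = wiki_en.page(title_en)
    if not page_en.exists():
        return None
    # langlinks maps language codes to pages in those languages.
    page_ru = page_en.langlinks.get("ru")
    if page_ru is None:
        return None
    return page_en.text, page_ru.text


def save_pair(out_dir, doc_id, text_en, text_ru):
    """Write one aligned pair as out_dir/en/<doc_id>.txt and out_dir/ru/<doc_id>.txt."""
    paths = []
    for lang, text in (("en", text_en), ("ru", text_ru)):
        lang_dir = Path(out_dir) / lang
        lang_dir.mkdir(parents=True, exist_ok=True)
        path = lang_dir / f"{doc_id}.txt"
        path.write_text(text, encoding="utf-8")
        paths.append(path)
    return tuple(paths)


# Offline demo with hand-written texts (no network access needed).
demo_dir = tempfile.mkdtemp()
en_path, ru_path = save_pair(
    demo_dir, "0001", "Moscow is the capital of Russia.", "Москва является столицей России."
)

# Live usage (requires network access and the Wikipedia-API package):
# pair = fetch_comparable_pair("Moscow")
# if pair:
#     save_pair("corpus", "Moscow", *pair)
```

Giving both files the same name in parallel language folders keeps the pairing explicit, so later steps can align documents by file name.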
preprocess contains functions for preprocessing texts in Russian and English.
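A typical bilingual preprocessing step could look roughly like the following standard-library-only sketch. It lowercases text, keeps only Latin letters for English and Cyrillic letters (including ё) for Russian, and drops short tokens and stopwords. The function name `preprocess`, the `min_len` parameter and the tiny stopword lists are illustrative assumptions. The repository's own functions may do more, for example lemmatization or fuller stopword lists.

```python
import re

# Tiny illustrative stopword lists (assumption: real lists, e.g. from NLTK, are much longer).
STOPWORDS = {
    "en": {"the", "a", "an", "of", "and", "in", "is", "to", "for"},
    "ru": {"и", "в", "на", "не", "что", "с", "по", "это"},
}

# Letter-only tokens: Latin for English, Cyrillic for Russian (ё lies outside а-я, so it is listed).
TOKEN_RE = {
    "en": re.compile(r"[a-z]+"),
    "ru": re.compile(r"[а-яё]+"),
}


def preprocess(text, lang, min_len=3):
    """Lowercase, tokenize and filter one document; lang is 'en' or 'ru'."""
    tokens = TOKEN_RE[lang].findall(text.lower())
    return [t for t in tokens if len(t) >= min_len and t not in STOPWORDS[lang]]


print(preprocess("The capital of Russia is Moscow.", "en"))
print(preprocess("Это город на Волге", "ru"))
```

Restricting each language to its own alphabet also discards stray foreign-script tokens, which are common in Wikipedia articles.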
BasicLDAmethods contains code for experiments with comparable corpora using a standard LDA model from the gensim library.
pd2txt creates files containing all documents of the English and Russian corpora, which can then be used to train a polylingual topic model (PLTM) from the MALLET package (
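A sketch of what a pd2txt-style export could look like follows. It assumes a pandas DataFrame with hypothetical columns `doc_id`, `text_en` and `text_ru`, one row per aligned article pair. Each language goes to its own file in MALLET's one-document-per-line format (name, label, text), which `mallet import-file` reads by default and which PLTM needs imported with `--keep-sequence`. As a further assumption, the sketch relies on the PLTM pairing documents across languages by position, so line i must hold the same article in both files. Check the MALLET documentation for the exact PLTM invocation.

```python
import re
import tempfile
from pathlib import Path

import pandas as pd


def to_mallet_line(doc_id, label, text):
    """One instance per line: name, label, text (newlines collapsed to spaces)."""
    one_line = re.sub(r"\s+", " ", text).strip()
    return f"{doc_id}\t{label}\t{one_line}\n"


def df_to_mallet_files(df, en_path, ru_path):
    """Write aligned en/ru files; each kept row becomes the same line number in both files.

    Hypothetical column names: 'doc_id', 'text_en', 'text_ru'.
    """
    # Drop incomplete pairs so the two files stay aligned line by line.
    pairs = df.dropna(subset=["text_en", "text_ru"])
    with open(en_path, "w", encoding="utf-8") as f_en, open(ru_path, "w", encoding="utf-8") as f_ru:
        for row in pairs.itertuples(index=False):
            f_en.write(to_mallet_line(row.doc_id, "en", row.text_en))
            f_ru.write(to_mallet_line(row.doc_id, "ru", row.text_ru))
    return len(pairs)


# Demo with a toy DataFrame written to a temporary directory.
df = pd.DataFrame({
    "doc_id": ["d1", "d2"],
    "text_en": ["moscow capital\nrussia", "river volga"],
    "text_ru": ["москва столица россии", "река волга"],
})
out_dir = Path(tempfile.mkdtemp())
n_written = df_to_mallet_files(df, out_dir / "en.txt", out_dir / "ru.txt")
en_lines = (out_dir / "en.txt").read_text(encoding="utf-8").splitlines()
ru_lines = (out_dir / "ru.txt").read_text(encoding="utf-8").splitlines()
```

Collapsing whitespace matters because a newline inside a document would otherwise be read by MALLET as the start of a new instance and break the alignment.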
You can find my comparable corpus here