As language changes and words shift in meaning, these developments also reflect cultural change. For instance, the words that move in and out of a language's dynamic lexicon display the ideas and inventions that a culture might value. In this way, examining changes in the lexicon can reveal significant aspects of the state of our society. Hence, it may be valuable to examine neologisms (newly coined words or expressions) as a particular phenomenon of language change. More specifically, analyzing the semantic developments that occur during the early existence of a neologism could provide further insight into cultural shifts.
- Download the sliced corpus into `comment_data/1-mil-comm-per-month/`, OR download the raw files on your own into `comment_data/tobewrited`.
- Run `prepare_data.py` to create an atemporal compass based on the downloaded corpus.
- Run `TWEC_master/train_model.py` to train the model. See the TWEC GitHub page for the requirements, etc.
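To illustrate the compass step: in the TWEC approach, the atemporal compass is trained on the whole corpus, that is, on all time slices taken together. The following is a minimal sketch of how such a compass text file could be assembled, assuming each time slice is stored as a plain-text `.txt` file with one comment per line. The function name `build_compass` and the file layout are hypothetical and are not taken from `prepare_data.py`, which may tokenize, filter, or name files differently.

```python
from pathlib import Path


def build_compass(slice_dir, compass_path):
    """Concatenate every *.txt slice in slice_dir into one compass file.

    Hypothetical helper for illustration only; it is not the
    repository's prepare_data.py. Returns the number of lines written.
    """
    slice_dir = Path(slice_dir)
    compass_path = Path(compass_path)

    # Collect the slice files up front and skip the compass file itself,
    # in case it lives in the same directory. Sorting only makes the
    # output deterministic; the compass carries no temporal ordering.
    slice_files = sorted(
        p for p in slice_dir.glob("*.txt")
        if p.resolve() != compass_path.resolve()
    )

    n_lines = 0
    with compass_path.open("w", encoding="utf-8") as out:
        for path in slice_files:
            with path.open(encoding="utf-8") as src:
                for line in src:
                    line = line.strip()
                    if line:  # drop empty lines
                        out.write(line + "\n")
                        n_lines += 1
    return n_lines
```

Under these assumptions, a call such as `build_compass("comment_data/1-mil-comm-per-month", "compass.txt")` would write a single file containing all slices (the output file name is an assumption), which could then serve as the training text for the compass model.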
WIP: Some toy data is included, with the intention that there will be toy models as well.
Reddit corpus: https://www.dropbox.com/sh/pf8dripkoqo8wzb/AABhwGhUocRLUoacCCL_sAYua?dl=0
Pushshift.io: https://files.pushshift.io/reddit/comments/
This work is based on the following paper: https://aaai.org/ojs/index.php/AAAI/article/view/4594
- Di Carlo, V., Bianchi, F., & Palmonari, M. (2019). Training Temporal Word Embeddings with a Compass. Proceedings of the AAAI Conference on Artificial Intelligence, 33(01), 6326-6334. https://doi.org/10.1609/aaai.v33i01.33016326