Explorations of the word2vec algorithm on dialogue from The Simpsons

As a first exploration, I looked into different implementations of word2vec and quickly found gensim, which, as far as I know, implements the original algorithm we explored in class for INST0075. It also seems relatively widely used, and I quickly found many example tutorials. Using the dialogue from 27 seasons of The Simpsons seemed a silly enough way to get started without much of a plan. The project breaks down into three parts:

following the original tutorial (referenced later) to clean the text, then initialize and train the model on all dialogues
exploring new ways to interpret the results
aggregating dialogues by character to understand how each character is represented by what they say.

The code depends on the following libraries:

Python 3.8.15 (gensim does not support the newest versions of Python at the time of writing)

gensim 4.3.0
scikit-learn 1.2.1
plotly 5.13.0
spacy 3.5.0
seaborn 0.12.2

Much of the first half comes directly from this tutorial, which also points to the dataset used; the dataset is available here.
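
The third part of the project needs the dialogue lines grouped by the character who speaks them. A minimal standard-library sketch, assuming the dataset is a CSV with columns named `raw_character_text` and `spoken_words` (an assumption; check the header of the downloaded file). The tiny inline CSV below is hypothetical sample data.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Hypothetical CSV excerpt; the column names are assumptions about the dataset layout.
SAMPLE_CSV = """raw_character_text,spoken_words
Homer Simpson,"D'oh! Where is my beer?"
Marge Simpson,"Homer, the beer is in the fridge."
Bart Simpson,
Homer Simpson,"Mmm, donuts."
"""


def lines_by_character(csv_text: str) -> Dict[str, List[str]]:
    """Group non-empty dialogue lines by the character who speaks them."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        speaker = (row.get("raw_character_text") or "").strip()
        words = (row.get("spoken_words") or "").strip()
        if speaker and words:  # skip rows with a missing speaker or missing dialogue
            grouped[speaker].append(words)
    return dict(grouped)


print(lines_by_character(SAMPLE_CSV))
```

Dropping rows with empty dialogue early keeps blank entries from reaching the tokenizer or the per-character aggregates.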

All of the code runs in the Jupyter notebook (word2vec.ipynb), which explains each step as it runs. The original visualisation is based on another tutorial, and the last exploration is largely my own code.
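
One simple way to represent a character by what they say, which may or may not match the notebook's own approach, is to average the vectors of every in-vocabulary word in their lines and compare characters with cosine similarity. A minimal sketch, using hypothetical 3-dimensional toy vectors in place of a trained `model.wv`:

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = List[float]

# Hypothetical toy embeddings standing in for trained word2vec vectors (model.wv).
EMBEDDINGS: Dict[str, Vector] = {
    "beer": [0.9, 0.1, 0.0],
    "donuts": [0.8, 0.2, 0.1],
    "homework": [0.0, 0.9, 0.3],
    "skateboard": [0.1, 0.3, 0.9],
}


def character_vector(tokens: Sequence[str], emb: Dict[str, Vector]) -> Optional[Vector]:
    """Average the vectors of all in-vocabulary tokens a character says."""
    known = [emb[t] for t in tokens if t in emb]
    if not known:
        return None  # none of the character's words are in the vocabulary
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


homer = character_vector(["beer", "donuts", "d'oh"], EMBEDDINGS)  # "d'oh" is ignored
lisa = character_vector(["homework", "homework"], EMBEDDINGS)
bart = character_vector(["skateboard", "homework"], EMBEDDINGS)
print(cosine(homer, lisa), cosine(bart, lisa))
```

With gensim, `token in model.wv` and `model.wv[token]` would replace the dictionary lookups. Averaging discards word order and frequency weighting, which is the main trade-off of this simple design.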

Simpsons picture from Wikipedia

Repository: blintkcsis/Exploring-word2vec

Folders and files

Latest commit

History

Repository files navigation

Explorations on word2vec algorithm on dialogue from Simpson's

Code dependencies from the following libraries:

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages