blintkcsis/Exploring-word2vec

Simpsons image from Wikipedia

Explorations of the word2vec algorithm on dialogue from The Simpsons

As a first exploration, I looked into different implementations of word2vec. I quickly found gensim, which, as far as I know, was one of the first Python implementations of the original algorithm we explored in class for INST0075. It also seems relatively widely used, and I quickly found many example tutorials. Using the dialogue from 27 seasons of The Simpsons seemed like a silly enough way to get started without much of a plan. This project can be broken down into three parts:

  1. following the original tutorial (referenced later) to clean the data, then initialize and train the model on all dialogue
  2. looking into new ways to understand the results
  3. aggregating dialogue by character to understand how characters are represented by what they say.

The code depends on the following Python version and libraries:

Python 3.8.15 (gensim did not support the newest versions of Python at the time of writing)

  • gensim 4.3.0
  • scikit-learn 1.2.1
  • plotly 5.13.0
  • spacy 3.5.0
  • seaborn 0.12.2
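To check whether a local environment matches the versions listed above, something like the following sketch could be used. The `PINNED` dictionary and the `check_versions` helper are hypothetical names introduced here for illustration; only the package names and version numbers come from the list above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the dependency list above.
PINNED = {
    "gensim": "4.3.0",
    "scikit-learn": "1.2.1",
    "plotly": "5.13.0",
    "spacy": "3.5.0",
    "seaborn": "0.12.2",
}


def check_versions(pinned):
    """Return {package: (expected, installed_or_None)} for every mismatch."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches


for name, (expected, installed) in check_versions(PINNED).items():
    print(f"{name}: expected {expected}, found {installed or 'not installed'}")
```

Nothing is printed when every listed package is installed at exactly the listed version.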

Much of the first half comes directly from this tutorial, which also points to the dataset used, available here.

All of the code is run through the Jupyter notebook, which also explains each step as it is run. The original visualisation is adapted from another tutorial, and the last exploration is largely my own code.
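As a rough idea of what aggregating dialogue by character can look like, the sketch below averages the word vectors of everything a character says and compares characters by cosine similarity. Averaging is an assumption made for this example and may not match how the notebook aggregates; the tiny 3-dimensional `word_vectors` and the `lines_by_character` data are made up, and in practice the vectors would come from the trained model's `model.wv`.

```python
import numpy as np

# Hypothetical 3-dimensional word vectors standing in for a trained model.
word_vectors = {
    "donut": np.array([1.0, 0.0, 0.0]),
    "beer": np.array([0.8, 0.2, 0.0]),
    "saxophone": np.array([0.0, 1.0, 0.0]),
    "book": np.array([0.0, 0.8, 0.2]),
}

# Hypothetical tokenised lines, grouped by the character who says them.
lines_by_character = {
    "homer": [["donut", "beer"], ["beer", "unknownword"]],
    "lisa": [["saxophone"], ["book", "saxophone"]],
}


def character_vector(lines, vectors):
    """Average the vectors of all in-vocabulary tokens a character says."""
    known = [vectors[tok] for line in lines for tok in line if tok in vectors]
    if not known:
        return None  # the character never uses a word the model knows
    return np.mean(known, axis=0)


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


homer = character_vector(lines_by_character["homer"], word_vectors)
lisa = character_vector(lines_by_character["lisa"], word_vectors)
print(cosine(homer, lisa))
```

A plain average weights every spoken word equally, so frequent filler words can dominate; removing stop words during cleaning, or weighting by something like TF-IDF, are possible alternatives.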


Simpsons picture from Wikipedia
