In this project, I created a neural language model that receives 3 consecutive words as input and aims to predict the next word as output. The model is a multi-layer perceptron implemented from scratch (using only NumPy functions) and trained with cross-entropy loss.
The network consists of a 16-dimensional embedding layer, a 128-dimensional hidden layer and an output layer. The input is a sequence of 3 consecutive words, provided as integer indices into a 250-word dictionary. Each word is converted to a one-hot representation and fed to the embedding layer, which is 250x16 dimensional. The three resulting word embeddings are concatenated and fed to the hidden layer, which uses a sigmoid activation function; the output layer is a softmax over the 250 words in the dictionary.
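Because each input word is one-hot, multiplying it by the embedding matrix simply selects one row of that matrix, so the embedding layer can be implemented as a row lookup. The snippet below is a minimal sketch of this equivalence; the names `E` and `word_idx` are illustrative and not taken from Network.py.

```python
import numpy as np

vocab_size, embed_dim = 250, 16
rng = np.random.default_rng(0)
E = rng.normal(scale=0.01, size=(vocab_size, embed_dim))  # 250x16 embedding matrix

word_idx = 42                     # integer index of a word in the 250-word dictionary
one_hot = np.zeros(vocab_size)
one_hot[word_idx] = 1.0

# Multiplying the one-hot vector by E selects one row of E,
# so the lookup is equivalent to simple row indexing.
assert np.allclose(one_hot @ E, E[word_idx])
```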
This is a Python 3 implementation that uses NumPy for the numerical operations, Matplotlib for visualization, Pickle to save and load the learned model parameters, and scikit-learn for t-SNE.
The Network.py file implements forward and backward propagation from scratch, along with the activation functions and other necessary helpers.
The main.py file loads the dataset, shuffles the training data, splits it into mini-batches, trains the model while evaluating it on the validation set, saves the model parameters to the model.pk file, and outputs the cost, training-accuracy and validation-accuracy plots together with the final accuracy values (a sketch of such a mini-batch loop appears after the file list below).
The eval.py file loads the learned model parameters, evaluates the model on the test data and outputs the test accuracy.
The tsne.py file loads the learned model parameters, computes the 16-dimensional word embeddings, fits and plots t-SNE on the embeddings, and outputs a word plot as well as an interactive plot with several word predictions from the model.
The model.pk file contains the learned model parameters in Pickle format.
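For reference, here is a minimal sketch of the shuffling and mini-batching step described above. The function and variable names (`iterate_minibatches`, `train_x`, `train_y`, `model`) are illustrative assumptions, not the names used in main.py.

```python
import numpy as np

def iterate_minibatches(inputs, targets, batch_size=256, rng=None):
    """Yield shuffled (inputs, targets) mini-batches of the given size."""
    rng = rng or np.random.default_rng()
    order = rng.permutation(len(inputs))
    for start in range(0, len(inputs), batch_size):
        idx = order[start:start + batch_size]
        yield inputs[idx], targets[idx]

# Illustrative outline of the training loop (model stands for the network in Network.py):
# for epoch in range(num_epochs):
#     for x_batch, y_batch in iterate_minibatches(train_x, train_y, batch_size=256):
#         cost = model.forward(x_batch, y_batch)   # forward pass + cross-entropy cost
#         model.backward()                         # backpropagate gradients
#         model.update(lr=0.01)                    # gradient-descent step
```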
You can run the model after extracting the data files to the "\data\data" folders under the same directory as the model files. After this step, typing these commands in IPython should produce the outputs:
In [1]: %cd "CURRENT DIRECTORY"
In [2]: %run main.py
In [3]: %run eval.py
In [4]: %run tsne.py
Hyperparameters:
- Batch size = 256
- Learning rate = 0.01
- Epochs = 66
Training accuracy: 43.36%
Validation accuracy: 31.14%
Test accuracy: 31.07%
2-D t-SNE word plot:
Clusters of related words:
- should, could, can, would, will, may, might
- my, your, his, their, our
- have, has, had
- been, be, was, were, is, are, ‘s
- do, does, did
- say, says, said
- make, made
- it, that, this
- me, him, us, them
- i, you, he, she, we, they
- dr., ms.
- now, then
- where, when, department
- what, who
- because, and, if, but, or, ‘,’, ‘:’, ‘;’, ‘—’, percent
- few, more, much
- since, street
- program, so
- five, officials
- state, York
- like, see
Looking at these examples, words in the same cluster usually have the same function in a sentence, so they can often stand in for one another. Most clusters consist of words from the same lexical category (pronouns, determiners, conjunctions, auxiliary verbs, etc.), and in some clusters the words also seem related in meaning. However, a few clusters (17, 18 and 19: since/street, program/so, five/officials) did not make much sense. I also prepared an interactive plot (by modifying the code from reference 5) that makes it much easier to examine the clusters. Since the interactive plot requires a GUI, I cannot attach it here, but you can run the tsne.py file in IPython to create it.
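For reference, below is a minimal sketch of how the embeddings can be projected with scikit-learn's t-SNE and plotted. The way the parameters are loaded (the "embedding" key) and the placeholder vocabulary are assumptions; tsne.py may do this differently.

```python
import pickle
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Load the learned parameters; the embedding matrix is assumed to be stored
# under a key such as "embedding" (the actual key in model.pk may differ).
with open("model.pk", "rb") as f:
    params = pickle.load(f)
embeddings = np.asarray(params["embedding"])               # shape (250, 16)

# vocab is assumed to be a list of the 250 dictionary words, index-aligned
# with the rows of the embedding matrix.
vocab = [f"word_{i}" for i in range(embeddings.shape[0])]  # placeholder vocabulary

coords = TSNE(n_components=2, random_state=0).fit_transform(embeddings)

plt.figure(figsize=(12, 12))
plt.scatter(coords[:, 0], coords[:, 1], s=5)
for (x, y), word in zip(coords, vocab):
    plt.annotate(word, (x, y), fontsize=7)
plt.title("2-D t-SNE of the 16-dimensional word embeddings")
plt.show()
```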
Some predictions:
- city of new → york
- life in the → world
- he is the → same
These predictions looked sensible. After seeing them, I was curious whether the model could achieve more, so I wrote a function that creates n-word-long strings by repeatedly predicting the next word from the preceding three words, just to see how well the network performs. After a dozen trials, I realized that it seldom creates meaningful phrases longer than 5-6 words and usually falls into a loop of periods afterwards (a sketch of this generation loop follows the examples below). Here are some examples:
- where are you going to do ? of them .
- the city of the country . . . the next
- he will go to the way that i can do
- i would like to think about it . those .
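A minimal sketch of such a generation loop, assuming a `predict_next(context)` helper that wraps the network's forward pass and returns the index of the most probable next word (this helper and the function name `generate` are illustrative, not the names used in my code):

```python
def generate(seed_indices, n_words, predict_next):
    """Extend a 3-word seed to n_words by repeatedly predicting the next word.

    seed_indices: list of 3 integer word indices.
    predict_next: maps a list of the last 3 indices to the next word's index.
    """
    sequence = list(seed_indices)
    while len(sequence) < n_words:
        next_idx = predict_next(sequence[-3:])   # condition on the preceding 3 words
        sequence.append(next_idx)
    return sequence

# Example with a toy predictor that always returns word index 0:
print(generate([5, 17, 42], 10, predict_next=lambda ctx: 0))
```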
Forward propagation:
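The forward pass follows the architecture described above. Below is a minimal NumPy sketch under assumed parameter names (E for the 250x16 embedding matrix, W1/b1 for the 48x128 hidden layer, W2/b2 for the 128x250 output layer); the actual names and layout in Network.py may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)         # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def forward(word_indices, E, W1, b1, W2, b2):
    """word_indices: (batch, 3) integer array of dictionary indices."""
    emb = E[word_indices]                         # (batch, 3, 16) embedding lookup
    h_in = emb.reshape(len(word_indices), -1)     # (batch, 48) concatenated embeddings
    h = sigmoid(h_in @ W1 + b1)                   # (batch, 128) hidden activations
    y_hat = softmax(h @ W2 + b2)                  # (batch, 250) next-word probabilities
    return emb, h, y_hat
```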
Backward propagation:
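Continuing the forward-pass sketch above (same assumed parameter names and shapes), here is a minimal sketch of the gradients for the mean cross-entropy loss:

```python
def backward(word_indices, targets_one_hot, cache, E, W1, W2):
    """Gradients of the mean cross-entropy loss w.r.t. all parameters.

    cache is (emb, h, y_hat) as returned by forward();
    targets_one_hot has shape (batch, 250).
    """
    emb, h, y_hat = cache
    batch = len(word_indices)

    d_out = (y_hat - targets_one_hot) / batch     # combined softmax + cross-entropy gradient
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0)

    d_h = (d_out @ W2.T) * h * (1.0 - h)          # sigmoid derivative: h * (1 - h)
    dW1 = emb.reshape(batch, -1).T @ d_h
    db1 = d_h.sum(axis=0)

    d_emb = (d_h @ W1.T).reshape(emb.shape)       # gradient w.r.t. concatenated embeddings
    dE = np.zeros_like(E)
    np.add.at(dE, word_indices, d_emb)            # scatter-add into the used embedding rows
    return dE, dW1, db1, dW2, db2
```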
- Neural networks from scratch in Python. (2021, April 27). Cristian Dima. https://www.cristiandima.com/neural-networks-from-scratch-in-python/
- Clark, K. (2021, April 27). Computing Neural Network Gradients. Stanford CS224n Readings. https://web.stanford.edu/class/cs224n/readings/gradient-notes.pdf
- Derivative of Softmax loss function. (2021, April 27). Mathematics Stack Exchange. https://math.stackexchange.com/questions/945871/derivative-of-softmax-loss-function/
- Code example of t-SNE: Dimensionality reduction, Lecture 27 @ Applied AI Course. (2021, April 27). YouTube. https://www.youtube.com/watch?v=eDGWcIt10d8
- Possible to make labels appear when hovering over a point in matplotlib?. (2021, April 27). Stack Overflow. https://stackoverflow.com/questions/7908636/possible-to-make-labels-appear-when-hovering-over-a-point-in-matplotlib