While continuing my studies in AI, I realized that I needed a stronger foundation to properly understand the material I was working with.
Therefore, I decided to start again from the very beginning.
Here is the list of topics I will study.
The list may be updated and refined as I progress.
This directory is where the research paper implementations will be done.
Here is the order in which I completed them; it does not necessarily follow the chronological order in which the models were developed.
I believe the topics below are the must-learns for getting started with NLP.
If anyone has any suggestions, please feel free to tell me! :)
- After going through the Single-Layer Perceptron paper, I realized that reading a research paper line by line is quite ineffective. Therefore, I have decided to go through only the equations, their explanations, and the conclusions.
- Attention Is All You Need (Transformer) ☑️
- Single Layer Perceptron ☑️
- Back-Propagation & Multilayer Perceptron ☑️
- Recurrent Neural Network (RNN) & Long Short-term Memory (LSTM) ☑️
- Convolutional Neural Network (CNN) ☑️
- Gated Recurrent Unit (GRU) <<<< Ongoing >>>>
- Batch Normalization
- The Graph Neural Network Model (GNN)
- Transformers (All publicly available models)
*Image retrieved from ResearchGate*
Model | Characteristics | Main Weaknesses |
---|---|---|
Single-Layer Perceptron | - The first neural network model, imitating the human brain cell (neuron) <br> - Able to solve simple linear classification problems | - Unable to solve the XOR problem, which requires more than one decision boundary, i.e. non-linear classification |
Multi-Layer Perceptron | - Can solve the XOR problem by using hidden layers (see the XOR sketch below the table) <br> - The model from which the term "deep learning" originated | - Vanishing & exploding gradients <br> - High computational cost for large inputs |
RNN | - Uses hidden states as memory, so it can process time-series data | - Struggles with long-term dependencies <br> - The longer the input, the smaller the gradient for the earlier steps becomes (due to repeated multiplication of the recurrent weights) |
LSTM | - Solves the long-term dependency problem by adding a cell state (long-term memory) that is updated by summation rather than multiplication of weights (see the gradient sketch below the table) <br> - Designed to mimic the human brain: forgetting unimportant things and remembering important ones | - Many gates, so a high computational cost <br> - More storage (memory) needed to train on longer data, since it remembers more of the past than other RNN models |
CNN | - | - |
GRU | - | - |
Transformer | - | - |
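
Since the perceptron and MLP rows above lean on the XOR argument, here is a minimal NumPy sketch of it. This is my own illustration rather than code from any of the papers; the hidden size, learning rate, and iteration count are arbitrary choices.

```python
# Minimal sketch: a single-layer perceptron cannot separate XOR (no single line splits
# the classes), but a tiny MLP trained with back-propagation can. Hidden size, learning
# rate, and iteration count below are illustrative, not taken from any paper.
import numpy as np

rng = np.random.default_rng(0)

# XOR truth table
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# 2 -> 4 -> 1 network: the hidden layer lets the decision boundary become non-linear
W1 = rng.normal(scale=1.0, size=(2, 4)); b1 = np.zeros((1, 4))
W2 = rng.normal(scale=1.0, size=(4, 1)); b2 = np.zeros((1, 1))

lr = 1.0
for step in range(5000):
    # forward pass
    h = sigmoid(X @ W1 + b1)      # hidden activations
    out = sigmoid(h @ W2 + b2)    # network output

    # backward pass: gradients of the mean squared error through the sigmoids
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)

    # gradient-descent updates
    W2 -= lr * h.T @ d_out;  b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_h;    b1 -= lr * d_h.sum(axis=0, keepdims=True)

print(np.round(out, 2))  # should approach [0, 1, 1, 0], which no single-layer perceptron can output
```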
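
Similarly, the RNN and LSTM rows rest on the "multiplication vs. summation" argument. The rough scalar sketch below is my own illustration with made-up numbers: a gradient that is multiplied by the recurrent weight (and the activation's derivative) at every step shrinks exponentially, while the LSTM's additively updated cell state is only scaled by the forget gate, which can stay close to 1.

```python
# Rough scalar caricature of back-propagation through time; every number here is an
# illustrative assumption, not a measurement from any model.
T = 50               # sequence length
w_rec = 1.2          # recurrent weight of a plain RNN
tanh_deriv = 0.25    # a typical value of tanh'(pre-activation) on the RNN path
forget_gate = 0.98   # an LSTM forget-gate activation close to 1 ("keep remembering")

# Plain RNN: the gradient flowing back T steps is multiplied by (w_rec * tanh') each step.
rnn_factor = (w_rec * tanh_deriv) ** T

# LSTM: the cell state is updated by addition, so along the cell-state path the gradient
# is only scaled by the forget gate at each step.
lstm_factor = forget_gate ** T

print(f"RNN gradient factor after {T} steps:    {rnn_factor:.3e}")   # ~1e-26, effectively vanished
print(f"LSTM cell-state factor after {T} steps: {lstm_factor:.3e}")  # ~0.36, still usable
```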