Recurrent Neural Networks
Recurrent neural networks, also known as RNNs, are a class of neural networks that allow previous outputs to be used as inputs while maintaining hidden states. An RNN keeps an internal memory through feedback connections and can therefore model temporal behavior.
- ANNs and CNNs take static (spatial) input: each input is processed on its own, without needing the previous or next inputs.
- A data point in a time series requires knowledge of the data points around it, for example in videos or speech.
- Feed-forward networks cannot take previous inputs into account.
| Advantages | Disadvantages |
|---|---|
| Possibility of processing input of any length | Computation is slow |
| Model size does not increase with the size of the input | Difficulty of accessing information from a long time ago |
| Computation takes into account historical information | Cannot consider any future input for the current state |
| Weights are shared across time | |
- Formula for calculating the current state:

  $h_t = f(h_{t-1}, x_t)$

  - $h_t$ → current state
  - $h_{t-1}$ → previous state
  - $x_t$ → input state
- Formula for applying the activation function:

  $h_t = \tanh(W_{hh} \cdot h_{t-1} + W_{xh} \cdot x_t)$

  - $W_{hh}$ → weight at the recurrent neuron
  - $W_{xh}$ → weight at the input neuron
- Formula for calculating the output:

  $y_t = W_{hy} \cdot h_t$

  - $y_t$ → output
  - $W_{hy}$ → weight at the output layer
- A single time step of the input is provided to the network.
- Its current state is then calculated from the current input and the previous state.
- The current $h_t$ becomes $h_{t-1}$ for the next time step.
- One can go through as many time steps as the problem requires and combine the information from all the previous states.
- Once all the time steps are completed, the final current state is used to calculate the output (a sketch of this forward pass follows this list).
- The output is then compared to the actual output, i.e., the target output, and the error is generated.
- The error is then backpropagated through time to update the weights, and thus the network (RNN) is trained.
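A minimal NumPy sketch of this forward pass, assuming a single recurrent layer; the dimensions, random data, and the function name `rnn_forward` are illustrative, and the weight names follow the formulas above:

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, W_hy, h0):
    """Run a vanilla RNN over a sequence.

    xs : array of shape (T, input_dim), one input vector per time step
    h0 : initial hidden state of shape (hidden_dim,)
    Returns the per-step outputs and the final hidden state.
    """
    h = h0
    ys = []
    for x_t in xs:
        # h_t = tanh(W_hh . h_{t-1} + W_xh . x_t)
        h = np.tanh(W_hh @ h + W_xh @ x_t)
        # y_t = W_hy . h_t
        ys.append(W_hy @ h)
    return np.stack(ys), h

# Illustrative usage with small dimensions and random data
rng = np.random.default_rng(0)
T, input_dim, hidden_dim, output_dim = 5, 3, 4, 2
xs = rng.normal(size=(T, input_dim))
W_xh = 0.1 * rng.normal(size=(hidden_dim, input_dim))
W_hh = 0.1 * rng.normal(size=(hidden_dim, hidden_dim))
W_hy = 0.1 * rng.normal(size=(output_dim, hidden_dim))
ys, h_final = rnn_forward(xs, W_xh, W_hh, W_hy, np.zeros(hidden_dim))
print(ys.shape)  # (5, 2): one output per time step
```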
| Type of RNN | Input → output | Example |
|---|---|---|
| One-to-one | single input → single output | Traditional neural network |
| One-to-many | single input → sequence of outputs | Music generation |
| Many-to-one | sequence of inputs → single output | Sentiment classification |
| Many-to-many (same length) | sequence of inputs → sequence of outputs | Named entity recognition |
| Many-to-many (different lengths) | sequence of inputs → sequence of outputs | Machine translation |
The vanishing and exploding gradient phenomena are often encountered in the context of RNNs. The reason why they happen is that it is difficult to capture long-term dependencies because of multiplicative gradients that can be exponentially decreasing/increasing with respect to the number of layers.
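For intuition, repeatedly multiplying by a factor slightly below or above 1 over many time steps is enough to make a gradient vanish or explode (the factors here are purely illustrative):

```python
steps = 50
print(0.9 ** steps)  # ~0.005 -> the gradient signal vanishes
print(1.1 ** steps)  # ~117   -> the gradient signal explodes
```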
Gradient clipping is a technique used to cope with the exploding gradient problem sometimes encountered when performing backpropagation. By capping the maximum value for the gradient, this phenomenon is controlled in practice.
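A minimal sketch of clipping by global norm, assuming the gradients are plain NumPy arrays; deep learning frameworks ship equivalents (for example PyTorch's `torch.nn.utils.clip_grad_norm_`):

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so their combined L2 norm is at most max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads
```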
Types of gates: To remedy the vanishing gradient problem, specific gates are used in some types of RNNs and usually have a well-defined purpose. A gate is typically computed as $\Gamma = \sigma(W x_t + U h_{t-1} + b)$, where $W$, $U$, and $b$ are coefficients specific to the gate and $\sigma$ is the sigmoid function, so its value lies between 0 and 1.
The main ones are summed up here:
| Type of gate | Sign | Role | Used in |
|---|---|---|---|
| Update gate | $\Gamma_u$ | How much past should matter now? | GRU, LSTM |
| Relevance gate | $\Gamma_r$ | Drop previous information? | GRU, LSTM |
| Forget gate | $\Gamma_f$ | Erase a cell or not? | LSTM |
| Output gate | $\Gamma_o$ | How much to reveal of a cell? | LSTM |
Gated Recurrent Unit (GRU) and Long Short-Term Memory units (LSTM) deal with the vanishing gradient problem encountered by traditional RNNs, with LSTM being a generalization of GRU.
- Input Gate
  - The addition of useful information to the cell state is done by the input gate.
  - First, the information is regulated with the sigmoid function, which filters the values to be remembered, similar to the forget gate, using the inputs $h_{t-1}$ and $x_t$.
  - Then a vector is created using the $\tanh$ function, which gives an output from -1 to +1 and contains all the possible values from $h_{t-1}$ and $x_t$.
  - At last, the values of the vector and the regulated values are multiplied to obtain the useful information.
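  - In the standard formulation (the weight names $W_i$, $W_C$ and biases $b_i$, $b_C$ are illustrative), this corresponds to $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$ and $\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$, combined as $i_t \odot \tilde{C}_t$.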
- Forget Gate
  - Information that is no longer useful in the cell state is removed with the forget gate.
  - Two inputs, $x_t$ (the input at the particular time step) and $h_{t-1}$ (the previous cell output), are fed to the gate, multiplied with weight matrices, and a bias is added.
  - The resultant is passed through a sigmoid activation function, which gives an output between 0 and 1.
  - If the output for a particular cell state is close to 0, that piece of information is forgotten; if it is close to 1, the information is retained for future use.
- Output Gate
  - The task of extracting useful information from the current cell state to be presented as the output is done by the output gate.
  - First, a vector is generated by applying the $\tanh$ function to the cell state.
  - Then, the information is regulated using the sigmoid function and filtered by the values to be remembered, using the inputs $h_{t-1}$ and $x_t$.
  - At last, the values of the vector and the regulated values are multiplied and sent as the output of the cell and as the input to the next cell (a combined sketch of all three gates follows this list).
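  - In the standard formulation this corresponds to $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$ and $h_t = o_t \odot \tanh(C_t)$ (the weight name $W_o$ and bias $b_o$ are illustrative).

Putting the three gates together, here is a minimal NumPy sketch of a single LSTM time step. It follows the standard LSTM formulation rather than an implementation specific to this page; the function names, dictionary keys, and weight shapes are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step.

    W : dict of weight matrices W['f'], W['i'], W['c'], W['o'],
        each of shape (hidden_dim, hidden_dim + input_dim)
    b : dict of bias vectors, each of shape (hidden_dim,)
    """
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f = sigmoid(W['f'] @ z + b['f'])         # forget gate: what to erase from the cell
    i = sigmoid(W['i'] @ z + b['i'])         # input gate: what new information to add
    c_tilde = np.tanh(W['c'] @ z + b['c'])   # candidate cell values in (-1, +1)
    c = f * c_prev + i * c_tilde             # updated cell state
    o = sigmoid(W['o'] @ z + b['o'])         # output gate: how much of the cell to reveal
    h = o * np.tanh(c)                       # new hidden state / output of the cell
    return h, c
```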
- To learn more about Artificial Intelligence concepts, see Artificial Intelligence, Machine Learning, and Deep Learning.
- Learn ML with the Google Machine Learning Crash Course.