Recurrent Neural Network
An RNN is a neural network that learns from sequential data. It can handle inputs of varying size, preserves sequential information, and generalizes well.
Vanilla RNN
As shown in the image, it can work with sequences of variable length. Its recurrent process is shown below:
The network is trained with "Backpropagation Through Time" (BPTT): the gradients of the loss are summed over the time steps. The loss at time step t is computed from the output and the hidden state at that step; the gradient at each cell is then computed from the gradients of the later time steps, propagated backwards through the hidden states.
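A minimal NumPy sketch of this forward pass and of summing gradients over time (the weight names W_xh, W_hh, W_hy and the squared-error loss are illustrative assumptions, not from the notes):

```python
import numpy as np

def rnn_bptt(xs, targets, W_xh, W_hh, W_hy):
    """Forward pass over a sequence, then gradients summed over time steps."""
    hs = {-1: np.zeros((W_hh.shape[0], 1))}   # hidden states, h_{-1} = 0
    ys, loss = {}, 0.0
    for t in range(len(xs)):
        hs[t] = np.tanh(W_xh @ xs[t] + W_hh @ hs[t - 1])  # recurrent update
        ys[t] = W_hy @ hs[t]                              # output at step t
        loss += 0.5 * np.sum((ys[t] - targets[t]) ** 2)   # loss at step t

    dW_xh, dW_hh, dW_hy = (np.zeros_like(W) for W in (W_xh, W_hh, W_hy))
    dh_next = np.zeros_like(hs[-1])           # gradient arriving from step t+1
    for t in reversed(range(len(xs))):
        dy = ys[t] - targets[t]
        dW_hy += dy @ hs[t].T
        dh = W_hy.T @ dy + dh_next            # local gradient + recurrent gradient
        dh_raw = (1.0 - hs[t] ** 2) * dh      # backprop through tanh
        dW_xh += dh_raw @ xs[t].T             # gradients are summed over t
        dW_hh += dh_raw @ hs[t - 1].T
        dh_next = W_hh.T @ dh_raw             # repeated multiplication by W_hh
    return loss, dW_xh, dW_hh, dW_hy
```

The last line multiplies by W_hh once per time step, which is exactly what causes the limits below.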
Limits:
- if the recurrent factors are > 1, the gradient of the loss explodes
- if the recurrent factors are < 1, the gradient of the loss vanishes
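A quick numerical illustration of these two regimes (the factors 0.9 and 1.1 are just example values):

```python
# BPTT multiplies the gradient by roughly the same factor at every step,
# so over T steps it scales like factor ** T.
T = 100
print(0.9 ** T)   # ~2.7e-05 -> vanishing gradient
print(1.1 ** T)   # ~1.4e+04 -> exploding gradient
```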
Long Short-Term Memory (LSTM)
Same idea, but the way the RNN cell manages information and gradient flow changes: there are now two internal states, the hidden state (short-term) and the cell state (long-term).
We have four gates, in order from left to right:
- Forget gate: what information should be forgotten or kept from the previous cell state?
- Input gate: which values will be written to the current cell state?
- Input modulation gate: how much should be written to the current cell state? (a sub-gate of the previous one)
- Output gate: what will be output from the current cell state?
Training uses the same backpropagation as in the first example, BPTT, but the loss is now propagated through both the hidden state and the cell state, which avoids the limits seen above; see the sketch below.
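A minimal sketch of one LSTM step (the weight names W_f, W_i, W_g, W_o and shapes are assumptions; biases are omitted for brevity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W_f, W_i, W_g, W_o):
    """One LSTM step; each gate reads [h_prev, x] concatenated."""
    z = np.concatenate([h_prev, x])
    f = sigmoid(W_f @ z)      # forget gate: keep/discard parts of c_prev
    i = sigmoid(W_i @ z)      # input gate: which values to write
    g = np.tanh(W_g @ z)      # input modulation gate: how much to write
    o = sigmoid(W_o @ z)      # output gate: what to expose as h
    c = f * c_prev + i * g    # cell state (long-term): additive update
    h = o * np.tanh(c)        # hidden state (short-term)
    return h, c
```

The additive update of c, with no repeated matrix multiplication on that path, is what lets gradients survive over long sequences.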
Attention
Attention can be defined as the ability to decide what to focus on. Attention over time is called memory.
(Figure: attention model)
Two types of attention models:
- Hard: uses reinforcement learning to train the attention model --> stochastic
- Soft: uses backpropagation to train the attention model --> differentiable
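A minimal sketch of the soft, differentiable case (the dot-product formulation with a query q, keys K, and values V is an assumption here, not something stated in the notes):

```python
import numpy as np

def soft_attention(q, K, V):
    """Differentiable attention: softmax weights over values."""
    scores = K @ q / np.sqrt(q.shape[0])   # how well q matches each key
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax: where to focus
    return weights @ V                     # weighted sum of the values
```

Because every operation is differentiable, the attention weights can be trained with ordinary backpropagation.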
Implicit attention
An RNN keeps a memory of past events through its recursive hidden state. The Jacobian of the network's state with respect to past inputs can be used to measure this implicit attention, as in the sketch below.
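An illustrative sketch: implicit attention quantified as the sensitivity of the final hidden state to each input step, i.e. the norm of the Jacobian d h_T / d x_t, approximated by finite differences. The tiny RNN and its random weights are hypothetical, made up just for this demo:

```python
import numpy as np

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(4, 3)) * 0.5   # input-to-hidden weights (assumed)
W_hh = rng.normal(size=(4, 4)) * 0.5   # hidden-to-hidden weights (assumed)

def final_hidden(xs):
    """Run the toy RNN over the sequence and return the last hidden state."""
    h = np.zeros(4)
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h)
    return h

xs = rng.normal(size=(6, 3))           # a sequence of 6 random inputs
h_ref = final_hidden(xs)
eps = 1e-5
for t in range(len(xs)):
    jac = np.zeros((4, 3))
    for j in range(3):
        bumped = xs.copy()
        bumped[t, j] += eps            # perturb one input component
        jac[:, j] = (final_hidden(bumped) - h_ref) / eps
    print(f"step {t}: |d h_T / d x_t| = {np.linalg.norm(jac):.4f}")
```

Inputs whose Jacobian norm is larger are the ones the network implicitly attends to.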
Explicit attention
This attention reacts more like a human would: it reduces computation, gives a better understanding of what the neural network is doing, and simplifies sequential processing.
Transformers
Some applications:
- Language modelling: predict the next word of a sentence. Sentence length varies, which historically only RNNs could handle.
- Image generation: as with language modelling, the model tries to predict the missing part of an image.
- Sequence to sequence: translation, hand gesture recognition, video description... The encoder uses a many-to-one architecture and the decoder the opposite (one-to-many).
- Image captioning: attend to what is important in an image (also used for video description).
- Visual Question Answering: similar to image captioning: answer questions about an image.
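All of these tasks rest on the self-attention at the core of a Transformer: every position attends to every other, so sequence length can vary freely. A minimal sketch (the projection matrices W_q, W_k, W_v are illustrative assumptions):

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Each row of X attends to all rows of X."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v              # project the same sequence
    scores = Q @ K.T / np.sqrt(Q.shape[1])           # all-pairs similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
    return weights @ V                               # each row: attended mixture
```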