Recurrent Neural Network
An RNN is a neural network that learns from sequential data. It can handle inputs of varying size, preserves sequential information, and generalizes well.
Vanilla RNN
As shown in the image, it can work with sequences of variable length. Its recurrent process is shown below:
The network is trained with "Backpropagation Through Time" (BPTT): the gradients of the loss are summed over the time steps. The loss at time step t is computed from the output and the hidden state at that step; the gradient at each cell is then computed from the gradients of the later time steps, propagated backwards through the hidden states.
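A minimal NumPy sketch of this forward pass and of summing gradients over time (the weight names W_xh, W_hh, W_hy and the squared-error loss are illustrative assumptions, not from the notes):

```python
import numpy as np

def rnn_bptt(xs, targets, W_xh, W_hh, W_hy):
    """Forward pass over a sequence, then gradients summed over time steps."""
    hs = {-1: np.zeros((W_hh.shape[0], 1))}   # hidden states, h_{-1} = 0
    ys, loss = {}, 0.0
    for t in range(len(xs)):
        hs[t] = np.tanh(W_xh @ xs[t] + W_hh @ hs[t - 1])  # recurrent update
        ys[t] = W_hy @ hs[t]                              # output at step t
        loss += 0.5 * np.sum((ys[t] - targets[t]) ** 2)   # loss at step t

    dW_xh, dW_hh, dW_hy = (np.zeros_like(W) for W in (W_xh, W_hh, W_hy))
    dh_next = np.zeros_like(hs[-1])           # gradient arriving from step t+1
    for t in reversed(range(len(xs))):
        dy = ys[t] - targets[t]
        dW_hy += dy @ hs[t].T
        dh = W_hy.T @ dy + dh_next            # local gradient + recurrent gradient
        dh_raw = (1.0 - hs[t] ** 2) * dh      # backprop through tanh
        dW_xh += dh_raw @ xs[t].T             # gradients are summed over t
        dW_hh += dh_raw @ hs[t - 1].T
        dh_next = W_hh.T @ dh_raw             # repeated multiplication by W_hh
    return loss, dW_xh, dW_hh, dW_hy
```

The last line multiplies by W_hh once per time step, which is exactly what causes the limits below.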
Limits:
- if the recurrent factors are > 1, the gradient of the loss explodes
- if the recurrent factors are < 1, the gradient of the loss vanishes
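A quick numerical illustration of these two regimes (the factors 0.9 and 1.1 are just example values):

```python
# BPTT multiplies the gradient by roughly the same factor at every step,
# so over T steps it scales like factor ** T.
T = 100
print(0.9 ** T)   # ~2.7e-05 -> vanishing gradient
print(1.1 ** T)   # ~1.4e+04 -> exploding gradient
```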
Long Short-Term Memory (LSTM)
Same idea, but the way the RNN cell manages information and gradient flow changes: there are now two internal states, the hidden state (short-term) and the cell state (long-term).
We have four gates, in order from left to right:
- Forget gate: what information should be forgotten or kept from the previous cell state?
- Input gate: which values will be written to the current cell state?
- Input modulation gate: how much should be written to the current cell state? (a sub-gate of the previous one)
- Output gate: what will be output from the current cell state?
Training uses the same backpropagation as in the first example, BPTT, but the loss is now propagated through both the hidden state and the cell state, which avoids the limits seen above; see the sketch below.
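A minimal sketch of one LSTM step (the weight names W_f, W_i, W_g, W_o and shapes are assumptions; biases are omitted for brevity):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W_f, W_i, W_g, W_o):
    """One LSTM step; each gate reads [h_prev, x] concatenated."""
    z = np.concatenate([h_prev, x])
    f = sigmoid(W_f @ z)      # forget gate: keep/discard parts of c_prev
    i = sigmoid(W_i @ z)      # input gate: which values to write
    g = np.tanh(W_g @ z)      # input modulation gate: how much to write
    o = sigmoid(W_o @ z)      # output gate: what to expose as h
    c = f * c_prev + i * g    # cell state (long-term): additive update
    h = o * np.tanh(c)        # hidden state (short-term)
    return h, c
```

The additive update of c, with no repeated matrix multiplication on that path, is what lets gradients survive over long sequences.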
Attention
Attention can be defined as the ability to decide what to focus on. Attention over time is called memory.
(Figure: attention model)
Two types of attention models:
- Hard: uses reinforcement learning to train the attention model --> stochastic
- Soft: uses backpropagation to train the attention model --> differentiable
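A minimal sketch of the soft, differentiable case (the dot-product formulation with a query q, keys K, and values V is an assumption here, not something stated in the notes):

```python
import numpy as np

def soft_attention(q, K, V):
    """Differentiable attention: softmax weights over values."""
    scores = K @ q / np.sqrt(q.shape[0])   # how well q matches each key
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax: where to focus
    return weights @ V                     # weighted sum of the values
```

Because every operation is differentiable, the attention weights can be trained with ordinary backpropagation.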
Implicit attention
An RNN keeps a memory of past events through its recursive hidden state. The Jacobian of the network's state with respect to past inputs can be used to measure this implicit attention, as in the sketch below.
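An illustrative sketch: implicit attention quantified as the sensitivity of the final hidden state to each input step, i.e. the norm of the Jacobian d h_T / d x_t, approximated by finite differences. The tiny RNN and its random weights are hypothetical, made up just for this demo:

```python
import numpy as np

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(4, 3)) * 0.5   # input-to-hidden weights (assumed)
W_hh = rng.normal(size=(4, 4)) * 0.5   # hidden-to-hidden weights (assumed)

def final_hidden(xs):
    """Run the toy RNN over the sequence and return the last hidden state."""
    h = np.zeros(4)
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h)
    return h

xs = rng.normal(size=(6, 3))           # a sequence of 6 random inputs
h_ref = final_hidden(xs)
eps = 1e-5
for t in range(len(xs)):
    jac = np.zeros((4, 3))
    for j in range(3):
        bumped = xs.copy()
        bumped[t, j] += eps            # perturb one input component
        jac[:, j] = (final_hidden(bumped) - h_ref) / eps
    print(f"step {t}: |d h_T / d x_t| = {np.linalg.norm(jac):.4f}")
```

Inputs whose Jacobian norm is larger are the ones the network implicitly attends to.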
Explicit attention
This attention reacts more like a human would: it reduces computation, gives a better understanding of what the neural network is doing, and simplifies sequential processing.
Transformers
Some applications:
- Language modelling: predict the next word of a sentence. Sentence length varies, which historically only RNNs could handle.
- Image generation: as with language modelling, the model tries to predict the missing part of an image.
- Sequence to sequence: translation, hand gesture recognition, video description... The encoder uses a many-to-one architecture and the decoder the opposite (one-to-many).
- Image captioning: attend to what is important in an image (also used for video description).
- Visual Question Answering: similar to image captioning: answer questions about an image.
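All of these tasks rest on the self-attention at the core of a Transformer: every position attends to every other, so sequence length can vary freely. A minimal sketch (the projection matrices W_q, W_k, W_v are illustrative assumptions):

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Each row of X attends to all rows of X."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v              # project the same sequence
    scores = Q @ K.T / np.sqrt(Q.shape[1])           # all-pairs similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
    return weights @ V                               # each row: attended mixture
```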