Neural Networks
- Origin: Algorithms that try to mimic the brain.
Commonly used types of neural networks include convolutional and recurrent neural networks.
By noting $i$ the $i^{th}$ layer of the network and $j$ the $j^{th}$ hidden unit of that layer, the output $a^{i}_{j}$ of the unit after the activation function $g$ is given by:
\[ a^{i}_{j} = g((w^{i}_{j})^{T} x + b^{i}_{j}) \]
and, in vectorized form over the whole layer:
\[ a^{i} = g(\Theta^{i-1} a^{i-1}) \]
where $\Theta^{i-1}$ is the matrix of parameters mapping layer $i-1$ to layer $i$.
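As a rough illustration of this forward pass, here is a minimal NumPy sketch; the function names, layer sizes, and the choice of sigmoid as $g$ are assumptions made for the example, not part of the notes above.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(a_prev, W, b, g=sigmoid):
    """Compute a^i = g(W a^{i-1} + b) for one layer.

    a_prev : activations of the previous layer, shape (n_prev,)
    W      : weight matrix of the current layer, shape (n_curr, n_prev)
    b      : bias vector of the current layer, shape (n_curr,)
    """
    z = W @ a_prev + b   # pre-activation
    return g(z)          # activation a^i

# Hypothetical example: a 3-unit input propagated through a 2-unit layer.
a0 = np.array([0.5, -1.2, 0.3])
W1 = np.random.randn(2, 3) * 0.01
b1 = np.zeros(2)
a1 = layer_forward(a0, W1, b1)
```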
Activation function
Activation functions are used at the end of a hidden unit to introduce non-linear complexities to the model. Here are the most common ones:
Sigmoid | ReLU | Leaky ReLU
---|---|---
$g(z) = \frac{1}{1+e^{-z}}$ | $g(z) = \max(0, z)$ | $g(z) = \max(\epsilon z, z)$ with $\epsilon \ll 1$
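A minimal NumPy sketch of these three activations; the $\epsilon$ value below is an illustrative choice, not a prescribed one.

```python
import numpy as np

def sigmoid(z):
    """Sigmoid: squashes z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    """ReLU: max(0, z), applied element-wise."""
    return np.maximum(0.0, z)

def leaky_relu(z, eps=0.01):
    """Leaky ReLU: max(eps * z, z) with eps << 1."""
    return np.maximum(eps * z, z)

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), relu(z), leaky_relu(z))
```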
Cost function
One of the most commonly used cost functions in neural networks is the cross-entropy loss, defined as:
\[ L(z, y) = -\left[ y \log(z) + (1-y) \log(1-z) \right] \]
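As an example, a direct NumPy translation of this loss averaged over a batch; the small clipping constant is added only to avoid $\log(0)$ numerically and is not part of the formula above.

```python
import numpy as np

def cross_entropy(z, y, eps=1e-12):
    """Cross-entropy loss L(z, y) averaged over a batch.

    z : predicted probabilities in (0, 1), e.g. sigmoid outputs
    y : true labels in {0, 1}
    """
    z = np.clip(z, eps, 1.0 - eps)  # numerical safety, avoids log(0)
    return np.mean(-(y * np.log(z) + (1.0 - y) * np.log(1.0 - z)))

y = np.array([1.0, 0.0, 1.0])
z = np.array([0.9, 0.2, 0.6])
print(cross_entropy(z, y))
```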
Backpropagation (Backprop)
Backpropagation is a method to update the weights of the network by taking into account the actual output and the desired output. The derivative of the loss with respect to a weight $w$ is computed using the chain rule and has the following form:
\[ \frac{\partial L(z, y)}{\partial w} = \frac{\partial L(z, y)}{\partial a} \times \frac{\partial a}{\partial z} \times \frac{\partial z}{\partial w} \]
As a result, the weight is updated as follows:
\[ w \gets w - \alpha \frac{\partial L(z, y)}{\partial w} \]
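As an illustration of the chain rule and update above, here is a hand-written gradient step for a single sigmoid unit with cross-entropy loss, where the first two factors combine into the well-known $(a - y)$ term. This is a sketch for one unit and one example, not a general-purpose implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(w, b, x, y, alpha=0.1):
    """One gradient update for a single sigmoid unit with cross-entropy loss."""
    z = w @ x + b
    a = sigmoid(z)
    # Chain rule: dL/da * da/dz simplifies to (a - y) for sigmoid + cross-entropy.
    dz = a - y
    dw = dz * x          # dz/dw = x
    db = dz              # dz/db = 1
    w = w - alpha * dw   # w <- w - alpha * dL/dw
    b = b - alpha * db
    return w, b

w = np.zeros(2)
b = 0.0
x = np.array([1.0, 2.0])
w, b = backprop_step(w, b, x, y=1.0)
```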
Gradient checking
Idea: check that the gradients computed by backpropagation match a numerical approximation of the gradient, e.g. the centered difference $\frac{L(w+\epsilon) - L(w-\epsilon)}{2\epsilon}$. Be sure to disable the gradient checking code before training, as it is computationally expensive.
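A minimal sketch of gradient checking with a centered finite difference; the perturbation size, tolerance, and toy loss below are illustrative assumptions.

```python
import numpy as np

def numerical_gradient(loss_fn, w, eps=1e-5):
    """Approximate dL/dw with a centered finite difference, one weight at a time."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += eps
        w_minus[i] -= eps
        grad[i] = (loss_fn(w_plus) - loss_fn(w_minus)) / (2.0 * eps)
    return grad

# Compare against an analytical gradient for a toy quadratic loss.
loss_fn = lambda w: np.sum(w ** 2)      # L(w) = ||w||^2, so dL/dw = 2w
w = np.array([0.3, -1.5, 2.0])
analytical = 2.0 * w
numerical = numerical_gradient(loss_fn, w)
assert np.allclose(analytical, numerical, atol=1e-6)
```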
Random initialization
Idea: avoid having all hidden units learn the same thing. With zero initialization, every unit in a layer receives the same gradient and stays identical; instead, break this symmetry by initializing the weights with small random values.
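A sketch of the difference between zero and random initialization; the layer sizes and the 0.01 scaling factor are common but arbitrary choices made for the example.

```python
import numpy as np

n_in, n_out = 4, 3

# Zero initialization: every hidden unit computes the same output and receives
# the same gradient, so the units stay identical throughout training.
W_zero = np.zeros((n_out, n_in))

# Symmetry-breaking random initialization: small random weights so that each
# hidden unit starts from a different point and can learn a different feature.
W_rand = np.random.randn(n_out, n_in) * 0.01
b = np.zeros(n_out)   # biases can safely start at zero
```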