In short, it helps signals reach deep into the network.
During the training of deep neural networks:
1. If the weights in a network start too small, then the signal shrinks as it passes through each layer until it’s too tiny to be useful.
2. If the weights in a network start too large, then the signal grows as it passes through each layer until it’s too massive to be useful.
Xavier initialization makes sure the weights are ‘just right’, keeping the signal in a reasonable range of values through many layers.
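The two failure modes above can be seen directly by pushing a signal through a stack of layers under different weight scales. Below is a minimal NumPy sketch (the layer width, depth, and tanh activation are illustrative choices, not from the original text): weights that are too small shrink the signal toward zero, weights that are too large saturate it, while Xavier/Glorot uniform initialization, with limit sqrt(6 / (fan_in + fan_out)), keeps its scale reasonable.

```python
import numpy as np

rng = np.random.default_rng(0)
depth, width = 50, 256

def forward_std(init_fn):
    """Push a unit-variance signal through `depth` tanh layers and
    return the standard deviation of the final activations."""
    x = rng.standard_normal((1, width))
    for _ in range(depth):
        W = init_fn(width, width)
        x = np.tanh(x @ W)
    return float(np.std(x))

def xavier(fan_in, fan_out):
    # Glorot/Xavier uniform: the limit is chosen so that activation
    # variance is roughly preserved from one layer to the next.
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

too_small = lambda fi, fo: rng.normal(0, 0.01, size=(fi, fo))  # signal shrinks toward 0
too_large = lambda fi, fo: rng.normal(0, 1.0, size=(fi, fo))   # tanh saturates near ±1

print(forward_std(too_small), forward_std(xavier), forward_std(too_large))
```

Running this shows the too-small network's signal collapsing to essentially zero after 50 layers, the too-large network pinned at the saturation boundary of tanh, and the Xavier-initialized network keeping a usable intermediate scale.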
More details in the paper [Understanding the difficulty of training deep feedforward neural networks](http://jmlr.org/proceedings/papers/v9/glorot10a/glorot10a.pdf).