Vanishing Gradients and Why ReLU Helps
WHAT IT TESTS: grasp of deep-network training dynamics. OUTLINE: saturating activations shrink gradients layer by layer; ReLU's derivative of one for positive inputs preserves them. RED FLAG: confusing it with exploding gradients or ignoring ReLU's dead-neuron downside.
WHAT IT TESTS: understanding why deep networks were once hard to train. ANSWER OUTLINE: backpropagation multiplies one activation derivative (along with a weight term) per layer. Sigmoid and tanh saturate: sigmoid's derivative never exceeds 0.25, and tanh's drops well below one once inputs move away from zero, so the product shrinks exponentially with depth and early layers barely learn. ReLU has a derivative of exactly one for positive inputs, so active units pass gradients through unscaled, letting signal flow through deep stacks. Mention dead ReLUs (units stuck on negative inputs, where the derivative is zero, stop updating) and fixes like LeakyReLU. RED FLAG: confusing it with exploding gradients or claiming ReLU has no drawbacks.
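The shrinking product described in the outline can be illustrated with a minimal sketch. It assumes a toy chain with one unit per layer, every weight fixed at 1, and the same pre-activation value at each layer, so the backpropagated gradient reduces to the activation derivative raised to the depth. The function names and the depth of 20 are hypothetical choices for illustration, not part of any library.

```python
import math

def sigmoid_grad(x):
    # Derivative of sigmoid: s * (1 - s); peaks at 0.25 when x == 0.
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)

def tanh_grad(x):
    # Derivative of tanh: 1 - tanh(x)^2; equals 1 at x == 0, falls off quickly.
    return 1.0 - math.tanh(x) ** 2

def relu_grad(x):
    # Derivative of ReLU: 1 for positive inputs, 0 otherwise.
    return 1.0 if x > 0 else 0.0

def chained_gradient(grad_fn, pre_activation, depth):
    # Assumption: identical pre-activation at every layer and unit weights,
    # so the chain rule collapses to a single derivative raised to `depth`.
    return grad_fn(pre_activation) ** depth

DEPTH = 20  # hypothetical depth chosen for illustration

sigmoid_best = chained_gradient(sigmoid_grad, 0.0, DEPTH)   # best case: 0.25 ** 20
tanh_saturated = chained_gradient(tanh_grad, 2.0, DEPTH)    # moderately saturated tanh
relu_active = chained_gradient(relu_grad, 2.0, DEPTH)       # active ReLU path
relu_dead = chained_gradient(relu_grad, -2.0, DEPTH)        # dead ReLU path

print(f"sigmoid (best case): {sigmoid_best:.3e}")
print(f"tanh (x = 2):        {tanh_saturated:.3e}")
print(f"ReLU (active):       {relu_active}")
print(f"ReLU (dead):         {relu_dead}")
```

Even in sigmoid's best case the gradient reaching the first layer is vanishingly small, while the active ReLU path keeps it at one. The dead path shows the drawback: a unit whose input stays negative passes back no gradient at all, which is the case LeakyReLU addresses by using a small nonzero slope for negative inputs.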
- #deep-learning
- #vanishing-gradient
- #relu
- #activation-functions
- #backpropagation