Vanishing Gradients and Why ReLU Helps

June 23, 2026 · Source: interview · beginner

WHAT IT TESTS: grasp of deep-network training dynamics, specifically why deep networks were once hard to train. ANSWER OUTLINE: backpropagation applies the chain rule, multiplying one activation derivative (along with the weights) per layer. Sigmoid and tanh saturate: sigmoid's derivative never exceeds 0.25, and both derivatives approach zero for large-magnitude inputs. Multiplying many such factors shrinks the gradient roughly exponentially with depth, so the earliest layers barely learn. ReLU's derivative is exactly one for positive inputs, so active units pass the gradient back without squashing it, letting signal flow through deep stacks. Mention the downside: a unit whose input stays negative gets a zero gradient and stops learning (a "dead ReLU"); fixes include LeakyReLU, which gives negative inputs a small nonzero slope. RED FLAG: confusing vanishing gradients with exploding gradients, or claiming ReLU has no drawbacks.
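A minimal sketch of the chain-rule argument, in plain Python with no framework. It assumes every weight equals 1.0 so that only the activation derivatives multiply; the helper names (`backprop_factor` and the `*_grad` functions), the depth of 20, and the pre-activation values are hypothetical choices made for this illustration.

```python
import math

# Assumption: every weight equals 1.0, so the gradient reaching the first layer
# is just the product of each layer's activation derivative (chain rule).

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    s = sigmoid(x)
    return s * (1.0 - s)  # peaks at 0.25 when x == 0

def tanh_grad(x: float) -> float:
    return 1.0 - math.tanh(x) ** 2  # 1.0 at x == 0, near 0 when saturated

def relu_grad(x: float) -> float:
    return 1.0 if x > 0 else 0.0  # derivative at exactly 0 taken as 0 here

def leaky_relu_grad(x: float, slope: float = 0.01) -> float:
    return 1.0 if x > 0 else slope  # small nonzero slope for negative inputs

def backprop_factor(grad_fn, pre_activations):
    """Multiply each layer's local derivative, walking from output back to input."""
    factor = 1.0
    for z in reversed(pre_activations):
        factor *= grad_fn(z)
    return factor

depth = 20
centered = [0.0] * depth   # sigmoid's best case: slope is largest at 0
saturated = [2.0] * depth  # tanh pushed into its flat region
positive = [1.5] * depth   # ReLU units that stay active

print(f"sigmoid, best case: {backprop_factor(sigmoid_grad, centered):.3e}")
print(f"tanh, saturated:    {backprop_factor(tanh_grad, saturated):.3e}")
print(f"relu, active:       {backprop_factor(relu_grad, positive):.3e}")

# Dead-unit illustration: a single unit whose input is negative.
print("relu grad at -2.0:      ", relu_grad(-2.0))
print("leaky relu grad at -2.0:", leaky_relu_grad(-2.0))
```

Even in sigmoid's best case, twenty layers scale the gradient by 0.25**20, about 9.1e-13, while active ReLU units leave it at exactly 1.0. The last two lines show the dead-ReLU trade-off: a negative input gives ReLU a zero gradient, whereas LeakyReLU (slope 0.01 here, an assumed value) still passes a small one. Real networks also multiply by weights, so this sketch isolates only the activation's contribution.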

#deep-learning
#vanishing-gradient
#relu
#activation-functions
#backpropagation