tezvyn:

Attention in Sequence-to-Sequence Models

Source: interview · intermediate

WHAT IT TESTS: why attention beats a fixed context vector. OUTLINE: attention computes per-step weighted sums over all encoder states, fixing the information bottleneck for long inputs. RED FLAG: describing attention but never naming the bottleneck it solves.

WHAT IT TESTS: understanding the information bottleneck of vanilla seq2seq and how attention removes it. ANSWER OUTLINE: a plain encoder-decoder compresses the whole input into one fixed-size vector, so detail is lost and quality degrades as the input gets longer. With attention, each decoder step instead computes alignment scores against every encoder hidden state, normalizes them with a softmax into weights that sum to 1, and forms a context vector as the weighted sum of those encoder states. The decoder therefore gets a fresh context for each output token, focused on the relevant parts of the input.
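The score, softmax, weighted-sum steps in the answer outline can be sketched for a single decoder step as follows. This is a minimal illustration, not a reference implementation: the outline does not say how alignment scores are computed, so a plain dot product is assumed here (additive and scaled variants also exist), and the names `attention_context`, `decoder_state`, and `encoder_states` are hypothetical.

```python
import numpy as np


def softmax(x):
    # Subtract the max before exponentiating for numerical stability.
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def attention_context(decoder_state, encoder_states):
    """One decoder step of attention (dot-product scoring is an assumption).

    decoder_state:  shape (d,)    current decoder hidden state
    encoder_states: shape (T, d)  one hidden state per input position
    """
    # 1. Alignment score between the decoder state and every encoder state.
    scores = encoder_states @ decoder_state      # shape (T,)
    # 2. Softmax turns the scores into weights that sum to 1.
    weights = softmax(scores)                    # shape (T,)
    # 3. Context vector: weighted sum of the encoder states.
    context = weights @ encoder_states           # shape (d,)
    return context, weights


# Toy data: 6 input positions, hidden size 4 (values are random placeholders).
rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(6, 4))
decoder_state = rng.normal(size=4)

context, weights = attention_context(decoder_state, encoder_states)
print("weights:", np.round(weights, 3))
print("context:", np.round(context, 3))
```

Because the context is recomputed at every decoder step from all `T` encoder states, no single fixed vector has to carry the whole input, which is the bottleneck the answer should name.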
