Encoder-Only vs. Decoder-Only vs. Encoder-Decoder Transformers?

This tests your ability to connect transformer architecture to specific NLP tasks. A great answer explains how each model's attention mechanism dictates its use: encoder-only (bidirectional attention) for understanding content, decoder-only (causal attention) for text generation, and encoder-decoder for sequence-to-sequence tasks like translation. The key red flag is failing to explain the *why* behind the task suitability—the attention mechanism.
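To make that "why" concrete, here is a minimal sketch (assuming PyTorch; the function and variable names are illustrative, and the learned query/key/value projections are omitted) showing that bidirectional and causal self-attention differ only in the mask applied to the attention scores:

```python
import torch
import torch.nn.functional as F

def self_attention(x, causal=False):
    # x: (seq_len, d_model); real models would project x to queries/keys/values first.
    seq_len, d_model = x.shape
    scores = x @ x.T / d_model ** 0.5              # (seq_len, seq_len) similarity scores
    if causal:
        # Decoder-style mask: position i may only attend to positions <= i.
        future = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(future, float("-inf"))
    weights = F.softmax(scores, dim=-1)            # each row sums to 1 over allowed positions
    return weights @ x

x = torch.randn(4, 8)                              # 4 tokens, 8-dim embeddings
encoder_style = self_attention(x)                  # bidirectional: every token sees all tokens
decoder_style = self_attention(x, causal=True)     # causal: token i sees tokens 0..i only
```

With the causal mask, each row of the attention weights is zero for every future position, which is exactly what lets a decoder-only model be trained to predict the next token.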
This question tests your fundamental understanding of how transformer architecture maps to specific NLP tasks, moving beyond buzzwords. A strong answer outlines the three types by their core attention mechanism and resulting function. First, explain that encoder-only models (like BERT) use bidirectional attention to see the entire input at once, making them ideal for understanding tasks like sentiment analysis. Second, detail how decoder-only models (like GPT) use causal attention to attend only to past tokens, suiting them for generation. Finally, describe encoder-decoder models (like T5), which pair a bidirectional encoder with a causal decoder connected by cross-attention, for sequence-to-sequence tasks like translation. The main red flag is just listing model names without explaining the underlying architectural differences.
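In practice, the three families surface as different task pipelines. The snippet below is a sketch that assumes the Hugging Face `transformers` library and its public `gpt2` and `t5-small` checkpoints are available; it is one illustration of the mapping, not the only way to use these models:

```python
from transformers import pipeline

# Encoder-only (BERT-style): bidirectional attention, suited to understanding tasks.
classifier = pipeline("sentiment-analysis")        # defaults to a BERT-family model
print(classifier("The plot was thin but the acting was superb."))

# Decoder-only (GPT-style): causal attention, suited to open-ended generation.
generator = pipeline("text-generation", model="gpt2")
print(generator("Transformers are", max_new_tokens=20))

# Encoder-decoder (T5-style): bidirectional encoding plus causal decoding,
# suited to sequence-to-sequence tasks like translation.
translator = pipeline("translation_en_to_de", model="t5-small")
print(translator("The attention mechanism dictates the task."))
```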
Read the original → Wikipedia: Transformer (deep learning architecture)
- #llm
- #architecture
- #transformers
- #nlp