Causal versus Masked Language Modeling

June 23, 2026 | Source: interview | beginner

WHAT IT TESTS: understanding LLM pre-training objectives. OUTLINE: pre-training learns general language from unlabeled text; CLM predicts the next token left-to-right, MLM predicts masked tokens using both sides.

WHAT IT TESTS: whether you can clearly contrast the two dominant self-supervised objectives. ANSWER OUTLINE: pre-training teaches a model general linguistic and world knowledge from massive unlabeled text via self-supervision, where the training targets come from the text itself. Causal language modeling, as in GPT, predicts the next token from only the leftward context, making it naturally generative. Masked language modeling, as in BERT, hides a fraction of tokens (about 15% in BERT) and predicts them using context on both sides, making it strong for understanding tasks.
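
EXAMPLE: a minimal sketch of how the two objectives turn the same sentence into training data, using plain Python lists instead of a real tokenizer or model. The helper names (clm_examples, mlm_example, causal_mask, bidirectional_mask) are hypothetical and chosen for illustration. The masking here is simplified: every selected token becomes [MASK], whereas BERT's actual recipe also sometimes substitutes a random token or leaves the token unchanged.

```python
import random

MASK = "[MASK]"  # placeholder symbol; an assumption for this sketch


def clm_examples(tokens):
    """Causal LM: each target is the next token, given only the tokens to its left."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


def mlm_example(tokens, mask_prob=0.15, seed=0):
    """Masked LM (simplified): hide a random fraction of tokens.

    Returns the corrupted sequence and a dict mapping each masked
    position to the original token the model must recover.
    """
    rng = random.Random(seed)
    corrupted, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            corrupted.append(MASK)
            targets[i] = tok
        else:
            corrupted.append(tok)
    return corrupted, targets


def causal_mask(n):
    """Attention pattern for CLM: position i may see positions 0..i only."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Attention pattern for MLM: every position may see every position."""
    return [[1] * n for _ in range(n)]


if __name__ == "__main__":
    sentence = ["the", "cat", "sat", "on", "the", "mat"]
    for context, target in clm_examples(sentence):
        print("CLM:", context, "->", target)
    print("MLM:", mlm_example(sentence, mask_prob=0.3))
```

The causal mask is lower-triangular, which is why a CLM can generate text one token at a time, while the all-ones mask lets an MLM use words on both sides of a gap when filling it in.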

#pre-training
#causal-language-modeling
#masked-language-modeling
#gpt
#bert
