Causal versus Masked Language Modeling
WHAT IT TESTS: a clear contrast between the two dominant self-supervised pre-training objectives.
ANSWER OUTLINE: pre-training teaches a model general linguistic and world knowledge from massive unlabeled text through self-supervision, where the training targets come from the text itself. Causal language modeling (CLM), as in GPT, predicts each next token from the leftward context only, so the model can naturally generate text one token at a time. Masked language modeling (MLM), as in BERT, hides a fraction of the input tokens (15% in the original BERT) and predicts them from context on both sides, which makes it strong for understanding tasks such as classification and extractive question answering.
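To make the contrast concrete, here is a minimal sketch in plain Python. It assumes string tokens rather than a real tokenizer. It shows how each objective turns the same sequence into inputs and targets, and which attention mask goes with each. The function names (`clm_examples`, `mlm_example`, `causal_mask`, `bidirectional_mask`) are hypothetical. The MLM step is also simplified: it always substitutes `[MASK]`, whereas BERT's recipe replaces some selected tokens with random tokens or leaves them unchanged.

```python
import random

MASK = "[MASK]"   # placeholder mask token (assumption for this sketch)
IGNORE = None     # marks target positions that do not contribute to the loss


def clm_examples(tokens):
    """Causal LM: the input at position i is paired with tokens[i + 1] as its target."""
    inputs = tokens[:-1]
    targets = tokens[1:]
    return inputs, targets


def causal_mask(n):
    """Lower-triangular attention mask: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def mlm_example(tokens, mask_prob=0.15, rng=None):
    """Masked LM (simplified): each token is replaced by MASK with probability mask_prob.

    Targets hold the original token at masked positions and IGNORE elsewhere,
    so only masked positions are scored.
    """
    if rng is None:
        rng = random.Random()
    inputs, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            inputs.append(MASK)
            targets.append(tok)
        else:
            inputs.append(tok)
            targets.append(IGNORE)
    return inputs, targets


def bidirectional_mask(n):
    """Full attention mask: every position may attend to every other position."""
    return [[True] * n for _ in range(n)]


tokens = ["the", "cat", "sat", "on", "the", "mat"]
print(clm_examples(tokens))
# (['the', 'cat', 'sat', 'on', 'the'], ['cat', 'sat', 'on', 'the', 'mat'])
print(mlm_example(tokens, mask_prob=0.3, rng=random.Random(0)))
```

The design difference is visible in the shapes. CLM gets a target at every position, but each position sees only its prefix (the lower-triangular mask). MLM lets every position see the whole sequence, but only the masked positions contribute to the loss.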
- #pre-training
- #causal-language-modeling
- #masked-language-modeling
- #gpt
- #bert