tezvyn:

Data Parallelism: One Task, Many Data Chunks

Source: Wikipedia: Data parallelism · intermediate

Data parallelism splits a huge dataset across multiple processors, each running the same task on its own chunk. It's how large models are trained on massive datasets, with each GPU handling a different batch of data.

Data parallelism is like having many workers perform the exact same task on different parts of a huge dataset: it splits the data, not the task. It is a standard approach for training large ML models, where the dataset is split across many GPUs, each GPU holds a copy of the model, and each processes its own batch to compute gradients in parallel. The main footgun is assuming it always scales efficiently; the workers must synchronize their results after every step, and that communication overhead can become a major bottleneck that cancels out the gains from parallelism.
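To make the pattern concrete, here is a minimal sketch in plain Python, not a production setup. It fits a one-parameter model y = w * x by gradient descent, with the dataset split across workers. The helper names (`shard`, `local_gradient`, `data_parallel_step`) and the toy data are made up for illustration, and threads stand in for GPUs, so this shows the structure of the computation rather than any real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def shard(data, num_workers):
    """Split data into num_workers nearly equal contiguous chunks."""
    size, extra = divmod(len(data), num_workers)
    chunks, start = [], 0
    for i in range(num_workers):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def local_gradient(w, chunk):
    """Same task on every worker: sum of d/dw (w*x - y)^2 over its own chunk."""
    grad_sum = sum(2 * (w * x - y) * x for x, y in chunk)
    return grad_sum, len(chunk)


def data_parallel_step(w, data, num_workers, lr):
    """One training step: scatter the data, compute locally, then synchronize."""
    chunks = [c for c in shard(data, num_workers) if c]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(lambda c: local_gradient(w, c), chunks))
    # Synchronization point: every worker's result is combined before the
    # update. In a multi-GPU system this is where communication cost is paid.
    total_grad = sum(g for g, _ in results)
    total_n = sum(n for _, n in results)
    return w - lr * total_grad / total_n


# Toy dataset generated from y = 3 * x (hypothetical, for illustration only).
data = [(float(x), 3.0 * x) for x in range(1, 9)]

w = 0.0
for _ in range(200):
    w = data_parallel_step(w, data, num_workers=4, lr=0.01)
```

Every worker runs the same `local_gradient` function on a different chunk, and the combine step inside `data_parallel_step` is the synchronization the paragraph above warns about: no worker can start the next step until all results are in.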

