tezvyn:

Resilient stateful batch on Spot Instances

Source: interviewadvanced

WHAT IT TESTS: fault tolerance on interruptible compute. OUTLINE: externalize state and checkpoint to durable storage, react to interruption and rebalance notices to drain gracefully, diversify instance pools.

WHAT IT TESTS: whether you can make interruptible compute safe for stateful work. ANSWER OUTLINE: decouple state from the instance by checkpointing progress to durable storage like S3 or a database so a new instance resumes, not restarts; consume the two-minute interruption notice and rebalance recommendation to checkpoint and drain gracefully; diversify across many instance types and AZs and mix in On-Demand for the baseline; make work idempotent so reprocessing a chunk is safe.

Read the original → interview

Get five bites like this every day.

Tezvyn delivers a daily feed of 60-second tech bites with quizzes to lock in what you learn.

Resilient stateful batch on Spot Instances · Tezvyn