tezvyn:

LLMs & Generative AI

Large language models, chatbots, agents, prompt engineering

105 bites

LLMs & Generative AI30 sec read

How does text guide Stable Diffusion via U-Net cross-attention?

Tests whether you know text embeddings condition the U-Net through cross-attention. Good answers explain that image features query text keys and values at every layer. Red flag: claiming the prompt is concatenated to the image latent.

LLMs & Generative AI30 sec read

Key latent space difference between Autoencoder and VAE, and generative use

This tests deterministic versus probabilistic latent representations. Standard autoencoders encode fixed points; VAEs encode distributions. Sampling the regularized latent distribution generates new data. Red flag: calling VAEs mere noise adders.

LLMs & Generative AI30 sec read

Describe a ReAct agent architecture for multi-step dependent tool calls

WHAT IT TESTS: Designing loops that interleave reasoning and tool use across steps. ANSWER OUTLINE: Sketch ReAct's thought-action-observation cycle; keep state in an append-only trajectory; re-plan after each observation.

LLMs & Generative AI30 sec read

Walk me through building a weather agent with get_weather

WHAT IT TESTS: The LLM tool-use loop separating inference from execution. ANSWER OUTLINE: Register get_weather, let the model emit parameters, execute it yourself, feed the result back, then synthesize the answer. RED FLAG: Claiming the LLM calls the API.

LLMs & Generative AI30 sec read

How does function calling work in modern LLMs?

WHAT IT TESTS: whether you see function calling as client-side structured generation, not model execution. ANSWER OUTLINE: schemas in the prompt; model emits JSON name and arguments; client executes and returns results.

LLMs & Generative AI30 sec read

How would you architect a multi-turn conversational RAG system?

This tests memory and query reformulation design beyond single-turn RAG. A strong answer covers 5-10 turn windows, LLM-based rewriting with coreference resolution, hybrid fallbacks, and summarized memory.

LLMs & Generative AI30 sec read

How would you modify retrieval architecture for hybrid text and SQL RAG?

It tests unified retrieval across unstructured text and structured SQL. Outline a query planner that routes to vector search or text-to-SQL, joins the results, and synthesizes a final answer. Never suggest embedding the whole database as text chunks.

LLMs & Generative AI30 sec read

Why does your RAG ignore or contradict retrieved context?

Tests separation of retrieval failures from generation grounding in RAG. Strong answers trace symptoms to root causes like bad chunks, prompt ordering, or parametric knowledge override, then outline systematic debugging. Do not just say hallucination.

LLMs & Generative AI30 sec read

What does the KL-divergence penalty do in RLHF PPO, and if zeroed?

It tests RLHF reward hacking awareness. The KL penalty anchors PPO to the reference model to stop mode collapse; zeroing it causes over-optimization against the proxy reward model, yielding incoherent outputs.

LLMs & Generative AI30 sec read

Walk through RLHF's three stages, outputs, and purposes.

Tests your grasp of the RLHF pipeline end-to-end. A strong answer lists: pretrain an instruction-following LM, train a reward model outputting a scalar preference score, then fine-tune the LM via RL.

LLMs & Generative AI30 sec read

Full fine-tuning or LoRA on a tight compute budget?

This tests budget-constrained adaptation for many tasks. A strong answer picks LoRA: it trains only a small number of extra parameters, cutting compute and storage versus full fine-tuning while matching performance.

LLMs & Generative AI30 sec read

Describe supervised fine-tuning for a pre-trained language model

Tests if you know SFT aligns a base model to instructions using curated prompt-completion data. A strong answer covers next-token prediction on completions, conversational formats, and small learning rates.

LLMs & Generative AI30 sec read

Design dynamic few-shot example retrieval from a vector database

Tests RAG-style prompt engineering with semantic retrieval and latency. Use shared embeddings, approximate nearest neighbors with metadata filters, diversity reranking, and token-bounded prompt templates.

LLMs & Generative AI30 sec read

Describe two prompt-based techniques to ensure valid LLM JSON output

This tests output constriction via prompt design. First, embed an exact JSON skeleton with empty values. Second, provide few-shot exemplars mapping inputs to valid JSON. A red flag is suggesting only post-hoc regex repair or larger models.

LLMs & Generative AI30 sec read

What causes sudden loss spikes in long pre-training runs?

WHAT IT TESTS: Diagnosing LLM training instabilities under pressure. ANSWER OUTLINE: Name gradient explosions, LR mismatch, FP16 overflow, and poison batches; propose norm checks, rollback, and LR cuts.

LLMs & Generative AI30 sec read

Explain data, tensor, and pipeline parallelism and hybrid training strategy

Tests communication and memory tradeoffs of core distributed training strategies. Strong answers contrast data parallelism (shard batch), tensor parallelism (shard layers, all-reduce), and pipeline parallelism (shard stages, p2p), then propose a 3D hybrid…

LLMs & Generative AI30 sec read

Why is self-attention O(n^2) and what are the implications?

Tests the attention matrix bottleneck. Strong answers note QK^T yields an N×N matrix, creating quadratic compute and memory that blocks long documents and high-res images. Red flag: confusing model size with activation memory.

LLMs & Generative AI30 sec read

Validation loss increases while training loss decreases: what is this?

This tests recognition of overfitting and regularization. A strong answer names it, offers early stopping, dropout or weight decay, and data augmentation or more data. A red flag is suggesting longer training or more parameters without fixing generalization.

LLMs & Generative AI30 sec read

Explain word embeddings and why they beat one-hot encoding for large vocabularies

WHAT IT TESTS: dense semantic vectors versus sparse symbolic encodings. ANSWER OUTLINE: embeddings cluster similar meanings in low-dimensional space, while one-hot vectors are orthogonal, huge, and semantically blank.

LLMs & Generative AI31 sec read

Google TPU: Built for Matrix Math

A TPU is a specialist ASIC, not a faster GPU; it trades graphics flexibility for matrix-math throughput per watt. Google deploys them for TensorFlow, JAX, and PyTorch at scale. They excel at CNNs but can lag on tasks needing rasterization or recurrent logic.

LLMs & Generative AI · Tezvyn