Computer Vision

Image/video models, diffusion, OCR, multimodal

38 bites

Computer Vision30 sec read

Walk me through Canny edge detection and why it beats Sobel thresholding

Tests multi-scale edge detection and noise robustness versus raw gradient thresholding. Strong answer lists Gaussian blur, Sobel gradients, non-maximum suppression, double thresholding, hysteresis. Red flag: calling it blurred Sobel without hysteresis or NMS.

Computer Vision33 sec read

How does filter separability optimize Gaussian blur and its complexity?

This tests if you know a 2D Gaussian separates into two 1D convolutions. A strong answer gives complexity as O(N^2 K^2) dropping to O(N^2 K) for an N-by-N image and K-by-K kernel. A red flag is claiming all kernels are separable or omitting dimensions.

Computer Vision30 sec read

Zero-padding vs reflect vs replicate padding and their visual artifacts

This tests boundary assumptions in convolution. Zero-padding adds black borders causing dark vignettes; reflect padding mirrors edges for continuity; replicate padding repeats edge values outward. A red flag is saying padding choice does not affect outputs.

Computer Vision30 sec read

How does the Sobel operator approximate image gradients for edge detection?

This tests discrete gradient approximation via separable convolution. A strong answer covers 3x3 Gx and Gy kernels as smoothed central differences, then combines magnitude as sqrt(Gx^2 + Gy^2) or L1 norm. A red flag is treating them as arbitrary blur filters.

Computer Vision30 sec read

How would you implement a simple box blur on a grayscale image?

WHAT IT TESTS: Spatial convolution and image filtering basics. ANSWER OUTLINE: Iterate interior pixels, sum the N by N neighborhood, divide by kernel area, write to a new buffer. RED FLAG: In place updates that blur already blurred values.

Computer Vision30 sec read

Describe the BRDF, its advantage over Lambertian, and critical CV tasks

Tests 4D view-dependent reflectance. Strong answers define BRDF as dL_r/dE_i (sr^-1) over four angles; note Lambertian is isotropic; cite photometric stereo and shape-from-shading where specularity breaks the model. Red flag: calling it albedo.

Computer Vision30 sec read

Why is RGB Euclidean distance a poor measure of perceptual color difference?

This tests perceptual uniformity. A good answer explains that RGB distance does not match human vision, then describes CIELAB as a space where deltas approximate perceived differences, making segmentation align with human vision.

Computer Vision30 sec read

Compare YCbCr and RGB. Why chroma subsampling for compression?

Tests color decorrelation and perceptual redundancy. Contrast correlated RGB with YCbCr's luma-chroma split; eyes resolve brightness better than color, so 4:2:0/4:2:2 cuts chroma bandwidth ~50-75% with little loss.

Computer Vision31 sec read

Explain the pinhole camera model and intrinsic matrix K

Tests projective geometry and mapping sensor properties to K. Good answers derive perspective projection via similar triangles, list fx, fy, cx, cy, skew, and explain pixel scaling. Red flag: mixing intrinsics with extrinsics or saying K includes distortion.

Computer Vision30 sec read

Describe a grayscale histogram and its use in exposure and equalization

Tests pixel distribution intuition. A strong answer covers intensity bin counts, left or right clustering for exposure errors, and CDF-based redistribution for equalization. Red flag: calling equalization min-max stretching without cumulative mapping.

Computer Vision30 sec read

What is the difference between lossy and lossless image compression?

This tests irreversible discard versus perfect reconstruction. A strong answer defines lossy as dropping detail, lossless as fully reversible, names JPEG, PNG, and chooses lossless for masters, lossy for web. Red flag: claiming lossless is always smaller.

Computer Vision30 sec read

Non-Maximum Suppression: One Box Per Object

Non-Maximum Suppression (NMS) ensures each detected object gets just one bounding box. It sorts all proposed boxes by confidence, keeps the best one, and discards others that overlap it too much.

Computer Vision30 sec read

3D Object Detection: Seeing in Depth, Not Just Pixels

3D object detection adds depth to a 2D flat view, understanding an object's true size, distance, and orientation. It's vital for autonomous cars and robotics that need spatial awareness.

Computer Vision32 sec read

Sliding Windows: Scanning Images for Objects

A sliding window scans an image with a fixed-size box to find objects. At each location, a classifier checks the window's contents. Combined with an image pyramid, it can detect objects at various scales, localizing exactly where they are.

Computer Vision30 sec read

BRIEF: Fast, Compact Binary Feature Descriptors

BRIEF describes image features as a compact binary string instead of a complex vector. This makes it extremely fast for real-time matching on low-power devices. The footgun: its raw form isn't rotation-invariant, trading that robustness for raw speed.

Computer Vision30 sec read

TensorRT: From Trained Model to Production Speed

TensorRT is a compiler that turns a trained model into a specialized, high-speed engine for a specific NVIDIA GPU. It's used to deploy models in production where low latency is critical.

Computer Vision30 sec read

Model Quantization: Trading Precision for Performance

Model quantization trades numerical precision for a smaller memory footprint. It reduces model weights from high-precision types like fp32 to lower ones like int8 or int4, making large models fit on consumer hardware.

Computer Vision30 sec read

Coded Aperture: Imaging Without a Lens

A coded aperture images radiation that can't be focused, like X-rays. Instead of a lens, it uses a patterned mask to cast a complex shadow, which is then mathematically decoded into an image. It's crucial for X-ray astronomy.

Computer Vision30 sec read

Tone Mapping: Fitting HDR Light onto LDR Screens

Tone mapping compresses a vast range of light (HDR) to fit on a standard screen (LDR). It's how HDR photos and games look good on your monitor instead of having blown-out whites or crushed blacks. The footgun is creating unnatural, "over-cooked" images.

Computer Vision30 sec read

Visual Commonsense Reasoning (VCR): From Recognition to Cognition

VCR pushes AI from simple object recognition to human-like reasoning by asking not just 'what' is in an image, but 'why.' Models must select both the correct answer and the correct rationale, exposing models that guess answers based on shallow correlations.

Computer Vision · Tezvyn