tezvyn:

🤖AI & ML

Artificial intelligence, machine learning, and data science

291 bites

LLMs & Generative AI · 30 sec read

Hugging Face Hub: The GitHub for Machine Learning

Think of the Hugging Face Hub as the GitHub for machine learning. It's a central platform to find, share, and collaborate on millions of models, datasets, and demo apps. Use it to download a pre-trained model or share your own.

LLMs & Generative AI · 30 sec read

The EU AI Act: Risk-Based AI Regulation

The EU AI Act isn't a blanket ban but a risk-based framework. It sorts AI systems into four tiers (unacceptable, high, limited, and minimal risk) and applies obligations in proportion to the risk. It can apply to any company whose AI systems are placed on the market or used in the EU, wherever that company is based. The footgun is assuming it only applies to EU companies.

LLMs & Generative AI · 30 sec read

Fairness Metrics: Quantifying AI's Impact on People

Fairness metrics translate "fairness" into a measurable score, checking if a model treats groups equitably. They are crucial for models in hiring or lending.
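
As a rough illustration of turning fairness into a number, the sketch below computes one widely used metric, the demographic parity difference: the gap in positive-decision rates between groups. The decisions and group labels are made up for the example, and this metric is only one of several possible definitions of fairness.

```python
def selection_rate(predictions):
    """Fraction of positive (1) decisions in a list of 0/1 predictions."""
    return sum(predictions) / len(predictions)


def demographic_parity_difference(predictions, groups):
    """Largest gap in selection rate between any two groups (0.0 means parity)."""
    by_group = {}
    for pred, group in zip(predictions, groups):
        by_group.setdefault(group, []).append(pred)
    rates = [selection_rate(preds) for preds in by_group.values()]
    return max(rates) - min(rates)


# Hypothetical hiring decisions: 1 = advance, 0 = reject.
preds = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

gap = demographic_parity_difference(preds, groups)
print(f"demographic parity difference: {gap:.2f}")
```

A gap near zero means the groups receive positive decisions at similar rates; it says nothing about whether those decisions are accurate, which is why it is usually read alongside other metrics.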

LLMs & Generative AI · 30 sec read

vLLM: Faster LLM Inference with PagedAttention

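vLLM is a serving engine that speeds up LLM inference with PagedAttention, which manages the attention key-value (KV) cache the way an operating system manages virtual memory: in fixed-size blocks allocated on demand. Less memory is lost to fragmentation and over-reserved cache space, so more requests fit in a batch and throughput goes up.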

LLMs & Generative AI · 30 sec read

FlashAttention: Faster, Memory-Efficient Exact Attention

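FlashAttention is an IO-aware algorithm that computes exact attention faster and with less memory. It processes attention in tiles that fit in fast on-chip SRAM, which cuts reads and writes to the slower GPU high-bandwidth memory (HBM) and avoids materializing the full attention matrix. That makes it a key optimization for training and serving large models on modern GPUs.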
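
The "exact" part can be shown with a small sketch. The code below is not the FlashAttention GPU kernel; it is a plain-Python illustration, for a single query vector, of the running-softmax idea that lets attention be computed one block of keys at a time without ever holding all scores at once. Function names and the toy data are illustrative assumptions.

```python
import math


def naive_attention(q, keys, values):
    """Reference: compute every score, apply softmax, then weight the values."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def blockwise_attention(q, keys, values, block_size=2):
    """Same result, but only one block of scores exists at any time."""
    dim = len(values[0])
    running_max = float("-inf")
    running_sum = 0.0
    acc = [0.0] * dim  # unnormalized output accumulator
    for start in range(0, len(keys), block_size):
        k_blk = keys[start:start + block_size]
        v_blk = values[start:start + block_size]
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in k_blk]
        new_max = max(running_max, max(scores))
        # Rescale everything accumulated under the old maximum.
        scale = math.exp(running_max - new_max)
        running_sum *= scale
        acc = [a * scale for a in acc]
        for s, v in zip(scores, v_blk):
            w = math.exp(s - new_max)
            running_sum += w
            acc = [a + w * vd for a, vd in zip(acc, v)]
        running_max = new_max
    return [a / running_sum for a in acc]


q = [1.0, 0.5]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -1.0], [2.0, 0.5]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]

reference = naive_attention(q, keys, values)
tiled = blockwise_attention(q, keys, values)
```

The two outputs agree up to floating-point rounding, which is the sense in which the tiled computation is exact rather than an approximation.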

LLMs & Generative AI · 30 sec read

ONNX Runtime: Run Any AI Model, Anywhere

ONNX Runtime is a cross-platform inference engine for models in the ONNX format, letting you run them efficiently on a wide range of hardware, from cloud GPUs to a user's browser. It's used to deploy models for fast inference on servers or mobile devices.

LLMs & Generative AI · 30 sec read

Post-Training Quantization: Shrink Models Without Retraining

Post-Training Quantization (PTQ) shrinks a pre-trained model by converting its weights to lower precision, like turning a WAV file into an MP3. Use it to run large models on consumer GPUs without costly retraining.
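
To make the precision trade-off concrete, here is a minimal sketch of symmetric 8-bit quantization with a single scale per tensor. This is one simple PTQ scheme, chosen for illustration; the weight values are made up, and production toolkits typically use finer-grained scales and calibration data.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical weights from a trained layer.
weights = [0.82, -1.27, 0.003, 0.45, -0.6]

q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q_weights, scale, max_error)
```

Each weight now needs one byte instead of four (for float32), and the rounding error per weight is bounded by half the scale. Small values such as 0.003 collapse to zero, which is where the lossy-compression analogy applies.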

LLMs & Generative AI · 30 sec read

Model Cards: The 'Nutrition Label' for AI Models

A model card is the nutrition label for an AI model, summarizing its ingredients, intended use, and risks. Found in model repos, it details training data, performance, limitations, and ethical considerations.

LLMs & Generative AI · 30 sec read

Human Evaluation: Judging AI When Metrics Aren't Enough

Human evaluation is the ultimate reality check for AI, using people to judge qualities like fluency and coherence that automated scores can't capture. It's essential for tasks like summarization but is too slow and costly to use for everything.

LLMs & Generative AI · 30 sec read

BLIP: Bootstrapping Better Vision-Language Models

BLIP is a pre-training framework that handles both vision-language understanding and text generation (such as image captioning) by bootstrapping its own training data. A captioner writes synthetic captions for web images and a filter removes noisy ones, producing cleaner image-text pairs from noisy web data.

LLMs & Generative AI · 30 sec read

LLMs Get 'Lost in the Middle' of Long Contexts

LLMs struggle to find information buried in the middle of long prompts. Performance is highest when key facts are at the beginning or end of the context. This impacts multi-document QA and RAG.
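
One mitigation that follows from this finding is to reorder retrieved documents so the strongest ones sit at the edges of the prompt. The helper below is a hypothetical sketch of that idea, assuming the input list is already sorted from most to least relevant; it is not something the finding itself prescribes.

```python
def reorder_for_long_context(docs_by_relevance):
    """Place the most relevant documents at the start and end of the context.

    Input is sorted most-relevant first. Items alternate between the front
    and the back, so the least relevant ones end up in the middle.
    """
    front, back = [], []
    for i, doc in enumerate(docs_by_relevance):
        if i % 2 == 0:
            front.append(doc)
        else:
            back.append(doc)
    return front + back[::-1]


ranked = ["doc1", "doc2", "doc3", "doc4", "doc5"]  # doc1 is the best match
ordered = reorder_for_long_context(ranked)
print(ordered)  # ['doc1', 'doc3', 'doc5', 'doc4', 'doc2']
```

The two best matches end up first and last, while the weakest match lands in the middle, where it matters least if the model overlooks it.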

LLMs & Generative AI · 30 sec read

Hybrid Search: Combining Keyword and Vector Search

Hybrid search combines keyword precision with vector search's conceptual understanding in one query. It excels at retrieving relevant documents for RAG by finding both exact matches (like names) and similar ideas.
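
The two result lists still have to be merged into one ranking. A common way to do that is reciprocal rank fusion (RRF), sketched below; it is one fusion method among several, and the document IDs and the constant `k=60` are illustrative assumptions rather than settings from any specific search engine.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs; each list contributes 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


# Hypothetical results for the same query from two retrievers.
keyword_hits = ["d3", "d1", "d7"]  # exact-term matches
vector_hits = ["d1", "d5", "d3"]   # embedding-similarity matches

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)  # ['d1', 'd3', 'd5', 'd7']
```

RRF only uses rank positions, so it sidesteps the problem that keyword scores and vector similarities live on different scales. Documents found by both retrievers rise to the top.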

LLMs & Generative AI · 30 sec read

Dense Passage Retrieval (DPR): Semantic Search for QA

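DPR finds answers by meaning, not just keywords. It encodes questions and passages into vectors with two separate encoders and retrieves the passages whose vectors score highest against the question vector. It is a common retriever design for Retrieval-Augmented Generation (RAG) pipelines.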
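
The retrieval step itself is small once the vectors exist. The sketch below assumes the question and passages have already been encoded, and uses hand-written three-dimensional vectors as stand-ins for real encoder outputs; it scores passages by inner product and returns the top matches.

```python
def dot(u, v):
    """Inner product, used here as the similarity score."""
    return sum(a * b for a, b in zip(u, v))


def retrieve(query_vec, passage_vecs, top_k=2):
    """Return the IDs of the top_k passages with the highest similarity."""
    scored = sorted(
        passage_vecs.items(),
        key=lambda item: dot(query_vec, item[1]),
        reverse=True,
    )
    return [passage_id for passage_id, _ in scored[:top_k]]


# Toy embeddings (hypothetical; a real system would get these from encoders).
passages = {
    "p_paris": [0.9, 0.1, 0.0],
    "p_pasta": [0.0, 0.2, 0.9],
    "p_france": [0.7, 0.3, 0.1],
}
query = [0.8, 0.2, 0.0]

top = retrieve(query, passages)
print(top)  # ['p_paris', 'p_france']
```

At real scale the exhaustive sort is replaced by an approximate nearest-neighbor index, but the scoring rule stays the same.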

LLMs & Generative AI · 30 sec read

Reward Modeling: Teaching an LLM What 'Good' Means

A reward model is a judge that scores an LLM's outputs based on human preferences. It learns to assign a numerical 'goodness' score to text, turning subjective quality into an optimizable signal for training models like ChatGPT.
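
Reward models are commonly trained on pairs of responses where a human picked the better one. The sketch below shows the pairwise loss often used for this, `-log(sigmoid(r_chosen - r_rejected))`, as a plain function over two scalar scores. It assumes the scores come from some reward model that is not shown here.

```python
import math


def pairwise_preference_loss(reward_chosen, reward_rejected):
    """-log(sigmoid(r_chosen - r_rejected)), computed in a numerically stable way."""
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(x)) equals log(1 + exp(-x)); branch to avoid overflow in exp.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# The loss is small when the preferred response scores higher...
agrees = pairwise_preference_loss(reward_chosen=2.0, reward_rejected=0.0)
# ...and large when the model ranks the pair the wrong way round.
disagrees = pairwise_preference_loss(reward_chosen=0.0, reward_rejected=2.0)
print(agrees, disagrees)
```

Minimizing this loss pushes the score of the preferred response above the rejected one, which is how pairwise human judgments become a scalar signal.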

LLMs & Generative AI · 30 sec read

PEFT: Fine-Tune Large Models on a Budget

Parameter-Efficient Fine-Tuning (PEFT) adapts huge models without retraining everything. It's like adding a task-specific cheat sheet to a genius brain. Use it to specialize LLMs on consumer GPUs.
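
LoRA is one popular PEFT method, and it makes the "cheat sheet" idea easy to quantify. The sketch below assumes a LoRA-style update, where a frozen weight matrix `W` is augmented with two small trainable matrices `A` and `B`; the dimensions and rank are illustrative.

```python
def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def lora_forward(W, A, B, x, alpha=1.0):
    """y = W x + (alpha / r) * B (A x). W is frozen; only A and B are trained."""
    r = len(A)  # rank of the update: A is r x d_in, B is d_out x r
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    return [b + (alpha / r) * u for b, u in zip(base, update)]


def trainable_params(d_out, d_in, r):
    """Number of parameters in A and B combined."""
    return r * d_in + d_out * r


# Tiny example: B starts at zero, so the adapted layer initially equals the base layer.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]      # rank 1
B = [[0.0], [0.0]]
y = lora_forward(W, A, B, [3.0, 4.0])

full = 4096 * 4096
adapter = trainable_params(4096, 4096, r=8)
print(y, adapter, f"{adapter / full:.2%} of the full matrix")
```

For a 4096 x 4096 layer at rank 8, the adapter trains 65,536 parameters instead of about 16.8 million, which is why fine-tuning can fit on a consumer GPU.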

LLMs & Generative AI · 30 sec read

ReAct: Teaching LLMs to Think, Act, and Observe

ReAct teaches an LLM to solve problems by interleaving thought, action, and observation. This is key for agents that search the web or query APIs to answer questions with external data.
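
The thought, action, observation cycle is easiest to see as a loop. In the sketch below, `scripted_model` is a hard-coded stand-in for an LLM and `lookup` is a hypothetical tool backed by a one-entry dictionary; a real agent would call a model and real APIs at those two points.

```python
def lookup(query):
    """Hypothetical tool: a tiny canned knowledge base."""
    return {"capital of France": "Paris"}.get(query, "no result")


def scripted_model(history):
    """Stand-in for an LLM: decides the next (thought, action, argument)."""
    observations = [h for h in history if h.startswith("Observation: ")]
    if not observations:
        return ("I should look this up.", "lookup", "capital of France")
    answer = observations[-1][len("Observation: "):]
    return ("The observation answers the question.", "finish", answer)


def react_loop(question, model, tools, max_steps=5):
    """Interleave thoughts, tool calls, and observations until the model finishes."""
    history = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, arg = model(history)
        history.append(f"Thought: {thought}")
        if action == "finish":
            return arg, history
        history.append(f"Action: {action}[{arg}]")
        history.append(f"Observation: {tools[action](arg)}")
    return None, history  # gave up after max_steps


answer, trace = react_loop("What is the capital of France?", scripted_model, {"lookup": lookup})
print("\n".join(trace))
print("Answer:", answer)
```

The growing `history` list is the key design point: every observation is fed back to the model, so each new thought can build on what the tools returned.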

LLMs & Generative AI · 30 sec read

Tensor Parallelism: Split Layers, Not Just Models

Tensor Parallelism splits a single large model layer, like a weight matrix, across multiple GPUs that each compute part of the result in parallel. It is used in both training and inference when a model's layers are too large for a single GPU's VRAM.
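
A column-wise split is one simple way to divide a weight matrix. The sketch below simulates it on the CPU with plain lists: each "device" multiplies the same input by its own slice of columns, and the partial outputs are concatenated. The function names are illustrative, and real frameworks replace the concatenation with a collective communication step between GPUs.

```python
def matmul(X, W):
    """Multiply X (n x d) by W (d x m), both given as lists of rows."""
    cols = list(zip(*W))
    return [[sum(x * w for x, w in zip(row, col)) for col in cols] for row in X]


def split_columns(W, parts):
    """Cut W into `parts` equal-width column shards."""
    n_cols = len(W[0])
    assert n_cols % parts == 0, "columns must divide evenly across devices"
    step = n_cols // parts
    return [[row[i * step:(i + 1) * step] for row in W] for i in range(parts)]


def tensor_parallel_matmul(X, W, parts=2):
    shards = split_columns(W, parts)                   # one shard per simulated GPU
    partials = [matmul(X, shard) for shard in shards]  # independent work per device
    # Gather step: stitch each device's output columns back together.
    return [sum((p[r] for p in partials), []) for r in range(len(X))]


X = [[1, 2, 3], [4, 5, 6]]
W = [[1, 0, 2, 1], [0, 1, 1, 3], [2, 1, 0, 1]]

sharded = tensor_parallel_matmul(X, W, parts=2)
print(sharded == matmul(X, W))  # True
```

No device ever holds the full matrix `W`, which is the property that lets a layer exceed the memory of any single GPU.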

LLMs & Generative AI · 30 sec read

Pipeline Parallelism: An Assembly Line for Your Model

Think of training a huge model like an assembly line. Pipeline parallelism splits a model's layers into stages across multiple GPUs, allowing you to train models too large for one device.
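
The assembly-line picture can be sketched in two small functions: one assigns contiguous layers to stages, the other lists which chunk of a batch each stage handles at each time step. Splitting the batch into micro-batches is an assumption borrowed from common pipeline schedules, not something stated above, and only the forward pass is modeled.

```python
def partition_layers(num_layers, num_stages):
    """Assign contiguous layer indices to stages, as evenly as possible."""
    base, extra = divmod(num_layers, num_stages)
    stages, start = [], 0
    for s in range(num_stages):
        size = base + (1 if s < extra else 0)
        stages.append(list(range(start, start + size)))
        start += size
    return stages


def forward_schedule(num_stages, num_microbatches):
    """schedule[t][s] is the micro-batch stage s works on at step t (None = idle)."""
    steps = num_stages + num_microbatches - 1
    return [
        [t - s if 0 <= t - s < num_microbatches else None for s in range(num_stages)]
        for t in range(steps)
    ]


stages = partition_layers(num_layers=10, num_stages=4)
schedule = forward_schedule(num_stages=3, num_microbatches=4)

print(stages)
for t, row in enumerate(schedule):
    print(f"step {t}: {row}")
```

The `None` entries at the start and end of the schedule are the idle slots while the line fills and drains; using more micro-batches makes them a smaller share of the total.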

LLMs & Generative AI · 30 sec read

Data Parallelism: One Task, Many Data Chunks

Data parallelism splits a huge dataset across multiple processors, each running the same model on its own chunk. Every GPU holds a full copy of the model, processes a different batch, and the resulting gradients are combined so the copies stay in sync. It's how large models are trained on massive datasets.
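
A toy version of one training step shows why this works. The sketch below simulates workers in a single process for a one-parameter model `y = w * x`: each worker computes a gradient on its own shard, and the gradients are averaged before the shared weight is updated. The data, learning rate, and function names are illustrative assumptions.

```python
def gradient(w, batch):
    """d/dw of mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_step(w, dataset, num_workers, lr=0.01):
    """One SGD step with the dataset sharded evenly across simulated workers."""
    assert len(dataset) % num_workers == 0, "shards must be equal in size"
    shard_size = len(dataset) // num_workers
    shards = [dataset[i * shard_size:(i + 1) * shard_size] for i in range(num_workers)]
    local_grads = [gradient(w, shard) for shard in shards]  # same w on every worker
    avg_grad = sum(local_grads) / num_workers               # stands in for all-reduce
    return w - lr * avg_grad


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]  # points on y = 2x

w_parallel = data_parallel_step(0.0, data, num_workers=2)
w_single = data_parallel_step(0.0, data, num_workers=1)
print(w_parallel, w_single)
```

With equal-sized shards, averaging the per-worker gradients gives the same update as one worker processing the whole batch, so adding workers changes the speed of a step and not its result.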

LLMs & Generative AI · 30 sec read

Common Crawl: A Free Snapshot of the Entire Web

Common Crawl is a public library of the internet: a massive, free archive of periodic web crawls that includes raw pages, extracted text, and link metadata. It's the raw material for training many LLMs and for academic research on web-scale data. The footgun: it's unfiltered, containing everything from facts to spam, and despite its size it covers only part of the web.