Vector Databases: Searching by Meaning, Not Matches
A vector database stores data as embeddings (numerical vectors) and retrieves it by meaning rather than by exact values. Instead of looking up a record by its ID, you find the records most similar to a query vector. This powers AI features such as Retrieval-Augmented Generation (RAG), where relevant documents are retrieved and passed to an LLM as context, as well as recommendation engines. The main footgun is that most vector databases return *approximate* nearest neighbors, trading perfect accuracy for speed at scale, so the single best match is not guaranteed to appear in the results.
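The core query can be sketched as an exact, brute-force top-k search by cosine similarity. This is a minimal illustration, not any particular database's API: the function name `top_k_cosine` and the toy 4-dimensional vectors are hypothetical. Real systems typically replace the linear scan with an approximate index such as HNSW or IVF, which is where the accuracy-for-speed trade-off comes from.

```python
import numpy as np

def top_k_cosine(query, vectors, k=3):
    """Exact (brute-force) nearest-neighbor search by cosine similarity.

    A vector database would usually answer this with an approximate index
    instead, which is faster but may miss some true neighbors.
    """
    q = query / np.linalg.norm(query)                              # unit-length query
    m = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)   # unit-length rows
    scores = m @ q                      # cosine similarity of every stored vector
    idx = np.argsort(-scores)[:k]       # indices of the k highest scores
    return idx, scores[idx]

# Hypothetical 4-dimensional "embeddings" for three stored documents.
docs = np.array([
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.8, 0.6, 0.0],
    [0.7, 0.2, 0.1, 0.0],
])
query = np.array([1.0, 0.0, 0.0, 0.0])
ids, sims = top_k_cosine(query, docs, k=2)
print(ids, sims.round(3))
```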
Attention: Weighing Input by Relative Importance
The attention mechanism lets a model decide which parts of a sequence matter most relative to the others. In natural language processing, it assigns 'soft' weights (non-negative values that sum to 1) to tokens, letting the model focus on what is most relevant for a given task. It is used to encode sequences of token embeddings, from short phrases to long documents, limited in practice by the model's context window. The main pitfall is forgetting that these weights are contextual and relative, not absolute measures of a word's importance.
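A minimal sketch of attention, assuming the scaled dot-product form used in Transformers: each query is compared with every key, the scores become soft weights via a softmax, and the output is a weighted sum of the values. The random inputs and projection matrices are placeholders for learned parameters and are purely illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max for numerical stability; the result is unchanged.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention. Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # how well each query matches each key
    weights = softmax(scores, axis=-1)   # soft weights: each row sums to 1
    return weights @ V, weights          # weighted mix of the values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))  # 4 hypothetical token embeddings of size 8
# In a real model Q, K, V come from learned projections of X; these are random.
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, w = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
print(w.round(2))  # row i: how much token i attends to each token
```

Each row of `w` depends on every other token in the sequence, which is why a weight is only meaningful relative to its neighbors in that context.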

Transformers: Processing Language in Parallel with Attention
Transformers process all input text at once, weighing which words are most important to each other in parallel. This 'attention' mechanism is the core of models like GPT, allowing them to understand context over long sequences. Text is broken into tokens, turned into vectors, and then contextualized by multiple attention 'heads'. The key footgun is that attention alone is order-agnostic; without explicit positional encodings, the model can't distinguish 'dog bites man' from 'man bites dog'.
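The order-agnostic footgun can be checked directly. This sketch assumes a stripped-down self-attention with no learned projections (queries, keys, and values are the embeddings themselves) and the sinusoidal positional encodings from the original Transformer paper; the word embeddings are random placeholders.

```python
import numpy as np

def self_attention(X):
    # Minimal self-attention with Q = K = V = X and no positional information.
    scores = X @ X.T / np.sqrt(X.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    w = np.exp(scores)
    w /= w.sum(axis=-1, keepdims=True)             # softmax over each row
    return w @ X

def sinusoidal_positions(n, d):
    # PE[pos, 2i] = sin(pos / 10000^(2i/d)), PE[pos, 2i+1] = cos(...); d assumed even.
    pos = np.arange(n)[:, None]
    i = np.arange(d // 2)[None, :]
    angles = pos / (10000 ** (2 * i / d))
    pe = np.zeros((n, d))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

rng = np.random.default_rng(1)
emb = {w: rng.normal(size=8) for w in ["dog", "bites", "man"]}  # hypothetical embeddings
a = np.stack([emb[w] for w in ["dog", "bites", "man"]])
b = np.stack([emb[w] for w in ["man", "bites", "dog"]])

# Without positions, reordering the input only reorders the outputs.
out_a, out_b = self_attention(a), self_attention(b)
print(np.allclose(out_a[0], out_b[2]))  # True: "dog" looks the same in both sentences

# With positions added, "dog" at position 0 and at position 2 get different outputs.
pe = sinusoidal_positions(3, 8)
pa, pb = self_attention(a + pe), self_attention(b + pe)
print(np.allclose(pa[0], pb[2]))
```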
Softmax Function: Turning Scores into Probabilities
The softmax function turns a list of raw scores (logits) from a model into a probability distribution: every value lies between 0 and 1, and all values sum to 1. It is most often the final step of a neural network for multi-class classification, such as deciding whether an image shows a 'cat', a 'dog', or a 'bird'. The main footgun is mistaking a high softmax probability for genuine model confidence; it only reflects how large a score is relative to the other scores, not how certain the model actually is.
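A numerically stable softmax takes only a few lines. The sketch below uses hypothetical logits for three classes and also shows why the output is relative: adding the same constant to every score leaves the probabilities unchanged.

```python
import numpy as np

def softmax(logits):
    """Convert a 1-D array of raw scores (logits) into probabilities that sum to 1."""
    z = logits - np.max(logits)   # shift for numerical stability; output is unchanged
    e = np.exp(z)
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # hypothetical logits for "cat", "dog", "bird"
p = softmax(scores)
print(p.round(3))

# Only the differences between scores matter: shifting every score by 100
# gives the same distribution, so a large probability means "much higher than
# the alternatives", not "certain".
print(np.allclose(softmax(scores + 100.0), p))  # True
```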
Prompt Engineering: How to Talk to AIs
Prompt engineering is the skill of structuring your text input to steer a generative AI toward a specific, desired output, using carefully chosen words instead of code. Think of it as giving a smart but literal intern a precise set of instructions. It is essential both for getting reliable results from chatbots like ChatGPT and for building applications on top of large language models (LLMs), whether the target is formatted JSON or correctly styled text. The biggest mistake is treating the AI like a search engine instead of a collaborator that needs clear direction: effective prompts provide context, examples, and constraints rather than a few keywords or a bare question.
AI Hallucination: Confabulation, Not Perception
AI hallucination is when a model confidently invents plausible-sounding facts to fill gaps in its knowledge. It is not a perceptual error but a confabulation: an erroneously constructed response. It tends to occur when a model must produce an answer but lacks reliable information, for example when asked about niche topics. The biggest footgun is trusting an AI's fluent, confident-sounding output without independent verification, since it may be entirely fabricated.
Cosine Similarity: Measuring Direction, Not Distance
Cosine similarity gauges how alike two vectors are by the cosine of the angle between them, not the distance between them. It asks, "Do these point in the same direction?" The score ranges from 1 (same direction) through 0 (orthogonal) to -1 (opposite). This is fundamental in AI for comparing text embeddings, where a vector's direction encodes its meaning. The main footgun is confusing it with Euclidean distance: cosine similarity ignores magnitude, so two vectors can be far apart in space yet score as nearly identical if they point the same way.
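A minimal sketch contrasting the two measures on hand-picked vectors: `b` is `a` scaled by ten, so the pair is far apart in Euclidean terms but has a cosine similarity of 1.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between a and b: 1 = same direction, 0 = orthogonal, -1 = opposite."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
b = 10 * a                        # same direction, ten times longer
c = np.array([3.0, -1.0, 0.5])    # a different direction

print(cosine_similarity(a, b))    # 1.0, up to floating-point rounding
print(np.linalg.norm(a - b))      # large Euclidean distance all the same
print(cosine_similarity(a, c))    # much lower: the directions differ
```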
Word Embeddings: Turning Words into Vectors
Word embeddings turn words into numerical vectors, like coordinates on a map of meaning. Words with similar meanings, like "king" and "queen," are placed close together in this vector space. This is fundamental for text analysis in machine learning, allowing models to grasp semantic relationships instead of just matching text. The footgun is assuming the vector's individual numbers are human-interpretable; they are abstract features learned from data.
Byte Pair Encoding: Compressing Text for LLMs
Think of Byte Pair Encoding (BPE) as creating custom abbreviations for common symbol pairs to compress text. Starting from individual characters (or bytes), it repeatedly finds the most frequent adjacent pair, like 't' followed by 'h', and merges it into a new token such as 'th'. LLMs use this to build vocabularies of common sub-word units, so rare or unseen words can still be represented as sequences of known pieces. The main footgun is that the vocabulary size is a fixed choice made before training, set by the number of merges; choosing the wrong size can hurt model performance and efficiency.
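The merge loop can be sketched in a few functions. This is a simplified character-level version under stated assumptions: the toy corpus is hypothetical, there is no end-of-word marker, and ties between equally frequent pairs go to the first pair encountered. Production tokenizers (byte-level BPE, for example) differ in these details. Here the hypothetical `num_merges` parameter is the knob that fixes the vocabulary size.

```python
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs across a {word-as-tuple: frequency} vocabulary."""
    pairs = Counter()
    for symbols, freq in vocab.items():
        for pair in zip(symbols, symbols[1:]):
            pairs[pair] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in vocab.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])  # fuse the pair
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

def learn_bpe(word_counts, num_merges):
    """Learn up to `num_merges` merge rules; more merges means a larger vocabulary."""
    vocab = {tuple(word): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break                              # every word is already a single token
        best = max(pairs, key=pairs.get)       # most frequent adjacent pair
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges, vocab

# Hypothetical toy corpus: word -> frequency.
corpus = {"the": 5, "then": 2, "this": 3, "that": 4}
merges, vocab = learn_bpe(corpus, num_merges=3)
print(merges)
print(vocab)
```

On this corpus the first merge is ('t', 'h'), matching the 'th' example above, and the second builds 'the' on top of it.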
Large Language Models (LLMs)
A large language model (LLM) is a sophisticated pattern-matching engine trained on a massive library of text. LLMs power modern chatbots and can generate, summarize, or translate text by repeatedly predicting a likely next token (a word or word piece) based on the patterns they have learned. The key footgun is that their output reflects the biases and inaccuracies of their training data, making them confident but potentially unreliable.
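Next-token prediction can be illustrated with a toy stand-in: a bigram word counter in place of a neural network, decoded greedily by always picking the most frequent continuation. This is a hypothetical illustration of the prediction loop only; real LLMs work on sub-word tokens, condition on far longer context, and usually sample rather than always taking the top choice.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def generate(follows, start, max_words=5):
    """Greedy decoding: repeatedly append the most frequent next word."""
    out = [start]
    for _ in range(max_words):
        options = follows.get(out[-1])   # .get avoids creating empty entries
        if not options:
            break                        # no continuation seen in training data
        out.append(options.most_common(1)[0][0])
    return " ".join(out)

corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram(corpus)
print(generate(model, "the"))
```

Because the model only knows which word followed which in its tiny training text, greedy decoding can produce a fluent-looking sequence that never appeared in that text, a small-scale version of confident but unreliable output.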