AI Terms Glossary
Plain-English and in-depth definitions for advanced AI terminology.
Core Concepts
Artificial Intelligence (AI)
Computer software designed to perform tasks that usually require human intelligence — recognizing patterns, making decisions, or understanding language. The broad umbrella for all 'smart' machine behavior.
A field of computer science focused on creating systems capable of performing tasks that typically require human cognitive functions. Modern AI encompasses machine learning, deep learning, natural language processing, computer vision, and robotics. The field is divided into narrow AI (specialized for specific tasks) and the theoretical concept of Artificial General Intelligence (AGI), which would match or exceed human-level reasoning across all domains.
Machine Learning (ML)
A subset of AI where computers learn rules from data instead of being explicitly programmed. Rather than coding 'If X, do Y,' you feed it examples and it figures out the patterns.
A computational approach where algorithms iteratively learn from data to improve performance on a specific task without explicit programming. ML encompasses supervised learning (labeled training data), unsupervised learning (pattern discovery in unlabeled data), semi-supervised learning, and reinforcement learning (learning through reward signals). Core techniques include gradient descent optimization, regularization, cross-validation, and hyperparameter tuning.
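To make the gradient-descent loop concrete, here is a minimal sketch that fits a one-parameter line to invented toy data; the data values and learning rate are illustrative assumptions, not a recipe.

```python
# Minimal gradient descent: fit y = w * x to toy data (values are invented).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]  # roughly y = 2x

w, lr = 0.0, 0.01
for step in range(200):
    # Gradient of mean squared error L = mean((w*x - y)^2) with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad  # step against the gradient
print(round(w, 2))  # converges near 2.0
```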
Deep Learning
A specialized type of Machine Learning using Neural Networks with many layers. Excels at complex problems like image recognition and language understanding. The foundation of modern AI breakthroughs.
A subset of machine learning utilizing artificial neural networks with multiple hidden layers (hence 'deep'). These architectures automatically learn hierarchical feature representations from raw data. Key architectures include Convolutional Neural Networks (CNNs) for spatial data, Recurrent Neural Networks (RNNs) for sequential data, and Transformers for attention-based processing. Training requires backpropagation, large datasets, and significant computational resources (GPUs/TPUs).
Neural Network
Software architecture loosely inspired by the human brain. Layers of interconnected 'neurons' process information, with each layer learning increasingly abstract patterns from data.
A computational model consisting of interconnected nodes (neurons) organized in layers. Each neuron applies a weighted sum of inputs followed by a non-linear activation function (ReLU, sigmoid, tanh). Networks learn by adjusting weights through backpropagation and gradient descent to minimize a loss function. Architecture variations include feedforward networks, convolutional networks, recurrent networks, and transformer networks, each optimized for different data types and tasks.
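A minimal forward pass, assuming a tiny 3-input, 4-hidden-unit network with random weights, shows the weighted-sum-plus-activation pattern described above:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

# One hidden layer: 3 inputs -> 4 hidden units -> 1 output (sizes are arbitrary).
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)

def forward(x):
    h = relu(W1 @ x + b1)   # weighted sum of inputs + non-linear activation
    return W2 @ h + b2      # output layer (no activation, e.g. for regression)

print(forward(np.array([0.5, -1.0, 2.0])))
```

Training would adjust W1, b1, W2, b2 via backpropagation to minimize a loss; only the forward computation is shown here.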
Generative AI
AI that creates new content (text, images, code, music) based on patterns learned from training data. Unlike traditional AI that analyzes, generative AI produces. ChatGPT, Midjourney, and Claude are generative AI.
AI systems designed to generate novel content by learning the underlying probability distribution of training data. Techniques include autoregressive models (GPT), diffusion models (Stable Diffusion), Generative Adversarial Networks (GANs), and Variational Autoencoders (VAEs). These models learn to sample from high-dimensional probability spaces to produce outputs statistically similar to training data while introducing controlled variation.
Transformer
The neural network architecture behind modern AI models like GPT and Claude. Invented in 2017, it uses 'attention' to understand relationships between all words in a sentence simultaneously.
A neural network architecture introduced in 'Attention Is All You Need' (Vaswani et al., 2017). It replaces recurrence with self-attention mechanisms that compute relationships between all positions in a sequence in parallel. Key components include multi-head attention, positional encodings, layer normalization, and feedforward sublayers. Variants include encoder-only (BERT), decoder-only (GPT), and encoder-decoder (T5) architectures. Transformers scale efficiently and have become the foundation of modern NLP and increasingly vision tasks.
AI Agent
A system designed to perceive its environment and take autonomous actions to achieve specific goals. Think of agents as digital specialists — some are writers, others are coders, and some create art.
An autonomous system that perceives its environment through sensors/inputs, maintains internal state, and executes actions to achieve specified objectives. Modern AI agents typically combine LLMs for reasoning with tool use capabilities (APIs, code execution, web browsing). Key concepts include planning, memory management, reflection, and multi-agent collaboration. Architectures like ReAct (Reasoning + Acting) and chain-of-thought prompting enable complex task decomposition and execution.
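A sketch of a ReAct-style loop, where `call_llm` and `run_tool` are hypothetical placeholders standing in for a real model API and real tool integrations, not any actual library:

```python
# Hypothetical ReAct-style agent loop. call_llm() and run_tool() are
# placeholders for a real model call and real tool execution.
def call_llm(transcript: str) -> str: ...
def run_tool(action: str) -> str: ...

def react_agent(task: str, max_steps: int = 5) -> str:
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        step = call_llm(transcript)          # model emits a Thought + Action
        transcript += step + "\n"
        if step.startswith("Final Answer:"):
            return step
        observation = run_tool(step)         # execute the proposed action
        transcript += f"Observation: {observation}\n"
    return "Stopped: step budget exhausted."
```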
LLM (Large Language Model)
The brain behind most AI agents. A type of AI trained on massive amounts of text that can understand and generate human-like language. Examples: GPT-5, Claude, Gemini.
Neural networks (typically transformer-based) trained on internet-scale text corpora using self-supervised learning objectives like next-token prediction or masked language modeling. LLMs learn statistical patterns in language that enable few-shot and zero-shot task performance. Key scaling laws relate model size (parameters), dataset size, and compute to performance. Training involves pre-training on general text followed by instruction fine-tuning and alignment techniques (RLHF, DPO, Constitutional AI).
Multimodal
An AI that can understand and work with multiple types of input — text, images, audio, and video. Gemini is a leading multimodal model.
AI systems capable of processing and generating multiple data modalities (text, images, audio, video) within a unified architecture. Approaches include early fusion (combining modalities at input), late fusion (combining at output), and cross-attention mechanisms. Models like GPT-4V, Gemini, and Claude use vision encoders (often CLIP-based or ViT) to project images into the same embedding space as text tokens, enabling integrated reasoning across modalities.
Context Window
How much text an AI can 'remember' in a single conversation. Measured in tokens. Larger context windows (like Claude's 200k tokens) allow for longer documents and more complex tasks.
The maximum sequence length a transformer model can process in a single forward pass, determined by positional encoding limits and memory constraints (attention is O(n²) in sequence length). Extended context techniques include sparse attention patterns, sliding window attention, memory compression, Rotary Position Embeddings (RoPE), and retrieval-augmented approaches. Effective context utilization varies—models often exhibit 'lost in the middle' phenomena where information in the center of long contexts is poorly recalled.
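A rough back-of-the-envelope calculation, assuming one fp16 value per entry in a naively materialized attention matrix, shows why the O(n²) term bites:

```python
# The O(n^2) term in practice: size of one naively materialized attention
# matrix (per head, per layer), assuming 2 bytes per fp16 entry.
for n in (1_000, 10_000, 100_000):
    mib = n * n * 2 / 2**20
    print(f"{n:>7} tokens -> {mib:>12,.1f} MiB")
# A 10x longer context costs 100x the attention memory, which is what
# motivates sliding-window, sparse, and flash-attention variants.
```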
Token
The basic unit AI uses to process text. Roughly ¾ of a word or 4 characters. 'Hello world' = 2 tokens. Context windows and pricing are measured in tokens. 1,000 tokens ≈ 750 words.
The atomic unit of text processing in language models, produced by tokenization algorithms like Byte-Pair Encoding (BPE), WordPiece, or SentencePiece. Tokenizers build vocabularies (typically 32K-100K tokens) by iteratively merging frequent character sequences. Tokens map to embedding vectors that serve as model inputs. Tokenization affects model behavior—rare words split into multiple tokens, different languages have varying token efficiency, and token boundaries can impact generation quality.
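A toy BPE sketch over a three-word string illustrates the iterative pair-merging idea; real tokenizers run this process over large corpora to build their vocabularies:

```python
from collections import Counter

# Toy BPE: start from characters, repeatedly merge the most frequent
# adjacent pair into a new token.
tokens = list("low lower lowest")
for _ in range(3):
    pairs = Counter(zip(tokens, tokens[1:]))
    (a, b), _count = pairs.most_common(1)[0]
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == (a, b):
            merged.append(a + b)  # merge the frequent pair into one token
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    tokens = merged
print(tokens)  # frequent sequences like "lo" and "low" emerge as single tokens
```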
Capabilities
Natural Language Processing (NLP)
AI's ability to understand, interpret, and generate human language. Powers chatbots, translation, sentiment analysis, and voice assistants.
A subfield of AI and linguistics focused on enabling computers to process human language. Traditional NLP used rule-based and statistical methods (n-grams, TF-IDF, HMMs). Modern NLP is dominated by neural approaches: word embeddings (Word2Vec, GloVe), contextual embeddings (ELMo, BERT), and generative models (GPT). Tasks include tokenization, POS tagging, named entity recognition, parsing, sentiment analysis, machine translation, question answering, and text generation.
Computer Vision
AI that can 'see' and interpret images or video. Used for facial recognition, medical imaging, autonomous vehicles, and searching your photo library for 'beach photos.'
A field enabling machines to derive information from visual inputs. Core tasks include image classification, object detection (YOLO, Faster R-CNN), semantic segmentation, instance segmentation, pose estimation, and 3D reconstruction. Modern approaches use Convolutional Neural Networks (CNNs) like ResNet, EfficientNet, and increasingly Vision Transformers (ViT). Self-supervised pretraining (CLIP, DINO) has enabled powerful visual representations that generalize across tasks.
Recommendation System
AI that predicts what content you'll like based on your history and similar users. Powers Netflix suggestions, TikTok's For You page, and Amazon's product recommendations.
Systems that predict user preferences for items. Approaches include collaborative filtering (user-user or item-item similarity), content-based filtering (item features), and hybrid methods. Modern systems use deep learning: neural collaborative filtering, autoencoders, and transformer-based sequential recommendation. Key challenges include cold start (new users/items), scalability, and balancing exploitation (known preferences) vs. exploration (discovering new interests). Evaluation metrics include precision, recall, NDCG, and diversity.
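A minimal item-item collaborative-filtering sketch on an invented ratings matrix; real systems operate on millions of users and items with approximate methods:

```python
import numpy as np

# Toy ratings matrix: rows = users, columns = items, 0 = unrated.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

# Cosine similarity between item columns.
norms = np.linalg.norm(R, axis=0)
sim = (R.T @ R) / np.outer(norms, norms)

# Predict user 0's rating for item 2 as a similarity-weighted average
# of the items that user has already rated.
rated = R[0] > 0
pred = sim[2, rated] @ R[0, rated] / sim[2, rated].sum()
print(round(pred, 2))  # low predicted rating: item 2 resembles items user 0 dislikes
```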
Reasoning
The AI's ability to think through problems step-by-step. Advanced reasoning models 'think' before answering, often producing more accurate results on complex tasks.
The capacity of AI systems to draw inferences, solve problems, and make decisions through logical or analogical processes. In LLMs, reasoning emerges from scale and can be enhanced through chain-of-thought prompting, self-consistency (multiple reasoning paths), tree-of-thought search, and dedicated reasoning tokens. Reasoning benchmarks test mathematical reasoning (GSM8K, MATH), commonsense reasoning (HellaSwag, ARC), and multi-step logical inference. Current models show strong capabilities but also systematic failures on certain reasoning types.
Prompt Engineering
The skill of crafting effective instructions for AI. Good prompts are specific, provide context, and may include examples. The difference between mediocre and excellent AI output.
The practice of designing inputs to elicit desired outputs from language models. Techniques include zero-shot prompting, few-shot learning (in-context examples), chain-of-thought prompting, role-playing/persona assignment, output format specification, and system prompts. Advanced methods include automatic prompt optimization, prompt chaining for complex tasks, and retrieval-augmented prompting. Prompt sensitivity varies by model—small wording changes can significantly impact outputs.
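An illustrative few-shot prompt combining a role, in-context examples, and an output-format constraint; the wording is an example, and the best phrasing varies by model:

```python
# A few-shot prompt with a persona, three in-context examples, and a
# format constraint. The task and reviews are invented for illustration.
prompt = """You are a sentiment classifier. Answer with exactly one word.

Review: "The battery died after an hour." -> negative
Review: "Fast shipping and works perfectly." -> positive
Review: "It's fine, nothing special." -> neutral

Review: "I'd buy this again in a heartbeat." ->"""
```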
Chain of Thought
A prompting technique that asks AI to 'think step-by-step.' Forces the model to generate intermediate reasoning, significantly reducing errors on math and logic problems.
A prompting paradigm (Wei et al., 2022) where models generate intermediate reasoning steps before final answers. This decomposes complex problems into manageable substeps, improving accuracy on arithmetic, commonsense, and symbolic reasoning tasks. Variants include zero-shot CoT ('Let's think step by step'), self-consistency (sampling multiple reasoning paths and voting), and tree-of-thought (exploring branching solution paths). CoT is hypothesized to work by reducing the effective complexity per generation step.
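A self-consistency sketch, where `sample_cot_answer` is a hypothetical stand-in for a model call that samples a reasoning trace at nonzero temperature and returns only the parsed final answer:

```python
from collections import Counter

# Self-consistency: sample several chain-of-thought completions, extract
# each final answer, and take a majority vote. sample_cot_answer() is a
# hypothetical placeholder, not a real API.
def sample_cot_answer(question: str, temperature: float = 0.7) -> str: ...

def self_consistent_answer(question: str, n_samples: int = 10) -> str:
    answers = [sample_cot_answer(question) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]  # most frequent answer wins
```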
Vibe Coding
Building apps by describing what you want in plain English. Tools like v0, Bolt.new, and Lovable generate working code from your descriptions.
An emerging development paradigm where developers describe desired functionality in natural language and AI generates implementation code. This leverages LLMs' code generation capabilities trained on open-source repositories. Tools provide iterative refinement through conversation, visual previews, and one-click deployment. Limitations include variable generated-code quality, difficulty with complex architectures, and potential security vulnerabilities. Best suited for prototyping, simple applications, and developers who can review generated code.
Agentic Coding
AI that can autonomously write, test, and debug code across multiple files. Goes beyond simple autocomplete to handle entire development tasks.
AI systems that autonomously perform software development tasks including writing code, executing tests, debugging errors, refactoring, and managing files across a codebase. These agents combine LLMs with tool use (terminal, file system, browser) and typically employ ReAct-style architectures with planning, execution, and reflection loops. Examples include Devin, Claude's computer use, and GitHub Copilot Workspace. Challenges include long-horizon planning, error recovery, and maintaining codebase coherence.
RAG (Retrieval-Augmented Generation)
A technique where AI retrieves relevant information from a database before generating a response. This is how tools like Perplexity cite sources.
An architecture combining retrieval systems with generative models to ground outputs in external knowledge. The pipeline: (1) encode query into embedding, (2) retrieve relevant documents via vector similarity search (using indexes like FAISS, Pinecone), (3) concatenate retrieved context with query, (4) generate response conditioned on both. Benefits include reduced hallucination, updateable knowledge, and source attribution. Challenges include retrieval quality, context length limits, and handling conflicting information across sources.
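A minimal sketch of that pipeline, with `embed` and `generate` as hypothetical stand-ins for an embedding model and an LLM call, and brute-force cosine search in place of a real vector index:

```python
import numpy as np

# Minimal RAG pipeline sketch. embed() and generate() are hypothetical
# placeholders; production systems use a vector index (FAISS, Pinecone, ...)
# instead of scoring every document.
def embed(text: str) -> np.ndarray: ...
def generate(prompt: str) -> str: ...

def rag_answer(query: str, documents: list[str], k: int = 3) -> str:
    doc_vecs = np.stack([embed(d) for d in documents])
    q = embed(query)
    # Cosine similarity between the query and every document.
    scores = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    top = np.argsort(scores)[::-1][:k]           # indices of the k best matches
    context = "\n\n".join(documents[i] for i in top)
    return generate(f"Answer using only this context:\n{context}\n\nQ: {query}")
```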
Fine-tuning
Training an existing AI model on specific data to specialize it for particular tasks. Like teaching a general doctor to become a specialist.
The process of continuing training a pre-trained model on task-specific or domain-specific data. Full fine-tuning updates all parameters; parameter-efficient methods (LoRA, QLoRA, adapters) update only small additional parameters, reducing compute requirements. Fine-tuning requires careful learning rate selection, typically 10-100x smaller than pretraining. Risks include catastrophic forgetting of general capabilities and overfitting to small datasets. Alternatives include prompt-tuning and in-context learning for smaller adaptations.
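A LoRA-style sketch with toy dimensions, showing how a low-rank update shrinks the trainable parameter count; initialization and scaling follow the LoRA paper, but the numbers here are illustrative:

```python
import numpy as np

# LoRA sketch: instead of updating the full weight matrix W (d x d), train a
# low-rank update B @ A with rank r << d. Dimensions are toy values.
d, r, alpha = 1024, 8, 16
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))          # frozen pretrained weight
A = rng.normal(size=(r, d)) * 0.01   # trainable, r x d
B = np.zeros((d, r))                 # trainable, d x r (zero init: no-op at start)

def lora_forward(x):
    return W @ x + (alpha / r) * (B @ (A @ x))

# Trainable parameters shrink from d*d to 2*d*r:
print(d * d, 2 * d * r)  # 1048576 vs 16384
```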
How It Works
Attention Mechanism
The breakthrough that powers Transformers. Instead of reading left-to-right, attention lets AI consider all words simultaneously and learn which relationships matter. In "The animal didn't cross the street because it was tired," attention links 'it' to 'animal' rather than 'street' based on context.
A mechanism computing weighted combinations of value vectors based on query-key compatibility. Self-attention: Q, K, V are linear projections of the same input; cross-attention uses different sources. Attention(Q,K,V) = softmax(QK^T/√d_k)V. Multi-head attention runs parallel attention operations with different learned projections, then concatenates results. Computational complexity is O(n²) in sequence length, motivating efficient variants: sparse attention, linear attention, flash attention (memory-efficient exact attention), and sliding window approaches.
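The formula above translates almost line-for-line into code; this sketch omits the learned projections and multiple heads of a full layer:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.
# Q, K, V have shape (sequence_length, d_k).
def attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # pairwise query-key compatibility
    return softmax(scores, axis=-1) @ V  # weighted combination of values

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(5, 8))     # self-attention: same source for Q, K, V
print(attention(Q, K, V).shape)          # (5, 8)
```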
Next-Token Prediction
How LLMs actually work. They calculate which word (token) is statistically most likely to come next, based on all previous text. They don't 'know' facts — they predict probability.
The autoregressive language modeling objective where models predict P(x_t | x_1, ..., x_{t-1})—the probability distribution over the next token given all previous tokens. Training minimizes cross-entropy loss between predicted and actual next tokens across massive corpora. At inference, tokens are sampled from this distribution using strategies like greedy decoding, beam search, nucleus sampling (top-p), or temperature-scaled sampling. This simple objective, at scale, produces emergent capabilities including apparent reasoning, knowledge recall, and instruction-following.
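A decoding-loop sketch, where `next_token_logits` is a hypothetical stand-in for the model's forward pass over the tokens generated so far:

```python
import numpy as np

# Autoregressive decoding sketch. next_token_logits() is a hypothetical
# placeholder for a model's forward pass, not a real API.
def next_token_logits(tokens: list[int]) -> np.ndarray: ...

def generate(prompt_tokens: list[int], max_new: int, eos: int) -> list[int]:
    tokens = list(prompt_tokens)
    rng = np.random.default_rng()
    for _ in range(max_new):
        logits = next_token_logits(tokens)
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                        # softmax over the vocabulary
        nxt = int(rng.choice(len(probs), p=probs))  # sample the next token
        tokens.append(nxt)
        if nxt == eos:                              # stop at end-of-sequence
            break
    return tokens
```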
Training
The process of teaching an AI model by showing it massive amounts of data. The model adjusts its internal parameters to get better at predictions. Training frontier models costs $100M+ and takes months.
The optimization process of adjusting model parameters to minimize a loss function on training data. For LLMs: (1) Pre-training on web-scale text using next-token prediction, requiring thousands of GPUs for weeks/months. (2) Supervised fine-tuning on curated instruction-response pairs. (3) Alignment via RLHF (training reward models on human preferences, then PPO optimization) or alternatives like DPO, RLAIF. Training infrastructure involves distributed computing (data/tensor/pipeline parallelism), mixed-precision training, gradient checkpointing, and careful hyperparameter scheduling.
Inference
Using a trained AI model to generate responses. When you chat with ChatGPT, that's inference. Much cheaper and faster than training, but still requires significant compute.
The process of running a trained model to generate outputs from inputs. For LLMs, inference is autoregressive—generating one token at a time, each conditioned on all previous tokens. Optimization techniques include KV-cache (storing computed key-value pairs), speculative decoding (draft model proposes, main model verifies), quantization (INT8, INT4), continuous batching, and tensor parallelism for large models. Inference cost scales with output length and model size; serving infrastructure must balance latency, throughput, and cost.
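A KV-cache sketch (learned projections omitted) showing why each decoding step only needs one attention computation for the newest token against cached keys and values:

```python
import numpy as np

# KV-cache sketch: keys and values for past tokens never change during
# decoding, so they are computed once and appended; each step attends the
# new query against everything cached so far.
class KVCache:
    def __init__(self):
        self.K, self.V = [], []

    def step(self, q_new, k_new, v_new):
        self.K.append(k_new)
        self.V.append(v_new)
        K, V = np.stack(self.K), np.stack(self.V)
        w = K @ q_new / np.sqrt(len(q_new))   # scores vs all cached keys
        w = np.exp(w - w.max()); w /= w.sum()  # softmax
        return w @ V                           # attention output for the new token
```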
Temperature
A setting that controls how predictable or creative AI responses are. Low temperature (0.0-0.3) produces consistent, 'safe' outputs ideal for code and facts. High temperature (0.8-1.0) produces more creative, varied, sometimes surprising outputs.
A sampling hyperparameter that scales logits before softmax: P(x_i) = exp(z_i/T) / Σexp(z_j/T). Temperature T=1.0 preserves the learned distribution. T<1 sharpens the distribution (higher probability tokens become more likely, approaching greedy/argmax at T→0). T>1 flattens the distribution (lower probability tokens become relatively more likely, increasing randomness). Often combined with top-p (nucleus) or top-k sampling to truncate the distribution tail while maintaining controlled randomness.
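The scaling formula in code, applied to made-up logits, makes the sharpening and flattening visible:

```python
import numpy as np

def temperature_softmax(logits, T):
    z = np.asarray(logits) / T            # scale logits by 1/T
    e = np.exp(z - z.max())               # subtract max for stability
    return e / e.sum()

logits = [2.0, 1.0, 0.1]
for T in (0.3, 1.0, 2.0):
    print(T, temperature_softmax(logits, T).round(3))
# Low T concentrates mass on the top token; high T flattens the distribution.
```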
Vector Embeddings
How AI understands meaning. Words and sentences are converted into lists of numbers (vectors) where similar meanings land close together. 'Hungry' and 'restaurant' share zero letters but have nearby vectors because they appear in similar contexts.
Dense, continuous vector representations of discrete tokens learned during neural network training. Embedding layers map tokens to points in high-dimensional space (typically 768-4096 dimensions) where geometric relationships encode semantic similarity. Training objectives (word2vec, BERT's MLM, contrastive learning) position semantically related items nearby via cosine similarity or Euclidean distance. Applications include semantic search, clustering, recommendation systems, and RAG retrieval. Models like OpenAI's text-embedding-ada-002 and Cohere's embed are purpose-built for generating these representations.
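A toy illustration with invented 4-dimensional vectors (real embeddings have hundreds to thousands of dimensions) shows how cosine similarity captures relatedness:

```python
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Invented toy "embeddings"; the values are made up to illustrate geometry.
hungry     = np.array([0.9, 0.1, 0.8, 0.0])
restaurant = np.array([0.8, 0.2, 0.7, 0.1])
asteroid   = np.array([0.0, 0.9, 0.1, 0.8])

print(round(cosine(hungry, restaurant), 2))  # high: related meanings sit close
print(round(cosine(hungry, asteroid), 2))    # low: unrelated meanings sit far apart
```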
Infrastructure
GPU (Graphics Processing Unit)
The specialized chips that power AI. Unlike CPUs (general-purpose), GPUs can do thousands of math operations simultaneously — perfect for neural networks. NVIDIA's H100 is the gold standard.
Massively parallel processors originally designed for graphics rendering, now essential for deep learning due to their ability to perform thousands of floating-point operations simultaneously. Key specs: TFLOPS (compute), HBM capacity and bandwidth (memory), NVLink/InfiniBand (interconnect). NVIDIA dominates with CUDA ecosystem; alternatives include AMD (ROCm), Google TPUs (specialized for matrix ops), and emerging AI accelerators. Training large models requires clusters of thousands of GPUs with high-bandwidth interconnects.
Data Center
The physical buildings housing AI infrastructure. The 'cloud' is actually massive facilities filled with hot, noisy hardware. AI racks use 40kW+ of power (vs 8kW standard) and require extensive cooling.
Facilities housing computing infrastructure with power delivery, cooling, networking, and physical security. AI workloads have transformed requirements: GPU clusters demand 40-100kW per rack (vs traditional 8-15kW), requiring liquid cooling solutions. Training runs need high-bisection-bandwidth networks (InfiniBand, RoCE) to minimize communication bottlenecks in distributed training. Major operators (hyperscalers, colo providers) are racing to build AI-optimized facilities with renewable energy, advanced cooling (rear-door, immersion), and on-site power generation.
Features & Modes
Deep Think / Thinking Mode
An enhanced reasoning mode where the AI takes more time to 'think' before responding. Produces more accurate answers for complex problems.
Product features that allocate additional compute at inference time for improved reasoning. Implementations vary: some use explicit chain-of-thought generation (visible or hidden 'thinking' tokens), others employ search/sampling strategies (generating multiple solutions and selecting best), and some use separate reasoning models or extended generation budgets. These approaches trade latency and cost for quality, particularly on math, coding, and complex analysis tasks. Related research: 'Let's Verify Step by Step', process reward models, and test-time compute scaling.
Artifacts / Projects
Claude's feature for generating interactive content (code, diagrams, documents) that appears in a separate panel. It has since expanded into 'Projects' for larger bodies of work.
Interface features enabling AI assistants to create and render standalone content outside the chat stream. Artifacts can include executable code (React components rendered live), visualizations (SVG, charts), documents (Markdown), and interactive applications. Projects extend this with persistent workspaces, file management, and multi-artifact organization. These features require sandboxed execution environments, content security policies, and interfaces for iterative refinement. Similar concepts appear across AI products as 'canvas', 'workspaces', or 'documents'.
Search Mode
When an AI browses the live internet to find current information before responding. Essential for news, stock prices, and recent events.
Integration of real-time web search capabilities with language models. The system formulates search queries (often multiple), retrieves results from search APIs, and synthesizes information into responses with citations. Challenges include query formulation, source evaluation, handling conflicting information, respecting robots.txt/ToS, and avoiding outdated cached content. Implementations range from simple search-then-summarize to sophisticated multi-step research agents that iteratively refine queries based on findings.
Voice Mode
Conversational AI you can speak to naturally. ChatGPT's Advanced Voice Mode allows real-time back-and-forth conversation.
End-to-end speech interaction systems combining automatic speech recognition (ASR), language model processing, and text-to-speech (TTS) synthesis. Advanced implementations use native speech-to-speech models (like GPT-4o) that process audio directly without intermediate text, enabling natural prosody, interruption handling, and emotional expression. Technical challenges include streaming latency, voice activity detection, speaker diarization, handling ambient noise, and maintaining conversation state. Evaluation considers word error rate, latency, naturalness (MOS scores), and conversation flow.
Model Types
Frontier Model
The most advanced AI models available. Currently includes GPT-5.1, Claude Opus 4.5, Gemini 3 Pro, and Grok 4.1.
State-of-the-art models representing the current capability frontier, typically distinguished by scale (hundreds of billions to trillions of parameters), training compute (10²⁴+ FLOPs), and benchmark performance. Frontier models are subject to responsible scaling policies including capability evaluations, red-teaming, and staged deployment. Development is concentrated among well-resourced labs due to compute requirements ($100M+ training runs). The frontier advances through scaling, architectural innovations, data quality improvements, and post-training techniques.
Open Source / Open Weights
AI models where the underlying code or weights are publicly available. Allows developers to run them locally or customize them. Examples: Llama, Mistral.
Models distributed with publicly available weights, enabling local deployment, fine-tuning, and inspection. 'Open weights' (weights released, training code/data may not be) differs from 'fully open source' (complete reproducibility). Licenses vary: some permit commercial use (Apache 2.0, MIT), others restrict it (Llama's license). Benefits include transparency, customization, privacy (local inference), and reduced vendor lock-in. Tradeoffs include responsibility for safety measures and compute costs for deployment. The open ecosystem has narrowed the gap with proprietary models.
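A minimal local-inference sketch using Hugging Face transformers; the model ID is one example of an open-weights checkpoint, and running it requires substantial disk space and memory:

```python
# Running an open-weights model locally with Hugging Face transformers.
# The model ID below is an example; any text-generation checkpoint you
# have access to (and hardware for) works the same way.
from transformers import pipeline

pipe = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")
out = pipe("Explain tokenization in one sentence.", max_new_tokens=60)
print(out[0]["generated_text"])
```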
Closed Source / Proprietary
AI models only accessible through the company's API or interface. Most frontier models (GPT, Claude, Gemini) are closed source.
Models accessible only via API/product interfaces with proprietary weights, architecture, and training details. Providers manage infrastructure, safety measures, and updates. Business models include usage-based API pricing, subscriptions, and enterprise contracts. Advantages: lower barrier to use, managed safety/moderation, consistent updates. Disadvantages: vendor dependency, limited customization, data privacy concerns (inputs may be logged), rate limits, potential service changes. Leading closed models often benchmark higher but the gap with open alternatives continues to narrow.
Limitations & Safety
Hallucination
When an AI confidently states something that's factually incorrect or made up. Happens because the model predicts probable words, not true words. Always verify important facts.
The generation of content that is fluent and confident but factually incorrect, nonsensical, or unfaithful to provided context. Causes include training data gaps/errors, the fundamental disconnect between statistical patterns and truth, and the model's inability to distinguish knowledge boundaries. Types: intrinsic (contradicting source material) vs. extrinsic (fabricating facts). Mitigation strategies include retrieval augmentation (RAG), citation requirements, confidence calibration, and training interventions. Hallucination detection remains an active research area; current models cannot reliably identify their own hallucinations.
Bias (in AI)
AI inherits biases present in its training data — gender, racial, cultural. If the internet over-represents certain viewpoints, the AI will too. An active area of research and mitigation.
Systematic patterns in model outputs that reflect or amplify societal biases present in training data. Types include representational bias (stereotyped associations), allocational bias (differential performance across groups), and linguistic bias (performance variations across languages/dialects). Sources: biased training data, annotation artifacts, objective function misspecification. Measurement uses benchmarks like WinoBias, BBQ, and demographic parity metrics. Mitigation approaches include data balancing, debiasing embeddings, adversarial training, RLHF with diverse feedback, and Constitutional AI principles. Complete debiasing remains an unsolved challenge.
Benchmarks & Metrics
Elo Rating
A scoring system (borrowed from chess) used to rank AI models. Higher is better. LMArena's Text Arena uses Elo to compare chatbots based on human preferences.
A relative rating system adapted from chess where scores update based on pairwise comparison outcomes. In AI evaluation (e.g., Chatbot Arena), users vote between two model responses; ratings update based on outcomes relative to expected results given current ratings. Properties: only relative differences are meaningful, ratings converge with more comparisons, new entrants have high uncertainty. Variants like Bradley-Terry models provide statistical foundations. Advantages: captures holistic human preferences. Limitations: sensitive to comparison distribution, prompt selection, and voter demographics.
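The standard Elo update in code; K is a tuning constant (32 is a common chess default, and arena leaderboards choose their own):

```python
# Standard Elo update after one pairwise comparison.
def elo_update(r_a, r_b, score_a, k=32):
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))   # win probability for A
    r_a_new = r_a + k * (score_a - expected_a)          # score_a: 1 win, 0.5 tie, 0 loss
    r_b_new = r_b + k * ((1 - score_a) - (1 - expected_a))
    return r_a_new, r_b_new

print(elo_update(1200, 1000, score_a=1))  # favorite wins: small rating gain
print(elo_update(1200, 1000, score_a=0))  # upset: large rating swing
```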
AIME / GPQA / SWE-Bench
Common AI benchmarks. AIME tests math competition problems, GPQA tests PhD-level science, SWE-Bench tests real-world coding ability. Used to compare model capabilities.
Standardized evaluation datasets measuring specific capabilities. AIME: American Invitational Mathematics Examination problems testing mathematical reasoning. GPQA: Graduate-level science questions requiring expert knowledge. SWE-Bench: Real GitHub issues requiring codebase understanding and multi-file changes. Other key benchmarks: MMLU (broad knowledge), HumanEval/MBPP (code generation), HellaSwag (commonsense), TruthfulQA (factuality). Benchmark limitations include training data contamination, overfitting through targeted optimization, and gaps between benchmark performance and real-world utility.
Missing a term? This glossary is regularly updated as the AI landscape evolves.