Glossary
Key terms and definitions used across AI Foundations, Claude Training, and AI Evals. 63 terms organized by category.
AI Foundations
Artificial Intelligence (AI)
The broad field of computer science focused on building systems that perform tasks requiring human intelligence. Includes everything from spam filters to self-driving cars to language models. AI has existed since the 1950s, but modern AI is almost entirely built on machine learning.
Machine Learning (ML)
A subset of AI where systems learn patterns from data rather than following explicit rules. Instead of hand-coding logic, you provide labeled examples and the system learns to generalize. Includes techniques like decision trees, SVMs, and neural networks.
Neural Network
A type of ML model inspired by the human brain, consisting of layers of interconnected nodes (neurons). Each connection has a weight adjusted during training. Simple neural networks have 3 layers; deep neural networks can have hundreds.
Deep Learning
Neural networks with many layers (hence 'deep'). The depth allows the model to learn increasingly abstract representations. Image recognition, speech recognition, and language understanding all became practical through deep learning.
Generative AI (GenAI)
The category of deep learning models that create new content: text, images, code, audio, or video. Unlike traditional ML that classifies or predicts, generative models produce original outputs. LLMs are generative AI systems specialized in language.
Supervised Learning
Training a model on labeled examples where both the input and the correct output are provided. Used in LLM fine-tuning, where human-curated question-answer pairs teach the model to follow instructions.
RLHF (Reinforcement Learning from Human Feedback)
A training technique where human raters rank model outputs by quality, and the model optimizes to produce higher-ranked responses. Used by Anthropic (Constitutional AI variant) and OpenAI to make models helpful, harmless, and honest.
LLM (Large Language Model)
A neural network trained on massive text datasets that generates text by predicting the next token. Examples include Claude, GPT, Gemini, and Llama. LLMs are stateless by default and do not retain memory between conversations.
Transformer
The neural network architecture behind all modern LLMs. Uses self-attention mechanisms to process tokens in parallel rather than sequentially, enabling much faster training and better long-range understanding than earlier RNN or LSTM architectures.
Token
The smallest unit of text an LLM processes. A token is roughly 3-4 characters or about 0.75 words. Input tokens (your prompt) and output tokens (the response) are priced separately, with output tokens typically costing 3-5x more.
Context Window
The maximum number of tokens an LLM can process in a single conversation. Claude Opus 4.7 supports a 1M token context window. Larger windows allow more information but increase cost and latency.
RAG (Retrieval Augmented Generation)
A pattern that retrieves relevant documents from a knowledge base before generating a response. Combines search (BM25 or semantic) with LLM generation to produce factual, cited answers grounded in your data rather than the model's training data alone.
Embedding
A numerical vector representation of text that captures semantic meaning. Similar texts produce similar vectors. Used in RAG systems to find relevant documents by comparing query embeddings against document embeddings in a vector database.
Vector Database
A database optimized for storing and searching embedding vectors. Supports similarity search (finding the most relevant documents for a query). Examples include Pinecone, Weaviate, Chroma, and Supabase pg_vector.
Agentic AI
AI systems that can plan, reason, use tools, and take multi-step actions autonomously. Unlike basic LLMs that just generate text, agents can read files, run code, call APIs, and make decisions. Claude Code is an example of an agentic AI system.
Fine-tuning
Training a pre-existing model on a smaller, specialized dataset to improve performance on specific tasks. An alternative to RAG for domain-specific knowledge. More expensive and less flexible than RAG but can improve response quality for narrow domains.
Prompt Engineering
The practice of crafting inputs to LLMs to get better outputs. Includes techniques like contract prompts, XML structuring, few-shot examples, chain-of-thought reasoning, and system prompts. The difference between a 60% and 95% success rate.
Hallucination
When an LLM generates information that is factually incorrect but presented with confidence. RAG systems reduce hallucinations by grounding responses in retrieved documents, but they don't eliminate them entirely.
Temperature
A parameter (0-1) controlling randomness in LLM outputs. Lower temperature (0.0-0.3) produces more deterministic, focused responses. Higher temperature (0.7-1.0) produces more creative, varied responses. Use low temperature for factual tasks, higher for creative work.
Claude Products
Claude
Anthropic's family of AI models. Includes Opus (most capable), Sonnet (balanced), and Haiku (fastest/cheapest). Available through claude.ai, the API, Claude Code, and cloud platforms like Vertex AI and Bedrock.
Claude Code
Anthropic's agentic coding CLI. Reads your codebase, plans changes, writes code, runs tests, and commits. Supports CLAUDE.md for project context, subagents for parallel work, hooks for automation, and MCP servers for external integrations.
Claude Apps (claude.ai)
The web and desktop interface for Claude. Includes Projects (persistent workspaces with knowledge files), Artifacts (interactive outputs), Memory (persistent preferences), web search, file uploads, and Connectors to external tools.
Cowork
Claude's desktop agent for non-coding knowledge work. Can organize files, synthesize research, generate Excel with formulas, and perform multi-step tasks. Available on Windows and Mac with Pro, Max, Team, and Enterprise plans.
Claude Design
An Anthropic Labs product for creating visual outputs like designs, prototypes, slides, and one-pagers using Claude.
Extended Thinking
A Claude API feature that allocates internal reasoning tokens before generating a response. Controlled via the budget_tokens parameter. Improves quality on complex reasoning, math, code analysis, and planning tasks but increases latency and cost.
Agent View
A feature in Claude Code that provides real-time visibility into agent activity. Shows tool calls, decision-making processes, and multi-step workflows as they happen. Essential for debugging and monitoring production agents.
MCP
MCP (Model Context Protocol)
An open protocol that lets AI models connect to external tools and data sources through a standardized interface. Adopted by Anthropic, OpenAI, Google, and Microsoft. Has 97M monthly SDK downloads and 17,000+ servers across registries.
MCP Server
A program that exposes tools, resources, and prompts to AI models via the MCP protocol. Can be local (stdio transport) or remote (StreamableHTTP). Examples include GitHub, Sentry, Playwright, and custom business integrations.
MCP Client
The AI application that connects to MCP servers and uses their tools. Claude Code, Claude Desktop, and custom applications can all act as MCP clients. The client discovers available tools and calls them during conversations.
MCP Transport
The communication layer between MCP clients and servers. Two main types: stdio (local, launched as a subprocess) and StreamableHTTP (remote, accessed over HTTP). Stdio is simpler; HTTP supports remote deployment and authentication.
A2A (Agent-to-Agent Protocol)
Google's protocol for agent-to-agent communication. Complements MCP (which connects agents to tools) by enabling agents to discover and collaborate with each other. Supported by 150+ organizations and integrated into AWS, Azure, and Google Cloud.
FastMCP
A Python framework for building MCP servers quickly using decorators. Simplifies tool definition, resource exposure, and server configuration. The recommended approach for most MCP server development.
Claude Code
CLAUDE.md
A markdown file in your project root that provides persistent context to Claude Code. Contains project conventions, build commands, coding standards, and team preferences. Loaded automatically at the start of every conversation.
Hooks
Deterministic shell commands that execute at specific points in Claude Code's lifecycle (pre-tool-use, post-tool-use, notification). Used for auto-formatting, blocking dangerous operations, compliance logging, and credential scanning. Unlike skills, hooks always run and cannot be overridden.
Subagents
Isolated Claude Code instances that run in separate context windows. Used to delegate tasks (research, testing, implementation) without filling up the main conversation. Can be spawned with /agents or via the Agent tool.
Skills
Reusable instruction sets (SKILL.md files) that teach Claude specialized workflows. Triggered on demand when relevant. Examples: TDD workflow, code review checklist, deployment playbook. Can be shared via repositories, plugins, or managed settings.
Plan Mode
A Claude Code mode where the AI explores and plans before writing code. Activated with /plan or the --plan flag. Useful for complex tasks where you want to review the approach before implementation begins.
Managed Settings
Organization-wide configuration files pushed to Claude Code installations via MDM (Jamf, Intune, GPO). Enforce security policies, restrict models, allowlist MCP servers, and mandate hooks across all users. The enterprise equivalent of CLAUDE.md.
Routines
Cloud automation features in Claude Code that run scheduled or triggered tasks without requiring a local machine. Can be configured to execute recurring workflows like daily reports or automated code reviews.
Dreaming
A Claude Code feature where agents review past sessions to find patterns and self-improve. Agents analyze what worked and what didn't, then adjust their approach for future tasks.
Outcomes
A Claude Code feature that defines success rubrics for agent tasks. A separate grading agent scores the output and can kick tasks back for retry if they don't meet the defined criteria.
Enterprise
SSO (Single Sign-On)
Authentication that lets users sign in to Claude using their existing corporate identity (Azure AD, Okta, Google Workspace). Supports SAML and OIDC protocols. Required for Enterprise plans and ensures centralized access control.
SCIM (System for Cross-domain Identity Management)
A protocol for automated user provisioning and deprovisioning. When someone joins or leaves your organization in your IdP, SCIM automatically creates or removes their Claude access. Prevents orphaned accounts.
Connectors
Anthropic-managed integrations that bridge Claude to enterprise SaaS tools (Google Drive, Slack, Jira, Confluence, Salesforce). Connectors are MCP servers under the hood, managed by Anthropic so you don't need to host them.
Enterprise Search
A Claude feature that searches across all connected data sources to answer organizational questions. Respects source-level permissions so users only see what they have access to in the original tool.
Audit Logs
Records of all Claude usage in your organization. Include who used Claude, what they asked, which tools were called, and what data was accessed. Can be exported to SIEM systems (Splunk, CloudWatch, Datadog) for monitoring.
Seat Management
Controlling who has access to Claude Chat seats vs. Claude Code seats. Code seats are more expensive (~$30/user/month). Proper seat management prevents waste from over-provisioning users who only need the web interface.
AI Evals
AI Evaluation (Eval)
Systematic testing of LLM application quality. Goes beyond accuracy to measure reliability, safety, and business alignment. Includes failure taxonomy, automated grading, CI gates, and continuous production monitoring.
Failure Taxonomy
A categorized list of ways your AI application can fail. Created by reviewing 40-100 production traces and labeling each failure type. The foundation of any eval system: you cannot measure what you have not named.
LLM-as-Judge
Using one LLM to evaluate the output of another. A faster and cheaper alternative to human evaluation for subjective quality judgments. Requires careful rubric design to produce consistent, reliable scores.
CI Gate
An automated check in your CI/CD pipeline that blocks deployment if AI quality metrics fall below a threshold. Prevents regressions from reaching production. Typically checks eval scores, latency, and cost before allowing a merge.
Confusion Matrix
A table showing the relationship between predicted and actual results across evaluation categories. Reveals patterns like which failure types are most common and where the model is improving or degrading.
API & Cost
Prompt Caching
An API feature that caches repeated prompt prefixes for reuse. Provides a 90% cost discount on cached reads with a 25% write premium. Cache has a 5-minute TTL. Ordering of content in the prompt matters since caching is prefix-based.
Batch API
An API mode for processing large volumes of requests asynchronously. Provides a 50% cost discount with a 24-hour completion guarantee. Ideal for nightly reports, bulk classification, and document extraction workflows.
Model Routing
Sending different requests to different models based on complexity. Simple classification tasks go to Haiku (cheapest), standard generation to Sonnet (balanced), complex reasoning to Opus (most capable). The highest-leverage cost optimization.
Compliance
EU AI Act
European Union regulation on artificial intelligence. High-risk AI system requirements become enforceable August 2, 2026. Penalties reach EUR 35 million or 7% of global turnover. Classifies AI into four risk tiers: Unacceptable, High-risk, Limited-risk, and Minimal.
SOC 2 Type II
A security certification that verifies an organization's controls over data security, availability, processing integrity, confidentiality, and privacy. Anthropic holds SOC 2 Type II certification. De facto requirement for B2B enterprise sales.
ISO 42001
An international standard for AI management systems. Helps organizations meet EU AI Act requirements by providing a framework for responsible AI governance, risk management, and compliance documentation.
Data Classification
Categorizing data by sensitivity level to determine how it can be used with AI systems. Typical tiers: Public (free to use), Internal (use with caution), Confidential (restricted channels only), Restricted (never send to AI).
AUP (Acceptable Use Policy)
An organizational policy defining permitted and prohibited uses of AI tools. Covers what data can be shared, what decisions AI can make, accountability structures, and consequences for violations. Required before enterprise AI rollout.
Cloud Platforms
Vertex AI
Google Cloud's AI platform. Provides access to Claude models through Google Cloud billing, IAM authentication, and data residency. Choose Vertex when your organization standardizes on Google Cloud and needs consolidated billing.
Amazon Bedrock
AWS's managed AI service. Provides access to Claude and other models through AWS billing and IAM. Bedrock AgentCore adds runtime, memory, identity, and gateway services for building autonomous agents that can run up to 8 hours.
Observability
LangFuse
An open-source LLM observability platform (acquired by ClickHouse in 2026). Provides tracing, evaluation, and monitoring for AI applications. Tracks latency, cost, token usage, and quality metrics across production traffic.