Glossary

Key terms and definitions used across AI Foundations, Claude Training, and AI Evals. 63 terms organized by category.

AI Foundations

Artificial Intelligence (AI)

The broad field of computer science focused on building systems that perform tasks normally requiring human intelligence. It covers everything from spam filters to self-driving cars to language models. AI research dates to the 1950s, but modern AI is built almost entirely on machine learning.

Machine Learning (ML)

A subset of AI in which systems learn patterns from data rather than following explicit rules. Instead of hand-coding logic, you provide examples (often labeled) and the system learns to generalize. Techniques include decision trees, support vector machines (SVMs), and neural networks.

Neural Network

A type of ML model loosely inspired by the human brain, consisting of layers of interconnected nodes (neurons). Each connection has a weight that is adjusted during training. A simple network may have just three layers (input, hidden, output); deep neural networks can have hundreds.

Deep Learning

Neural networks with many layers (hence 'deep'). The depth allows the model to learn increasingly abstract representations. Image recognition, speech recognition, and language understanding all became practical through deep learning.

Generative AI (GenAI)

The category of deep learning models that create new content: text, images, code, audio, or video. Unlike traditional ML that classifies or predicts, generative models produce original outputs. LLMs are generative AI systems specialized in language.

Supervised Learning

Training a model on labeled examples where both the input and the correct output are provided. Used in LLM fine-tuning, where human-curated question-answer pairs teach the model to follow instructions.

RLHF (Reinforcement Learning from Human Feedback)

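A training technique in which human raters rank model outputs by quality, a reward model is trained on those rankings, and the LLM is then optimized to produce higher-ranked responses. Used by OpenAI and Anthropic to make models helpful, harmless, and honest. Anthropic's Constitutional AI is a related variant that also uses AI-generated feedback guided by written principles.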

LLM (Large Language Model)

A neural network trained on massive text datasets that generates text by predicting the next token. Examples include Claude, GPT, Gemini, and Llama. LLMs are stateless by default and do not retain memory between conversations.

Transformer

The neural network architecture behind nearly all modern LLMs. Uses self-attention mechanisms to process tokens in parallel rather than sequentially, enabling much faster training and better long-range understanding than earlier RNN or LSTM architectures.

Token

The smallest unit of text an LLM processes. A token is roughly 3-4 characters or about 0.75 words. Input tokens (your prompt) and output tokens (the response) are priced separately, with output tokens typically costing 3-5x more.
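
The rules of thumb above can be turned into a rough budgeting helper. This is a minimal sketch: the 4-characters-per-token ratio is only a heuristic for English text (a real tokenizer will differ), and the prices in the usage note are hypothetical placeholders, not actual rates.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (heuristic, not a real tokenizer)."""
    return round(len(text) / chars_per_token)


def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Cost of one request when input and output tokens are priced separately.

    Prices are expressed per million tokens (MTok).
    """
    return (
        input_tokens * input_price_per_mtok
        + output_tokens * output_price_per_mtok
    ) / 1_000_000
```

For example, with hypothetical prices of 3.0 per MTok for input and 15.0 per MTok for output (a 5x ratio), `estimate_cost(2_000, 500, 3.0, 15.0)` prices a 2,000-token prompt with a 500-token response.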

Context Window

The maximum number of tokens an LLM can process at once, covering the conversation history, the current prompt, and the response. Claude Opus 4.7 supports a 1M token context window. Larger windows allow more information but increase cost and latency.

RAG (Retrieval Augmented Generation)

A pattern that retrieves relevant documents from a knowledge base before generating a response. Combines search (BM25 or semantic) with LLM generation to produce factual, cited answers grounded in your data rather than the model's training data alone.
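
The retrieve-then-generate flow can be sketched in a few lines. This is an illustration only: plain word overlap stands in for BM25 or semantic search, the function names are hypothetical, and the final call to an LLM is left out, so the sketch stops at the assembled prompt.

```python
def overlap_score(query: str, doc: str) -> int:
    """Count words shared by query and document (a crude stand-in for BM25)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents with the highest overlap score."""
    return sorted(docs, key=lambda d: overlap_score(query, d), reverse=True)[:k]


def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Assemble a prompt that grounds the answer in the retrieved sources."""
    sources = "\n".join(
        f"[{i + 1}] {doc}" for i, doc in enumerate(retrieve(query, docs, k))
    )
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"{sources}\n\nQuestion: {query}"
    )
```

The prompt returned by `build_prompt` is what would be sent to the model. Numbering the sources is one simple way to make cited answers possible.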

Embedding

A numerical vector representation of text that captures semantic meaning. Similar texts produce similar vectors. Used in RAG systems to find relevant documents by comparing query embeddings against document embeddings in a vector database.
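
Comparing embeddings usually means computing cosine similarity between vectors. The sketch below uses tiny hand-written three-dimensional vectors as stand-ins; real embeddings come from an embedding model and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query: list[float], docs: dict[str, list[float]]) -> str:
    """Return the name of the document whose vector is closest to the query."""
    return max(docs, key=lambda name: cosine_similarity(query, docs[name]))


# Toy vectors, made up for illustration only.
DOC_VECTORS = {
    "refund policy": [0.9, 0.1, 0.0],
    "api rate limits": [0.1, 0.9, 0.2],
}
QUERY_VECTOR = [0.8, 0.2, 0.1]
```

Here `most_similar(QUERY_VECTOR, DOC_VECTORS)` selects "refund policy", because the query vector points in nearly the same direction as that document's vector. A vector database performs the same comparison at scale using approximate nearest-neighbor indexes.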

Vector Database

A database optimized for storing and searching embedding vectors. Supports similarity search (finding the most relevant documents for a query). Examples include Pinecone, Weaviate, Chroma, and Supabase (via the pgvector extension for Postgres).

Agentic AI

AI systems that can plan, reason, use tools, and take multi-step actions autonomously. Unlike basic LLMs that just generate text, agents can read files, run code, call APIs, and make decisions. Claude Code is an example of an agentic AI system.

Fine-tuning

Training a pre-existing model on a smaller, specialized dataset to improve performance on specific tasks. An alternative to RAG for domain-specific knowledge. More expensive and less flexible than RAG but can improve response quality for narrow domains.

Prompt Engineering

The practice of crafting inputs to LLMs to get better outputs. Includes techniques like contract prompts, XML structuring, few-shot examples, chain-of-thought reasoning, and system prompts. Careful prompting can be the difference between a 60% and a 95% success rate on the same task.

Hallucination

When an LLM generates information that is factually incorrect but presented with confidence. RAG systems reduce hallucinations by grounding responses in retrieved documents, but they don't eliminate them entirely.

Temperature

A parameter controlling randomness in LLM outputs (0-1 in the Claude API; some other providers allow a wider range). Lower temperature (0.0-0.3) produces more deterministic, focused responses. Higher temperature (0.7-1.0) produces more creative, varied responses. Use low temperature for factual tasks, higher for creative work.
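
Mechanically, temperature is commonly applied by dividing the model's raw scores (logits) by the temperature before converting them to probabilities. The sketch below shows that standard softmax-with-temperature formulation on made-up logits; it is an illustration of the general technique, not a description of any specific provider's implementation.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities, sharpening or flattening by temperature.

    A temperature of exactly 0 is usually handled separately as greedy
    (argmax) decoding, so it is rejected here.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

With logits `[2.0, 1.0, 0.0]`, a temperature of 0.2 puts almost all probability on the first token, while a temperature of 1.0 leaves meaningful probability on the alternatives, which is why higher settings produce more varied output.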

Claude Products

Claude

Anthropic's family of AI models. Includes Opus (most capable), Sonnet (balanced), and Haiku (fastest/cheapest). Available through claude.ai, the API, Claude Code, and cloud platforms like Vertex AI and Bedrock.

Claude Code

Anthropic's agentic coding CLI. Reads your codebase, plans changes, writes code, runs tests, and commits. Supports CLAUDE.md for project context, subagents for parallel work, hooks for automation, and MCP servers for external integrations.

Claude Apps (claude.ai)

The web and desktop interface for Claude. Includes Projects (persistent workspaces with knowledge files), Artifacts (interactive outputs), Memory (persistent preferences), web search, file uploads, and Connectors to external tools.

Cowork

Claude's desktop agent for non-coding knowledge work. Can organize files, synthesize research, generate Excel with formulas, and perform multi-step tasks. Available on Windows and Mac with Pro, Max, Team, and Enterprise plans.

Claude Design

An Anthropic Labs product for creating visual outputs like designs, prototypes, slides, and one-pagers using Claude.

Extended Thinking

A Claude API feature that allocates internal reasoning tokens before generating a response. Controlled via the budget_tokens parameter. Improves quality on complex reasoning, math, code analysis, and planning tasks but increases latency and cost.

Agent View

A feature in Claude Code that provides real-time visibility into agent activity. Shows tool calls, decision-making processes, and multi-step workflows as they happen. Essential for debugging and monitoring production agents.

MCP

MCP (Model Context Protocol)

An open protocol that lets AI models connect to external tools and data sources through a standardized interface. Adopted by Anthropic, OpenAI, Google, and Microsoft. Has 97M monthly SDK downloads and 17,000+ servers across registries.

MCP Server

A program that exposes tools, resources, and prompts to AI models via MCP. Can be local (stdio transport) or remote (Streamable HTTP). Examples include GitHub, Sentry, Playwright, and custom business integrations.

MCP Client

The AI application that connects to MCP servers and uses their tools. Claude Code, Claude Desktop, and custom applications can all act as MCP clients. The client discovers available tools and calls them during conversations.

MCP Transport

The communication layer between MCP clients and servers. Two main types: stdio (local, with the server launched as a subprocess) and Streamable HTTP (remote, accessed over HTTP). Stdio is simpler; HTTP supports remote deployment and authentication.

A2A (Agent-to-Agent Protocol)

A protocol for agent-to-agent communication, originally developed by Google. Complements MCP (which connects agents to tools) by enabling agents to discover and collaborate with each other. Supported by 150+ organizations and integrated into AWS, Azure, and Google Cloud.

FastMCP

A Python framework for building MCP servers quickly using decorators. Simplifies tool definition, resource exposure, and server configuration. The recommended approach for most MCP server development.

Claude Code

CLAUDE.md

A markdown file in your project root that provides persistent context to Claude Code. Contains project conventions, build commands, coding standards, and team preferences. Loaded automatically at the start of every conversation.

Hooks

Deterministic shell commands that execute at specific points in Claude Code's lifecycle (pre-tool-use, post-tool-use, notification). Used for auto-formatting, blocking dangerous operations, compliance logging, and credential scanning. Unlike skills, hooks always run and cannot be overridden.

Subagents

Isolated Claude Code instances that run in separate context windows. Used to delegate tasks (research, testing, implementation) without filling up the main conversation. Can be spawned with /agents or via the Agent tool.

Skills

Reusable instruction sets (SKILL.md files) that teach Claude specialized workflows. Triggered on demand when relevant. Examples: TDD workflow, code review checklist, deployment playbook. Can be shared via repositories, plugins, or managed settings.

Plan Mode

A Claude Code mode where the AI explores and plans before writing code. Activated with /plan or the --plan flag. Useful for complex tasks where you want to review the approach before implementation begins.

Managed Settings

Organization-wide configuration files pushed to Claude Code installations via MDM (Jamf, Intune, GPO). Enforce security policies, restrict models, allowlist MCP servers, and mandate hooks across all users. The enterprise equivalent of CLAUDE.md.

Routines

Cloud automation features in Claude Code that run scheduled or triggered tasks without requiring a local machine. Can be configured to execute recurring workflows like daily reports or automated code reviews.

Dreaming

A Claude Code feature where agents review past sessions to find patterns and self-improve. Agents analyze what worked and what didn't, then adjust their approach for future tasks.

Outcomes

A Claude Code feature that defines success rubrics for agent tasks. A separate grading agent scores the output and can kick tasks back for retry if they don't meet the defined criteria.

Enterprise

SSO (Single Sign-On)

Authentication that lets users sign in to Claude using their existing corporate identity (Microsoft Entra ID, formerly Azure AD; Okta; Google Workspace). Supports SAML and OIDC protocols. Required for Enterprise plans and ensures centralized access control.

SCIM (System for Cross-domain Identity Management)

A protocol for automated user provisioning and deprovisioning. When someone joins or leaves your organization in your IdP, SCIM automatically creates or removes their Claude access. Prevents orphaned accounts.

Connectors

Anthropic-managed integrations that bridge Claude to enterprise SaaS tools (Google Drive, Slack, Jira, Confluence, Salesforce). Connectors are MCP servers under the hood, managed by Anthropic so you don't need to host them.

Enterprise Search

A Claude feature that searches across all connected data sources to answer organizational questions. Respects source-level permissions so users only see what they have access to in the original tool.

Audit Logs

Records of all Claude usage in your organization. Include who used Claude, what they asked, which tools were called, and what data was accessed. Can be exported to SIEM systems (Splunk, CloudWatch, Datadog) for monitoring.

Seat Management

Controlling who has access to Claude Chat seats vs. Claude Code seats. Code seats are more expensive (~$30/user/month). Proper seat management prevents waste from over-provisioning users who only need the web interface.

AI Evals

AI Evaluation (Eval)

Systematic testing of LLM application quality. Goes beyond accuracy to measure reliability, safety, and business alignment. Includes failure taxonomy, automated grading, CI gates, and continuous production monitoring.

Failure Taxonomy

A categorized list of ways your AI application can fail. Created by reviewing 40-100 production traces and labeling each failure type. The foundation of any eval system: you cannot measure what you have not named.

LLM-as-Judge

Using one LLM to evaluate the output of another. A faster and cheaper alternative to human evaluation for subjective quality judgments. Requires careful rubric design to produce consistent, reliable scores.

CI Gate

An automated check in your CI/CD pipeline that blocks deployment if AI quality metrics fall below a threshold. Prevents regressions from reaching production. Typically checks eval scores, latency, and cost before allowing a merge.
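
A gate of this kind can be as simple as comparing measured metrics against limits and failing the build when any limit is breached. The sketch below assumes hypothetical metric names and threshold values; how the metrics are produced (an eval run, a load test) is outside its scope.

```python
# Hypothetical thresholds: ("min", x) means the metric must be at least x,
# ("max", x) means it must not exceed x.
THRESHOLDS = {
    "eval_score": ("min", 0.90),
    "p95_latency_s": ("max", 3.0),
    "cost_per_request_usd": ("max", 0.01),
}


def ci_gate(metrics: dict[str, float], thresholds: dict = THRESHOLDS) -> list[str]:
    """Return a list of failure messages; an empty list means the gate passes."""
    failures = []
    for name, (kind, limit) in thresholds.items():
        value = metrics[name]
        if kind == "min" and value < limit:
            failures.append(f"{name}={value} is below the minimum {limit}")
        elif kind == "max" and value > limit:
            failures.append(f"{name}={value} exceeds the maximum {limit}")
    return failures
```

In a pipeline, a wrapper script would print the failures and exit with a non-zero status when the list is non-empty, which is what causes the CI system to block the merge.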

Confusion Matrix

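A table that cross-tabulates predicted labels against actual labels for each evaluation category. Cells on the diagonal are correct predictions; off-diagonal cells show which categories are confused with which. Reveals patterns like which failure types are most common and, when compared across runs, where the model is improving or degrading.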
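
A confusion matrix is just a count of (actual, predicted) pairs. The sketch below builds one from two label lists; the label names used in the usage note are illustrative, not a prescribed taxonomy.

```python
from collections import Counter


def confusion_matrix(actual: list[str], predicted: list[str]) -> Counter:
    """Count how often each (actual, predicted) label pair occurs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))


def accuracy(matrix: Counter) -> float:
    """Share of examples on the diagonal, where actual equals predicted."""
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / sum(matrix.values())
```

For instance, if human reviewers labeled five traces as `["ok", "ok", "hallucination", "refusal", "hallucination"]` and an automated grader predicted `["ok", "hallucination", "hallucination", "refusal", "ok"]`, the off-diagonal pairs `("ok", "hallucination")` and `("hallucination", "ok")` each appear once, showing exactly where the grader disagrees with the reviewers.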

API & Cost

Prompt Caching

An API feature that caches repeated prompt prefixes for reuse. Cached reads cost 90% less than the base input price, while writing to the cache costs 25% more. The default cache TTL is 5 minutes. Ordering of content in the prompt matters since caching is prefix-based, so stable content should come first.
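
The arithmetic behind the discount is worth working through once. The sketch below compares the cost of a shared prompt prefix with and without caching, using the 25% write premium and 90% read discount stated above. It assumes every request after the first arrives before the cache expires, and the base price in the usage note is a hypothetical placeholder.

```python
def prefix_cost_without_cache(
    prefix_tokens: int, requests: int, price_per_mtok: float
) -> float:
    """Every request pays the full input price for the shared prefix."""
    return prefix_tokens * requests * price_per_mtok / 1_000_000


def prefix_cost_with_cache(
    prefix_tokens: int,
    requests: int,
    price_per_mtok: float,
    write_premium: float = 0.25,
    read_discount: float = 0.90,
) -> float:
    """First request writes the cache at a premium; later requests read it cheaply.

    Assumes requests >= 1 and that all later requests hit the cache.
    """
    write = prefix_tokens * price_per_mtok * (1 + write_premium)
    reads = prefix_tokens * (requests - 1) * price_per_mtok * (1 - read_discount)
    return (write + reads) / 1_000_000
```

With a 100,000-token prefix, 10 requests, and a hypothetical base price of 3.0 per MTok, the uncached cost is 3.0 and the cached cost is about 0.645. Under these assumptions caching already pays off at the second request, since one write plus one read (1.35x the base price) is cheaper than two uncached requests (2x).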

Batch API

An API mode for processing large volumes of requests asynchronously. Provides a 50% cost discount in exchange for a processing window of up to 24 hours rather than an immediate response. Ideal for nightly reports, bulk classification, and document extraction workflows.

Model Routing

Sending different requests to different models based on complexity. Simple classification tasks go to Haiku (cheapest), standard generation to Sonnet (balanced), complex reasoning to Opus (most capable). The highest-leverage cost optimization.
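
In its simplest form, routing is a lookup from task type to model tier. The sketch below uses the three tiers named above; the task-type labels are hypothetical, and the returned strings are tier names rather than real API model identifiers, which would need to be mapped separately.

```python
# Hypothetical task types mapped to the tiers described in this entry.
TIER_BY_TASK = {
    "classification": "haiku",  # simple, high-volume work
    "generation": "sonnet",     # standard drafting and summarization
    "reasoning": "opus",        # complex multi-step analysis
}


def route(task_type: str, default_tier: str = "sonnet") -> str:
    """Pick a model tier for a request, falling back to the balanced tier."""
    return TIER_BY_TASK.get(task_type, default_tier)
```

Falling back to the balanced tier for unknown task types is a design choice: it avoids both overpaying on every unclassified request and sending hard problems to the cheapest model.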

Compliance

EU AI Act

European Union regulation on artificial intelligence. High-risk AI system requirements become enforceable August 2, 2026. Penalties reach up to EUR 35 million or 7% of global annual turnover, whichever is higher. Classifies AI into four risk tiers: Unacceptable, High-risk, Limited-risk, and Minimal.

SOC 2 Type II

An independent audit report (formally an attestation rather than a certification) verifying that an organization's controls over data security, availability, processing integrity, confidentiality, and privacy operated effectively over a period of time. Anthropic holds a SOC 2 Type II report. De facto requirement for B2B enterprise sales.

ISO 42001

An international standard for AI management systems. Helps organizations meet EU AI Act requirements by providing a framework for responsible AI governance, risk management, and compliance documentation.

Data Classification

Categorizing data by sensitivity level to determine how it can be used with AI systems. Typical tiers: Public (free to use), Internal (use with caution), Confidential (restricted channels only), Restricted (never send to AI).
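
The four tiers can be encoded as a small policy check that runs before any data is sent to an AI tool. This is a sketch of one possible reading of the tiers above: the `channel_approved` flag is a hypothetical stand-in for whatever an organization counts as a restricted channel, and the "use with caution" guidance for Internal data is treated here as allowed.

```python
from enum import Enum


class Sensitivity(Enum):
    PUBLIC = 1
    INTERNAL = 2
    CONFIDENTIAL = 3
    RESTRICTED = 4


def may_send_to_ai(level: Sensitivity, channel_approved: bool = False) -> bool:
    """Decide whether data at this sensitivity level may be sent to an AI system."""
    if level is Sensitivity.RESTRICTED:
        return False  # never send to AI
    if level is Sensitivity.CONFIDENTIAL:
        return channel_approved  # restricted channels only
    return True  # Public and Internal data are allowed
```

A check like this is most useful when enforced automatically, for example in a gateway or pre-submission hook, rather than left to individual judgment.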

AUP (Acceptable Use Policy)

An organizational policy defining permitted and prohibited uses of AI tools. Covers what data can be shared, what decisions AI can make, accountability structures, and consequences for violations. Required before enterprise AI rollout.

Cloud Platforms

Vertex AI

Google Cloud's AI platform. Provides access to Claude models through Google Cloud billing, IAM authentication, and data residency. Choose Vertex when your organization standardizes on Google Cloud and needs consolidated billing.

Amazon Bedrock

AWS's managed AI service. Provides access to Claude and other models through AWS billing and IAM. Bedrock AgentCore adds runtime, memory, identity, and gateway services for building autonomous agents that can run up to 8 hours.

Observability

Langfuse

An open-source LLM observability platform (acquired by ClickHouse in 2026). Provides tracing, evaluation, and monitoring for AI applications. Tracks latency, cost, token usage, and quality metrics across production traffic.
