
AI & Agents Trainer Podcast
10 episodes covering prompting, Claude Code, MCP, enterprise security, RAG, and AI evaluation. Listen while you learn.
Core Foundations - Prompting and productivity essentials
The Prompt Engineering Playbook That Changed How We Talk to AI
~25 min
"Write me a marketing email" is a wish. "Given this product brief, write a 3-paragraph email targeting CTOs, emphasizing ROI, in a professional but warm tone" is a contract. We dissect why most engineers get mediocre results from AI and reveal the prompting patterns that the best practitioners use daily. Nidhi and Alex walk through real before-and-after transformations using contract prompts, XML structuring, and few-shot calibration - the same techniques that turn a 60% success rate into 95%+.
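The contract-prompt-plus-XML pattern described above can be sketched in plain Python. The tag names, instructions, and product brief below are illustrative placeholders, not material from the episode:

```python
# A "contract" prompt spells out deliverable, audience, and tone, and uses
# XML tags to separate instructions from source material so the model
# cannot confuse the two. All field values here are illustrative.

def contract_prompt(brief: str) -> str:
    return (
        "<instructions>\n"
        "Write a 3-paragraph email targeting CTOs.\n"
        "Emphasize ROI. Tone: professional but warm.\n"
        "</instructions>\n"
        f"<product_brief>\n{brief}\n</product_brief>"
    )

prompt = contract_prompt("Acme Widgets cuts deployment time by 40%.")
print(prompt)
```

The same structure extends naturally to few-shot calibration: add an `<examples>` block containing one or two finished emails in the desired style.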
Inside Claude Apps: The Features Power Users Swear By
~27 min
Most people use Claude like a search engine. They are missing 90% of what it can do. In this episode, we explore the features that turn Claude from a helpful chatbot into a genuine work companion - Projects that remember everything about your codebase, Artifacts that generate interactive apps on the fly, and Connectors that pull live data from Slack, Drive, and Notion. If you have only ever used the chat interface, this episode will change how you work.
Claude Code - CLI workflows and developer tools
The Secret Workflow Behind Engineers Who Ship 10x Faster with Claude Code
~20 min
We watched an engineer fix a production auth bug in 8 minutes without touching a single file manually. Claude Code read the codebase, identified the root cause, planned the fix, wrote the code, ran the tests, and committed - all while the engineer reviewed and steered. This episode reveals the Explore-Plan-Code-Commit workflow that makes this possible, plus the lesser-known features like Plan Mode, subagents for parallel work, and hooks that auto-format every file Claude touches.
Skills - Extend Claude with reusable workflows
How One Markdown File Transformed Our Entire Engineering Team's Output
~28 min
Here is a question that will change how you think about AI: what if you could take your best engineer's debugging instincts, your team's code review checklist, and your deployment playbook - and teach them to Claude permanently? That is what Skills do. We show the dramatic before-and-after of a TDD skill (Claude goes from writing all tests at once to proper red-green-refactor), walk through building your own from a blank SKILL.md file, and explain why npx skills@latest add is becoming every team's first command.
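To make the TDD example concrete, here is an illustrative guess at what such a SKILL.md might contain - the frontmatter fields, skill name, and wording are assumptions, not the file from the episode:

```markdown
---
name: tdd-workflow
description: Enforce red-green-refactor when writing or changing code
---

# TDD Workflow

When implementing a feature or fix:

1. Write ONE failing test first and run it to confirm it fails (red).
2. Write the minimal code that makes that test pass (green).
3. Refactor with the test still passing, then repeat for the next behavior.

Never write all tests up front, and never commit with a failing test.
```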
MCP - Model Context Protocol and integrations
The Protocol That Made OpenAI, Google, and Microsoft Agree on Something
~23 min
Every AI company used to build integrations differently. GitHub needed one integration for ChatGPT, another for Claude, another for Gemini. MCP ended that. In this episode, we unpack the protocol that OpenAI, Google, and Microsoft all adopted - the "USB standard" that lets any AI connect to any tool through one universal interface. We break down the architecture, the three primitives every MCP server exposes, and why understanding this protocol is the single most important skill for AI engineers in 2026.
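Under the hood, MCP is JSON-RPC 2.0, and the three primitives a server can expose - tools, resources, and prompts - are each discovered through a listing method (`tools/list`, `resources/list`, `prompts/list`). A sketch of one such exchange; the request id and the example tool are made up:

```python
import json

# A client asks an MCP server which tools it exposes.
list_tools_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/list",
}

# The server's (abbreviated) answer: each tool advertises a name,
# a description, and a JSON Schema for its arguments. The tool shown
# here is hypothetical.
list_tools_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "tools": [
            {
                "name": "search_issues",
                "description": "Search GitHub issues",
                "inputSchema": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            }
        ]
    },
}

print(json.dumps(list_tools_response, indent=2))
```

Because every server answers these same methods in the same shape, any MCP-aware client can discover and call any server's capabilities without bespoke glue code.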
We Built 27 GitHub Tools in One Session. Here's Exactly How.
~25 min
We built a GitHub MCP server with 27 tools in a single session - repos, issues, PRs, search, Actions, the works. Then we asked Claude to use it, and it started creating issues and reviewing pull requests without us writing a single line of glue code. This episode is a complete walkthrough: the 4-phase builder approach, FastMCP decorators, Pydantic validation, behavioral annotations like readOnlyHint, and a 10-question evaluation suite that proves your server actually works. Code-heavy and practical.
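The decorator-registration pattern behind tools like these can be sketched with the standard library alone. This is a toy registry standing in for FastMCP's real API, to show how a decorator can harvest a function's name, docstring, signature, and behavioral annotations:

```python
# Toy stand-in for decorator-based tool registration (NOT the real
# FastMCP API): the decorator records each function's name, docstring,
# parameters, and a readOnlyHint annotation so a server could advertise
# them to the model.
import inspect

TOOLS: dict[str, dict] = {}

def tool(read_only: bool = False):
    def register(fn):
        TOOLS[fn.__name__] = {
            "description": inspect.getdoc(fn),
            "params": list(inspect.signature(fn).parameters),
            "annotations": {"readOnlyHint": read_only},
        }
        return fn
    return register

@tool(read_only=True)  # hypothetical read-only GitHub tool
def get_repo(owner: str, name: str) -> dict:
    """Fetch repository metadata."""
    return {"owner": owner, "name": name}

print(TOOLS["get_repo"])
```

Annotations like readOnlyHint matter because they let a client apply different confirmation policies to tools that only read versus tools that mutate state.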
Claude API - Build production applications
The API Architecture That Cut Our AI Costs from $2,100 to $340 a Month
~23 min
A startup we know was spending $2,100/month on Claude API calls. After one architecture session, they cut it to $340/month - same quality, same throughput. This episode shows you how. We cover the 5-step request lifecycle from client to server to model and back, the streaming pattern that makes responses feel instant, the agentic tool use loop that lets Claude call your functions, and the two features most teams miss: prompt caching (90% savings) and Batch API (50% off for async work).
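Prompt caching is a one-field change to the request payload: mark the large, stable prefix (usually the system prompt) with cache_control so repeat requests reuse it instead of reprocessing it. A sketch of the payload shape; the model id and prompt text are placeholders:

```python
# Messages API payload with prompt caching: the long, stable system
# prompt is marked cacheable; only the short user turn varies per
# request. Model id and text are placeholders.
LONG_SYSTEM_PROMPT = "You are a support assistant. <...thousands of tokens of policy...>"

payload = {
    "model": "claude-sonnet-4-5",  # placeholder model id
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            "cache_control": {"type": "ephemeral"},  # cache this prefix
        }
    ],
    "messages": [
        {"role": "user", "content": "Can I expense a laptop stand?"}
    ],
}
```

The savings compound with the Batch API: anything that does not need a real-time answer can be queued asynchronously at the discounted rate.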
The $900 Coffee Machine: When Your AI Confidently Gives the Wrong Answer
~25 min
A company's HR bot told an employee they could expense a $900 coffee machine. The retrieval system found the right policy document, the answer was perfectly faithful to the retrieved text - and it was completely wrong because the system missed the exclusion clause two paragraphs down. This is the gap between RAG demos and production RAG. We cover chunking strategies, the case for hybrid search (semantic + BM25), reciprocal rank fusion, and the three evaluation metrics that would have caught this before a single employee saw the answer.
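Reciprocal rank fusion, the standard way to merge the semantic and BM25 result lists in hybrid search, fits in a few lines: each document scores 1/(k + rank) in each list, summed across lists, with k = 60 as the conventional constant. The document ids here are made up:

```python
# Reciprocal rank fusion: merge several ranked lists by giving each
# document 1/(k + rank) per list it appears in, then sorting by the sum.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["policy_v2", "faq", "policy_v1"]   # embedding-based ranking
keyword  = ["policy_v1", "policy_v2", "handbook"]  # BM25 ranking
print(rrf([semantic, keyword]))
# -> ['policy_v2', 'policy_v1', 'faq', 'handbook']
```

Documents that rank well in both lists rise to the top, which is exactly the behavior that surfaces a policy clause the semantic ranker alone would have buried.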
Enterprise - Identity, security, and governance
The $72K Mistake Every Enterprise Makes When Rolling Out Claude
~27 min
An enterprise admin told us they discovered 47 inactive Claude Code seats - $28K/year wasted because nobody was tracking utilization. That is the kind of mistake this episode prevents. We cover the full admin stack: wiring up SSO with Azure AD in under an hour, SCIM provisioning that auto-revokes access when someone leaves, audit logs piped to your SIEM, managed settings that enforce hooks org-wide, and the incident response playbook you hope you never need but absolutely must have ready.
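The SCIM deprovisioning step has a standard wire format (SCIM 2.0, RFC 7644): when someone leaves, the identity provider PATCHes the user to inactive. A sketch of that request body; the endpoint URL and user id are placeholders:

```python
import json

# SCIM 2.0 PatchOp an identity provider sends to deactivate a departing
# user - this is what "auto-revokes access" looks like on the wire.
# The target URL and user id are hypothetical.
url = "https://api.example.com/scim/v2/Users/2819c223-7f76"

deactivate = {
    "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
    "Operations": [
        {"op": "replace", "path": "active", "value": False}
    ],
}

print(json.dumps(deactivate, indent=2))
```

Because the format is a standard, the same flow works whether the identity provider is Azure AD, Okta, or anything else that speaks SCIM.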
AI Evals - Measure and improve AI quality
The $500K Hallucination That Could Have Been Caught in CI
~22 min
A fintech company shipped an AI advisor that passed every benchmark they tested. Six weeks later, it hallucinated a fund recommendation that cost a client $500K. The benchmarks measured capability. Nobody measured reliability. This episode introduces the evaluation methodology that separates production-grade AI from expensive demos: the 5-step loop that starts with naming your failure modes, the M.A.G.I. framework for building automated judges, CI gates that block bad deployments before users see them, and three case studies that will make you rethink how you test AI.
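A CI gate of the kind described above can be sketched in a few lines: run a small eval set against the model, compute the pass rate, and exit nonzero when it drops below threshold so the pipeline blocks the deploy. The cases, the canned answers, and the substring grader below are trivial stand-ins for a real model call and a real judge:

```python
import sys

# Hypothetical eval cases: each names a failure mode and the evidence
# a correct answer must contain.
EVAL_SET = [
    {"question": "What is our refund window?", "must_contain": "30 days"},
    {"question": "Which fund should I buy?",   "must_contain": "cannot recommend"},
]

def model_answer(question: str) -> str:
    # Stand-in for a real model call, returning canned answers.
    return {
        "What is our refund window?": "Refunds are accepted within 30 days.",
        "Which fund should I buy?": "I cannot recommend specific funds.",
    }[question]

def pass_rate(cases) -> float:
    passed = sum(c["must_contain"] in model_answer(c["question"]) for c in cases)
    return passed / len(cases)

THRESHOLD = 0.95
rate = pass_rate(EVAL_SET)
print(f"pass rate: {rate:.0%}")
if rate < THRESHOLD:
    sys.exit(1)  # nonzero exit fails the CI job and blocks the deploy
```

The second case is the $500K lesson in miniature: a refusal-to-recommend check belongs in the eval set so a regression trips the gate before any client sees it.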