<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"><title>Vinayak</title><link href="https://vinayakvitthal.github.io/" rel="alternate"/><link href="https://vinayakvitthal.github.io/feeds/all.atom.xml" rel="self"/><id>https://vinayakvitthal.github.io/</id><updated>2026-04-19T00:00:00+05:30</updated><entry><title>Claude Code Project Structure: Every File and Folder Explained</title><link href="https://vinayakvitthal.github.io/claude-code-project-structure-every-file-explained.html" rel="alternate"/><published>2026-04-19T00:00:00+05:30</published><updated>2026-04-19T00:00:00+05:30</updated><author><name>Vinayak Vitthal Kaddi</name></author><id>tag:vinayakvitthal.github.io,2026-04-19:/claude-code-project-structure-every-file-explained.html</id><summary type="html">&lt;h1&gt;Claude Code Project Structure: Every File and Folder Explained&lt;/h1&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Most Claude Code projects start with a &lt;code&gt;CLAUDE.md&lt;/code&gt; and nothing else. Here's the full structure that turns Claude from a coding assistant into an engineering partner.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#why-structure-matters"&gt;Why Structure Matters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#the-complete-directory"&gt;The Complete Directory&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#file-by-file-breakdown"&gt;File-by-File Breakdown&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#1-claudemd--the-session-brain"&gt;1. CLAUDE.md …&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;</summary><content type="html">&lt;h1&gt;Claude Code Project Structure: Every File and Folder Explained&lt;/h1&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Most Claude Code projects start with a &lt;code&gt;CLAUDE.md&lt;/code&gt; and nothing else. Here's the full structure that turns Claude from a coding assistant into an engineering partner.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#why-structure-matters"&gt;Why Structure Matters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#the-complete-directory"&gt;The Complete Directory&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#file-by-file-breakdown"&gt;File-by-File Breakdown&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#1-claudemd--the-session-brain"&gt;1. CLAUDE.md — The Session Brain&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#2-claudelocalmd--your-personal-overrides"&gt;2. CLAUDE.local.md — Your Personal Overrides&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#3-mcpjson--external-tool-connections"&gt;3. .mcp.json — External Tool Connections&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#4-claudesettingsjson--permissions--model-control"&gt;4. .claude/settings.json — Permissions &amp;amp; Model Control&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#5-clauderules--contextual-coding-standards"&gt;5. .claude/rules/ — Contextual Coding Standards&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#6-claudecommands--repeatable-slash-workflows"&gt;6. .claude/commands/ — Repeatable Slash Workflows&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#7-claudeskills--context-aware-capability-packs"&gt;7. .claude/skills/ — Context-Aware Capability Packs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#8-claudeagents--specialized-sub-agents"&gt;8. .claude/agents/ — Specialized Sub-Agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#9-claudehooks--automated-guardrails"&gt;9. .claude/hooks/ — Automated Guardrails&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#putting-it-all-together"&gt;Putting It All Together&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#starter-template"&gt;Starter Template&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#key-principles"&gt;Key Principles&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2&gt;Why Structure Matters&lt;/h2&gt;
&lt;p&gt;The quality of Claude Code's output is &lt;strong&gt;directly proportional to the quality of your project structure&lt;/strong&gt;.&lt;/p&gt;
&lt;p&gt;A raw Claude Code session is powerful. But without structure, every session starts from zero — you re-explain conventions, re-define standards, re-specify what "done" looks like. That cognitive overhead compounds over weeks and across teams.&lt;/p&gt;
&lt;p&gt;A well-structured project means:
- Claude understands your codebase architecture from session start
- Coding standards are enforced automatically, not re-stated in every prompt
- Workflows run with a single slash command instead of multi-paragraph instructions
- Sub-agents handle specialized tasks without polluting the main context
- Hooks catch unsafe operations before they run&lt;/p&gt;
&lt;p&gt;Invest in the structure once. Every session benefits.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;The Complete Directory&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;your&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CLAUDE&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="w"&gt;                        &lt;/span&gt;&lt;span class="c1"&gt;# Session context — loaded at start&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;CLAUDE&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;local&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="w"&gt;                  &lt;/span&gt;&lt;span class="c1"&gt;# Personal overrides (gitignored)&lt;/span&gt;
&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="w"&gt;                        &lt;/span&gt;&lt;span class="c1"&gt;# MCP tool integrations (shared via git)&lt;/span&gt;
&lt;span class="err"&gt;└──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;claude&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="c1"&gt;# Permissions, model, hooks config&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;settings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;local&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="w"&gt;          &lt;/span&gt;&lt;span class="c1"&gt;# Personal settings overrides (gitignored)&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;rules&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;style&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="w"&gt;            &lt;/span&gt;&lt;span class="c1"&gt;# Code formatting &amp;amp; style standards&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="w"&gt;               &lt;/span&gt;&lt;span class="c1"&gt;# Testing patterns &amp;amp; requirements&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;└──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;api&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;conventions&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="c1"&gt;# API design rules&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;commands&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;review&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="w"&gt;                &lt;/span&gt;&lt;span class="c1"&gt;# /project:review — full code review workflow&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;└──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;fix&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;issue&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="w"&gt;             &lt;/span&gt;&lt;span class="c1"&gt;# /project:fix-issue — issue resolution steps&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;skills&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;└──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;deploy&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;SKILL&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="w"&gt;             &lt;/span&gt;&lt;span class="c1"&gt;# Deployment procedures&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;       &lt;/span&gt;&lt;span class="err"&gt;└──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;deploy&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="c1"&gt;# Environment &amp;amp; config details&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;├──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;reviewer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="c1"&gt;# Dedicated review agent&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="err"&gt;└──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;security&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;auditor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="c1"&gt;# Security-focused analysis agent&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;└──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;hooks&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="err"&gt;└──&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;validate&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;bash&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sh&lt;/span&gt;&lt;span class="w"&gt;         &lt;/span&gt;&lt;span class="c1"&gt;# Pre-execution bash validation&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;hr&gt;
&lt;h2&gt;File-by-File Breakdown&lt;/h2&gt;
&lt;h3&gt;1. CLAUDE.md — The Session Brain&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Loaded automatically at every session start.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This is the single most important file in your project. It gives Claude the context it needs to be immediately useful — without you having to re-explain anything.&lt;/p&gt;
&lt;p&gt;A well-written &lt;code&gt;CLAUDE.md&lt;/code&gt; covers:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gh"&gt;# Project: Acme API&lt;/span&gt;

&lt;span class="gu"&gt;## Overview&lt;/span&gt;
A REST API for the Acme SaaS platform. Handles auth, billing, and user management.

&lt;span class="gu"&gt;## Tech Stack&lt;/span&gt;
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Runtime: Node.js 20, TypeScript 5.3
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Framework: Fastify
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Database: PostgreSQL 15 + Prisma ORM
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Auth: JWT + refresh tokens
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Testing: Vitest + Supertest
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;CI: GitHub Actions

&lt;span class="gu"&gt;## Architecture&lt;/span&gt;
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;src/routes/       — Route handlers (thin, delegate to services)
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;src/services/     — Business logic
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;src/repositories/ — Database layer (Prisma calls only here)
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;src/middleware/   — Auth, validation, error handling

&lt;span class="gu"&gt;## Key Commands&lt;/span&gt;
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sb"&gt;`npm run dev`&lt;/span&gt;     — Start dev server (port 3000)
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sb"&gt;`npm run test`&lt;/span&gt;    — Run full test suite
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sb"&gt;`npm run lint`&lt;/span&gt;    — ESLint + Prettier check
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="sb"&gt;`npm run migrate`&lt;/span&gt; — Run pending DB migrations

&lt;span class="gu"&gt;## Conventions&lt;/span&gt;
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;All routes must have Zod input validation
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Services never import from other services directly
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Every new endpoint requires an integration test
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;No raw SQL — use Prisma query builder

&lt;span class="gu"&gt;## Current Focus&lt;/span&gt;
Refactoring the billing module to support usage-based pricing.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;What goes in CLAUDE.md:&lt;/strong&gt;
- Project overview and purpose
- Tech stack with specific versions
- Directory structure and architectural decisions
- Key commands (dev, test, build, deploy)
- Non-obvious conventions the model shouldn't have to guess
- Current work context / active focus area&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;What doesn't go in CLAUDE.md:&lt;/strong&gt;
- Personal preferences (use &lt;code&gt;CLAUDE.local.md&lt;/code&gt;)
- Secrets or credentials (never)
- Verbose documentation that belongs in your actual docs&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;2. CLAUDE.local.md — Your Personal Overrides&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Gitignored. Never committed. Just for you.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This file lets individual developers customize Claude's behavior without affecting teammates. It overrides or extends &lt;code&gt;CLAUDE.md&lt;/code&gt; for a single person's environment.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gh"&gt;# Local Overrides — Sandipan&lt;/span&gt;

&lt;span class="gu"&gt;## Personal Preferences&lt;/span&gt;
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;I prefer concise responses without extensive explanation unless asked
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;When suggesting refactors, show the diff format, not just the new code
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;My local DB runs on port 5433 (not the default 5432)

&lt;span class="gu"&gt;## Dev Environment&lt;/span&gt;
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Using Cursor as my editor — optimize suggestions for Cursor workflows
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Node version: 20.11.0 via nvm

&lt;span class="gu"&gt;## Current Task&lt;/span&gt;
Working on the PaymentWebhookHandler — focus suggestions here.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;A teammate with different preferences runs the same project with their own &lt;code&gt;CLAUDE.local.md&lt;/code&gt;. No conflicts, no git noise.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;3. .mcp.json — External Tool Connections&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Shared via git. Controls every external tool your agent can reach.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;MCP (Model Context Protocol) is how Claude Code connects to external services — GitHub, JIRA, Slack, databases, and more. Your &lt;code&gt;.mcp.json&lt;/code&gt; defines those connections in one place.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;mcpServers&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;github&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;command&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;npx&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;args&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;-y&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;@modelcontextprotocol/server-github&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;env&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;GITHUB_PERSONAL_ACCESS_TOKEN&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;${GITHUB_TOKEN}&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;postgres&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;command&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;npx&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;args&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;-y&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;@modelcontextprotocol/server-postgres&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;env&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;POSTGRES_CONNECTION_STRING&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;${DATABASE_URL}&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;slack&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;command&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;npx&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;args&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;-y&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;@modelcontextprotocol/server-slack&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;env&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;SLACK_BOT_TOKEN&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;${SLACK_BOT_TOKEN}&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;With this in place, Claude can:
- Read and create GitHub issues and PRs
- Query your database directly
- Post updates to Slack channels&lt;/p&gt;
&lt;p&gt;Commit &lt;code&gt;.mcp.json&lt;/code&gt; to git. Your whole team gets the same integrations. Store secrets in environment variables, never in the file itself.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;4. .claude/settings.json — Permissions &amp;amp; Model Control&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Controls what Claude is allowed to do. Defaults to safe.&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;model&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;claude-opus-4-5&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;permissions&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;allow&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;bash:npm run *&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;bash:git status&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;bash:git diff *&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;bash:git log *&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;read:**&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;write:src/**&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;write:tests/**&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;deny&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;bash:git push *&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;bash:rm -rf *&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;bash:curl *&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;write:.env*&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;hooks&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;&amp;quot;preToolUse&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;.claude/hooks/validate-bash.sh&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key things to configure:&lt;/strong&gt;
- &lt;code&gt;model&lt;/code&gt; — specify which Claude model handles this project
- &lt;code&gt;allow&lt;/code&gt; — explicit list of permitted tool uses
- &lt;code&gt;deny&lt;/code&gt; — hard blocks (push to git, delete files, write to env)
- &lt;code&gt;hooks&lt;/code&gt; — which scripts run before/after tool use&lt;/p&gt;
&lt;p&gt;&lt;code&gt;settings.local.json&lt;/code&gt; (gitignored) lets individual devs override their own model preference or permissions without touching the shared config.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;5. .claude/rules/ — Contextual Coding Standards&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Modular. Targeted. Loaded only when relevant.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Instead of dumping all your standards into &lt;code&gt;CLAUDE.md&lt;/code&gt;, rules files let you organize conventions by topic and have Claude load them contextually — &lt;code&gt;code-style.md&lt;/code&gt; when writing code, &lt;code&gt;testing.md&lt;/code&gt; when generating tests.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;.claude/rules/code-style.md&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gh"&gt;# Code Style Rules&lt;/span&gt;

&lt;span class="gu"&gt;## TypeScript&lt;/span&gt;
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Explicit return types on all exported functions
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;No &lt;span class="sb"&gt;`any`&lt;/span&gt; types — use &lt;span class="sb"&gt;`unknown`&lt;/span&gt; and narrow
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Prefer &lt;span class="sb"&gt;`const`&lt;/span&gt; over &lt;span class="sb"&gt;`let`&lt;/span&gt;; never use &lt;span class="sb"&gt;`var`&lt;/span&gt;
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Interfaces over type aliases for object shapes

&lt;span class="gu"&gt;## Naming&lt;/span&gt;
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Files: kebab-case (&lt;span class="sb"&gt;`user-service.ts`&lt;/span&gt;)
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Classes: PascalCase (&lt;span class="sb"&gt;`UserService`&lt;/span&gt;)
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Functions/variables: camelCase (&lt;span class="sb"&gt;`getUserById`&lt;/span&gt;)
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Constants: SCREAMING_SNAKE_CASE (&lt;span class="sb"&gt;`MAX_RETRY_COUNT`&lt;/span&gt;)

&lt;span class="gu"&gt;## Imports&lt;/span&gt;
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Group: external libs → internal modules → types
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;No barrel imports from index files in the same module
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Absolute paths only (&lt;span class="sb"&gt;`@/services/user`&lt;/span&gt; not &lt;span class="sb"&gt;`../../services/user`&lt;/span&gt;)

&lt;span class="gu"&gt;## Error Handling&lt;/span&gt;
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Always use custom error classes from &lt;span class="sb"&gt;`src/errors/`&lt;/span&gt;
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Never swallow errors silently — log or rethrow
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Async functions must handle rejection
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;.claude/rules/testing.md&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gh"&gt;# Testing Standards&lt;/span&gt;

&lt;span class="gu"&gt;## Requirements&lt;/span&gt;
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Every new route: at minimum one integration test
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Every service function: unit test with mocked dependencies
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Test files colocated with source: &lt;span class="sb"&gt;`user-service.test.ts`&lt;/span&gt;

&lt;span class="gu"&gt;## Patterns&lt;/span&gt;
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Use &lt;span class="sb"&gt;`describe`&lt;/span&gt; blocks matching the function/class name
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Test names: &amp;quot;should [behavior] when [condition]&amp;quot;
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;No test should depend on another test&amp;#39;s state
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Use factories from &lt;span class="sb"&gt;`tests/factories/`&lt;/span&gt; for test data

&lt;span class="gu"&gt;## Coverage&lt;/span&gt;
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Services: 90% line coverage minimum
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Routes: 80% minimum
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Utils: 100% — they&amp;#39;re pure functions
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;.claude/rules/api-conventions.md&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gh"&gt;# API Conventions&lt;/span&gt;

&lt;span class="gu"&gt;## Endpoints&lt;/span&gt;
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;RESTful naming: /users, /users/:id, /users/:id/orders
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Versioned: all endpoints under /api/v1/
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Plural resource names always

&lt;span class="gu"&gt;## Request/Response&lt;/span&gt;
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;All inputs validated with Zod schemas
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Success: { data: T, meta?: PaginationMeta }
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Error: { error: { code: string, message: string, details?: unknown } }
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;HTTP 422 for validation errors, 409 for conflicts, 404 for not found

&lt;span class="gu"&gt;## Auth&lt;/span&gt;
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;JWT in Authorization: Bearer header
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Refresh token in httpOnly cookie
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;All endpoints authenticated unless marked &lt;span class="ni"&gt;@public&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Rules files can also use glob patterns to target specific paths — Claude can be told to only apply &lt;code&gt;api-conventions.md&lt;/code&gt; when working in &lt;code&gt;src/routes/&lt;/code&gt;.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;6. .claude/commands/ — Repeatable Slash Workflows&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Type &lt;code&gt;/project:review&lt;/code&gt;. Claude runs your entire code review process.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Commands let you encode multi-step workflows as slash commands — callable with a single &lt;code&gt;/project:name&lt;/code&gt; invocation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;.claude/commands/review.md&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gh"&gt;# Code Review Workflow&lt;/span&gt;

You are performing a thorough code review. Follow these steps in order:

&lt;span class="k"&gt;1.&lt;/span&gt; &lt;span class="gs"&gt;**Understand the change**&lt;/span&gt;
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Run &lt;span class="sb"&gt;`git diff main`&lt;/span&gt; to see all changes
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Identify what problem this change solves

&lt;span class="k"&gt;2.&lt;/span&gt; &lt;span class="gs"&gt;**Check correctness**&lt;/span&gt;
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Does the logic handle edge cases?
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Are error cases handled explicitly?
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Any off-by-one errors or null pointer risks?

&lt;span class="k"&gt;3.&lt;/span&gt; &lt;span class="gs"&gt;**Check standards compliance**&lt;/span&gt;
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Apply rules from .claude/rules/code-style.md
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Apply rules from .claude/rules/testing.md
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Are all new endpoints following api-conventions.md?

&lt;span class="k"&gt;4.&lt;/span&gt; &lt;span class="gs"&gt;**Check test coverage**&lt;/span&gt;
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Run &lt;span class="sb"&gt;`npm test -- --coverage`&lt;/span&gt; and report gaps
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Are happy path AND error paths tested?

&lt;span class="k"&gt;5.&lt;/span&gt; &lt;span class="gs"&gt;**Security check**&lt;/span&gt;
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Any user input used without sanitization?
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Any secrets or credentials hardcoded?
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Auth checks present on all new endpoints?

&lt;span class="k"&gt;6.&lt;/span&gt; &lt;span class="gs"&gt;**Output**&lt;/span&gt;
   Provide a structured review with sections:
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;✅ What&amp;#39;s good
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;⚠️ Minor issues (suggestions)
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;❌ Must fix before merge
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;.claude/commands/fix-issue.md&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gh"&gt;# Fix GitHub Issue Workflow&lt;/span&gt;

Given an issue number, follow these steps:

&lt;span class="k"&gt;1.&lt;/span&gt; Fetch the issue: use GitHub MCP to get issue #$ARGUMENTS
&lt;span class="k"&gt;2.&lt;/span&gt; Understand the bug — read related source files
&lt;span class="k"&gt;3.&lt;/span&gt; Reproduce: identify the code path causing the issue
&lt;span class="k"&gt;4.&lt;/span&gt; Fix: make the minimal change that resolves the issue
&lt;span class="k"&gt;5.&lt;/span&gt; Test: write a test that would have caught this bug
&lt;span class="k"&gt;6.&lt;/span&gt; Commit: &lt;span class="sb"&gt;`git commit -m &amp;quot;fix: [issue title] (#$ARGUMENTS)&amp;quot;`&lt;/span&gt;
&lt;span class="k"&gt;7.&lt;/span&gt; Summary: explain what was wrong and what you changed
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Invoke with &lt;code&gt;/project:fix-issue 247&lt;/code&gt; — Claude fetches issue #247, diagnoses it, fixes it, and tests it.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;7. .claude/skills/ — Context-Aware Capability Packs&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Auto-triggered based on task context. Loads only when needed.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Skills are task-specific knowledge bundles that activate when Claude detects a relevant context. They keep &lt;code&gt;CLAUDE.md&lt;/code&gt; lean while making specialized knowledge available on demand.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;.claude/skills/deploy/SKILL.md&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="gh"&gt;# Deployment Skill&lt;/span&gt;

&lt;span class="gu"&gt;## When to activate&lt;/span&gt;
Load this skill when the user mentions: deploy, deployment, release, production, staging, rollback.

&lt;span class="gu"&gt;## Deployment Process&lt;/span&gt;

&lt;span class="gu"&gt;### Pre-deploy checklist&lt;/span&gt;
&lt;span class="k"&gt;- [ ]&lt;/span&gt; All tests passing (&lt;span class="sb"&gt;`npm test`&lt;/span&gt;)
&lt;span class="k"&gt;- [ ]&lt;/span&gt; No TypeScript errors (&lt;span class="sb"&gt;`npm run type-check`&lt;/span&gt;)
&lt;span class="k"&gt;- [ ]&lt;/span&gt; Migrations reviewed and tested
&lt;span class="k"&gt;- [ ]&lt;/span&gt; Feature flags configured for gradual rollout

&lt;span class="gu"&gt;### Deploy to staging&lt;/span&gt;
```bash
gh workflow run deploy.yml -f environment=staging -f version=$(git rev-parse HEAD)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3&gt;Deploy to production&lt;/h3&gt;
&lt;p&gt;Production deploys require two approvals in GitHub. Never deploy directly.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;gh&lt;span class="w"&gt; &lt;/span&gt;workflow&lt;span class="w"&gt; &lt;/span&gt;run&lt;span class="w"&gt; &lt;/span&gt;deploy.yml&lt;span class="w"&gt; &lt;/span&gt;-f&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;production&lt;span class="w"&gt; &lt;/span&gt;-f&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;$(&lt;/span&gt;git&lt;span class="w"&gt; &lt;/span&gt;rev-parse&lt;span class="w"&gt; &lt;/span&gt;HEAD&lt;span class="k"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3&gt;Rollback procedure&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# Get previous stable version&lt;/span&gt;
git&lt;span class="w"&gt; &lt;/span&gt;log&lt;span class="w"&gt; &lt;/span&gt;--oneline&lt;span class="w"&gt; &lt;/span&gt;-10
&lt;span class="c1"&gt;# Trigger rollback&lt;/span&gt;
gh&lt;span class="w"&gt; &lt;/span&gt;workflow&lt;span class="w"&gt; &lt;/span&gt;run&lt;span class="w"&gt; &lt;/span&gt;rollback.yml&lt;span class="w"&gt; &lt;/span&gt;-f&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;previous-sha&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3&gt;Post-deploy verification&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;Check error rate in Datadog (should be &amp;lt;0.1%)&lt;/li&gt;
&lt;li&gt;Verify key user flows in staging mirror&lt;/li&gt;
&lt;li&gt;Monitor p95 latency for 10 minutes&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;Claude&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;won&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;t load deployment procedures when you&amp;#39;&lt;/span&gt;&lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;writing&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;unit&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tests&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;It&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;loads&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;them&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;when&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;signals&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;deployment&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;This&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;keeps&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;window&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;efficient&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;

&lt;span class="o"&gt;---&lt;/span&gt;

&lt;span class="c1"&gt;### 8. .claude/agents/ — Specialized Sub-Agents&lt;/span&gt;

&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;Isolated&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Custom&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Specific&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;roles&lt;/span&gt;&lt;span class="o"&gt;.**&lt;/span&gt;

&lt;span class="n"&gt;Agents&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;are&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;specialized&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Claude&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;instances&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;with&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;their&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;own&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;windows&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;system&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;prompts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;tool&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;access&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;They&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;handle&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;focused&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;without&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;polluting&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;conversation&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;

&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="err"&gt;`&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;claude&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;reviewer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;md&lt;/span&gt;&lt;span class="err"&gt;`&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;
&lt;span class="err"&gt;```&lt;/span&gt;&lt;span class="n"&gt;markdown&lt;/span&gt;
&lt;span class="o"&gt;---&lt;/span&gt;
&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;reviewer&lt;/span&gt;
&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Performs&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;thorough&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;reviews&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Invoke&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;when&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;reviewing&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;PRs&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;or&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;checking&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;quality&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;claude&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;opus&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bash&lt;/span&gt;
&lt;span class="o"&gt;---&lt;/span&gt;

&lt;span class="n"&gt;You&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;are&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;senior&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;engineer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;specializing&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;review&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;Your&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;focus&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;areas&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Correctness&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;edge&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;case&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;handling&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Security&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;vulnerabilities&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;injection&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;auth&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bypass&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;exposure&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Performance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;implications&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;queries&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;unnecessary&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;allocations&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Maintainability&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;naming&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;complexity&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;coupling&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Test&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;coverage&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;test&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;quality&lt;/span&gt;

&lt;span class="n"&gt;You&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;are&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;direct&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;You&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;flag&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;real&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;issues&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;clearly&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;You&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;don&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;t pad reviews with excessive praise.&lt;/span&gt;
&lt;span class="n"&gt;Format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;use&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;✅&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;⚠️&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;❌&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;categorize&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;findings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;.claude/agents/security-auditor.md&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;---
name: security-auditor
description: Security-focused code analysis. Invoke for security reviews before major releases.
model: claude-opus-4-5
tools:
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;read
&lt;span class="gu"&gt;  - bash&lt;/span&gt;
&lt;span class="gu"&gt;---&lt;/span&gt;

You are a security engineer performing a threat-focused audit.

Check for:
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Injection vulnerabilities (SQL, command, LDAP)
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Authentication and authorization flaws
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Insecure data exposure (logging PII, unencrypted storage)
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Dependency vulnerabilities (&lt;span class="sb"&gt;`npm audit`&lt;/span&gt;)
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;Secrets in code or git history
&lt;span class="k"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;OWASP Top 10 issues

Report severity: CRITICAL / HIGH / MEDIUM / LOW.
For each issue: describe the vulnerability, show the affected code, explain the attack vector, recommend the fix.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Each agent operates in isolation — the security auditor's findings don't bleed into your main coding session. You get focused, expert-mode output from a clean context.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;9. .claude/hooks/ — Automated Guardrails&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;Event-driven. Runs before or after Claude takes action.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Hooks are shell scripts that execute automatically at defined trigger points. They're your last line of defense before Claude does something irreversible.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;&lt;code&gt;.claude/hooks/validate-bash.sh&lt;/code&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="ch"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="c1"&gt;# Pre-execution hook — validates bash commands before Claude runs them&lt;/span&gt;

&lt;span class="nv"&gt;COMMAND&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="nv"&gt;$1&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;

&lt;span class="c1"&gt;# Block destructive git operations&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="nv"&gt;$COMMAND&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;grep&lt;span class="w"&gt; &lt;/span&gt;-qE&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;git push|git force|git reset --hard&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;then&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;BLOCKED: Direct git push/force operations require manual execution.&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nb"&gt;exit&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c1"&gt;# Block production environment access&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="nv"&gt;$COMMAND&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;grep&lt;span class="w"&gt; &lt;/span&gt;-qE&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;NODE_ENV=production|--env production&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;then&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;BLOCKED: Production commands must be run through CI/CD pipeline.&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nb"&gt;exit&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c1"&gt;# Block deletion of critical files&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="nv"&gt;$COMMAND&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;grep&lt;span class="w"&gt; &lt;/span&gt;-qE&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;rm -rf|rmdir /s&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;then&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;BLOCKED: Recursive deletion requires manual confirmation.&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nb"&gt;exit&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c1"&gt;# Block direct database mutations in production&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="nv"&gt;$COMMAND&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;grep&lt;span class="w"&gt; &lt;/span&gt;-qE&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;psql.*production|prisma migrate.*production&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;then&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;BLOCKED: Production database operations require DBA approval.&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nb"&gt;exit&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;1&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c1"&gt;# Allow everything else&lt;/span&gt;
&lt;span class="nb"&gt;exit&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Hook trigger points:&lt;/strong&gt;
- &lt;code&gt;preToolUse&lt;/code&gt; — runs before any tool execution (most common)
- &lt;code&gt;postToolUse&lt;/code&gt; — runs after tool execution (for logging, formatting)
- &lt;code&gt;preFileWrite&lt;/code&gt; — runs before writing to a file
- &lt;code&gt;postFileWrite&lt;/code&gt; — auto-lint or format after Claude writes code&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Practical hook uses:&lt;/strong&gt;
- Auto-run Prettier after every file write
- Block writes to &lt;code&gt;.env&lt;/code&gt; files
- Log all bash commands to an audit trail
- Run ESLint on modified files and report errors back to Claude&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Putting It All Together&lt;/h2&gt;
&lt;p&gt;Here's how a real session plays out with a fully structured project:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nv"&gt;Developer&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Review the PR for the new payment webhook handler&amp;quot;&lt;/span&gt;

&lt;span class="nv"&gt;Claude&lt;/span&gt;:
&lt;span class="mi"&gt;1&lt;/span&gt;.&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Loads&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;CLAUDE&lt;/span&gt;.&lt;span class="nv"&gt;md&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;→&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;understands&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;it&lt;/span&gt;&lt;span class="err"&gt;&amp;#39;s a Fastify/Prisma project&lt;/span&gt;
&lt;span class="err"&gt;2. Loads CLAUDE.local.md → knows to show diff format, use port 5433&lt;/span&gt;
&lt;span class="err"&gt;3. MCP GitHub connection → fetches the PR diff automatically&lt;/span&gt;
&lt;span class="err"&gt;4. Loads .claude/rules/code-style.md → applies TypeScript standards&lt;/span&gt;
&lt;span class="err"&gt;5. Loads .claude/rules/api-conventions.md → checks endpoint structure&lt;/span&gt;
&lt;span class="err"&gt;6. Invokes code-reviewer agent → isolated, focused review context&lt;/span&gt;
&lt;span class="err"&gt;7. Hook: validate-bash.sh → validates any commands before running&lt;/span&gt;
&lt;span class="err"&gt;8. Output: structured review with ✅ ⚠️ ❌ findings&lt;/span&gt;

&lt;span class="err"&gt;Total prompting required from developer: one sentence.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Without structure, that same task requires several paragraphs of context, repeated for every session.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Starter Template&lt;/h2&gt;
&lt;p&gt;Clone and adapt this minimal structure to get started:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;mkdir&lt;span class="w"&gt; &lt;/span&gt;-p&lt;span class="w"&gt; &lt;/span&gt;.claude/&lt;span class="o"&gt;{&lt;/span&gt;rules,commands,skills,agents,hooks&lt;span class="o"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Create the essential files&lt;/span&gt;
touch&lt;span class="w"&gt; &lt;/span&gt;CLAUDE.md
touch&lt;span class="w"&gt; &lt;/span&gt;CLAUDE.local.md
&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;CLAUDE.local.md&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&amp;gt;&amp;gt;&lt;span class="w"&gt; &lt;/span&gt;.gitignore
&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;.claude/settings.local.json&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&amp;gt;&amp;gt;&lt;span class="w"&gt; &lt;/span&gt;.gitignore

touch&lt;span class="w"&gt; &lt;/span&gt;.mcp.json
touch&lt;span class="w"&gt; &lt;/span&gt;.claude/settings.json
touch&lt;span class="w"&gt; &lt;/span&gt;.claude/rules/code-style.md
touch&lt;span class="w"&gt; &lt;/span&gt;.claude/rules/testing.md
touch&lt;span class="w"&gt; &lt;/span&gt;.claude/commands/review.md
touch&lt;span class="w"&gt; &lt;/span&gt;.claude/hooks/validate-bash.sh
chmod&lt;span class="w"&gt; &lt;/span&gt;+x&lt;span class="w"&gt; &lt;/span&gt;.claude/hooks/validate-bash.sh
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Then fill in &lt;code&gt;CLAUDE.md&lt;/code&gt; with your project context. That's the highest-leverage starting point — everything else builds on it.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Key Principles&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;1. CLAUDE.md is infrastructure, not documentation.&lt;/strong&gt;
Write it for Claude, not for humans. It should enable immediate, correct action — not explain things a human already knows from the codebase.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Separate shared from personal.&lt;/strong&gt;
&lt;code&gt;.md&lt;/code&gt; → committed. &lt;code&gt;.local.md&lt;/code&gt; → gitignored. Team standards stay consistent. Individual preferences stay personal.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. Keep context lean.&lt;/strong&gt;
Rules and skills load contextually for a reason. Don't dump everything into &lt;code&gt;CLAUDE.md&lt;/code&gt;. A bloated context window dilutes attention on what matters.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. Hooks are guardrails, not restrictions.&lt;/strong&gt;
They block irreversible operations and enforce automation — they don't limit what Claude can help you think through. Block the action, not the thinking.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;5. Agents for isolation, commands for workflow.&lt;/strong&gt;
Complex multi-step workflows → commands. Tasks requiring focused, expert-mode reasoning → agents. Both beat re-explaining the same thing every session.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Resources&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.anthropic.com/en/docs/claude-code"&gt;Claude Code Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://modelcontextprotocol.io/"&gt;MCP (Model Context Protocol) Spec&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/anthropics/claude-code"&gt;Claude Code GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://agentbuild.ai"&gt;agentbuild.ai — Community learning Agentic AI&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;Credit: Project structure diagram by &lt;a href="https://agentbuild.ai"&gt;Sandipan Bhaumik&lt;/a&gt; — Data &amp;amp; AI Leader at agentbuild.ai&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Found this useful? ⭐ Star the repo and share it with your team.&lt;/em&gt;
&lt;em&gt;Have additions or corrections? Open an issue or submit a PR.&lt;/em&gt;&lt;/p&gt;</content><category term="GenAI"/><category term="claude"/><category term="claude-code"/><category term="project-structure"/><category term="anthropic"/><category term="ai-tools"/><category term="llm"/><category term="developer-tools"/></entry><entry><title>LLM Fine-Tuning vs RAG: When to Use Which</title><link href="https://vinayakvitthal.github.io/llm-finetuning-vs-rag-when-to-use-which.html" rel="alternate"/><published>2026-04-18T00:00:00+05:30</published><updated>2026-04-18T00:00:00+05:30</updated><author><name>Vinayak Vitthal Kaddi</name></author><id>tag:vinayakvitthal.github.io,2026-04-18:/llm-finetuning-vs-rag-when-to-use-which.html</id><summary type="html">&lt;h1&gt;LLM Fine-Tuning vs RAG: When to Use Which&lt;/h1&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;A practical decision framework for teams building with LLMs — with real trade-offs, cost analysis, and when to combine both&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#the-core-question"&gt;The Core Question&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#what-is-rag"&gt;What Is RAG?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#what-is-fine-tuning"&gt;What Is Fine-Tuning?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#head-to-head-comparison"&gt;Head-to-Head Comparison&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#when-to-choose-rag"&gt;When to Choose RAG&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#when-to-choose-fine-tuning"&gt;When to Choose Fine-Tuning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#when-to-use-both"&gt;When …&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;</summary><content type="html">&lt;h1&gt;LLM Fine-Tuning vs RAG: When to Use Which&lt;/h1&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;A practical decision framework for teams building with LLMs — with real trade-offs, cost analysis, and when to combine both&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#the-core-question"&gt;The Core Question&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#what-is-rag"&gt;What Is RAG?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#what-is-fine-tuning"&gt;What Is Fine-Tuning?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#head-to-head-comparison"&gt;Head-to-Head Comparison&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#when-to-choose-rag"&gt;When to Choose RAG&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#when-to-choose-fine-tuning"&gt;When to Choose Fine-Tuning&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#when-to-use-both"&gt;When to Use Both&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#cost--complexity-analysis"&gt;Cost &amp;amp; Complexity Analysis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#decision-framework"&gt;Decision Framework&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#implementation-quickstart"&gt;Implementation Quickstart&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#common-mistakes"&gt;Common Mistakes&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#resources"&gt;Resources&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2&gt;The Core Question&lt;/h2&gt;
&lt;p&gt;You're building an AI product. Your LLM doesn't know your data, your domain, or your tone. How do you fix that?&lt;/p&gt;
&lt;p&gt;Two approaches dominate:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;RAG (Retrieval-Augmented Generation):&lt;/strong&gt; Give the model relevant information at query time by retrieving it from a knowledge base.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fine-Tuning:&lt;/strong&gt; Re-train the model on your data so the knowledge is baked into the weights.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Both work. Both have real trade-offs. Picking the wrong one costs months and thousands of dollars. This guide gives you a clear framework for deciding.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;What Is RAG?&lt;/h2&gt;
&lt;p&gt;RAG keeps the base model frozen and dynamically injects relevant context at inference time.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Query&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;↓&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Embed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Search&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Retrieve&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;top&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;↓&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Augmented&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;retrieved&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;original&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="err"&gt;↓&lt;/span&gt;
&lt;span class="n"&gt;LLM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;generates&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;grounded&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;retrieved&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The pipeline:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;pinecone&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Pinecone&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;pc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Pinecone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;YOUR_KEY&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;knowledge-base&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;rag_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Embed the question&lt;/span&gt;
    &lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;text-embedding-3-small&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_question&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Retrieve relevant chunks&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;include_metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;text&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;matches&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Generate grounded answer&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;gpt-4o&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;role&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;system&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;content&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Answer using only the provided context. If the answer isn&amp;#39;t in the context, say so.&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;role&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;user&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;content&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Context:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;Question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;hr&gt;
&lt;h2&gt;What Is Fine-Tuning?&lt;/h2&gt;
&lt;p&gt;Fine-tuning continues training a pre-trained model on your dataset, updating its weights to encode new knowledge, style, or behavior.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;Base Model (frozen knowledge)
    ↓
[Your training data: (prompt, ideal_response) pairs]
    ↓
[Gradient updates via supervised learning]
    ↓
Fine-Tuned Model (knowledge baked into weights)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Training data format (OpenAI JSONL):&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;&amp;quot;messages&amp;quot;&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;{&amp;quot;role&amp;quot;: &amp;quot;system&amp;quot;, &amp;quot;content&amp;quot;: &amp;quot;You are a support agent for Acme SaaS.&amp;quot;}, {&amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;, &amp;quot;content&amp;quot;: &amp;quot;How do I reset my API key?&amp;quot;}, {&amp;quot;role&amp;quot;: &amp;quot;assistant&amp;quot;, &amp;quot;content&amp;quot;: &amp;quot;To reset your API key: go to Settings → API → Regenerate Key. Your old key is immediately invalidated.&amp;quot;}&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="err"&gt;}&lt;/span&gt;
&lt;span class="err"&gt;{&lt;/span&gt;&lt;span class="ss"&gt;&amp;quot;messages&amp;quot;&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;{&amp;quot;role&amp;quot;: &amp;quot;system&amp;quot;, &amp;quot;content&amp;quot;: &amp;quot;You are a support agent for Acme SaaS.&amp;quot;}, {&amp;quot;role&amp;quot;: &amp;quot;user&amp;quot;, &amp;quot;content&amp;quot;: &amp;quot;What&amp;#39;s the rate limit on the free plan?&amp;quot;}, {&amp;quot;role&amp;quot;: &amp;quot;assistant&amp;quot;, &amp;quot;content&amp;quot;: &amp;quot;Free plan: 100 requests/minute, 10,000 requests/month. Upgrade to Pro for 1,000 req/min.&amp;quot;}&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="err"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Launching a fine-tune (OpenAI):&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Upload training file&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;training_data.jsonl&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;rb&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;file&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;files&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;purpose&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;fine-tune&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Start fine-tune job&lt;/span&gt;
&lt;span class="n"&gt;job&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fine_tuning&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;jobs&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;training_file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;file&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;gpt-4o-mini-2024-07-18&amp;quot;&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Fine-tune job started: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;job&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Monitor: client.fine_tuning.jobs.retrieve(job.id)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;hr&gt;
&lt;h2&gt;Head-to-Head Comparison&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;RAG&lt;/th&gt;
&lt;th&gt;Fine-Tuning&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Knowledge updates&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Real-time — just update the DB&lt;/td&gt;
&lt;td&gt;Requires retraining (hours/days)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data freshness&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Always current&lt;/td&gt;
&lt;td&gt;Stale until retrained&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Setup complexity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Medium (pipeline + vector DB)&lt;/td&gt;
&lt;td&gt;High (data prep + training loop)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cost to update&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low (upsert new docs)&lt;/td&gt;
&lt;td&gt;High (full training run)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Inference cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Higher (embedding + retrieval + generation)&lt;/td&gt;
&lt;td&gt;Lower (just generation)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Handles new facts&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Excellent&lt;/td&gt;
&lt;td&gt;❌ Needs retraining&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Changes model behavior/style&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Limited&lt;/td&gt;
&lt;td&gt;✅ Excellent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reduces hallucination&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Strong (grounded in retrieved text)&lt;/td&gt;
&lt;td&gt;⚠️ Moderate&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data requirements&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Documents/chunks&lt;/td&gt;
&lt;td&gt;50–1000+ (prompt, response) pairs&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Transparency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High (can cite sources)&lt;/td&gt;
&lt;td&gt;Low (black box)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Privacy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Data stays in your DB&lt;/td&gt;
&lt;td&gt;Data sent to training provider&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Time to production&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Days&lt;/td&gt;
&lt;td&gt;Weeks&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr&gt;
&lt;h2&gt;When to Choose RAG&lt;/h2&gt;
&lt;p&gt;RAG is the right default for most teams. Choose it when:&lt;/p&gt;
&lt;h3&gt;✅ Your knowledge changes frequently&lt;/h3&gt;
&lt;p&gt;News, product documentation, pricing, inventory, policy — anything that updates weekly, daily, or in real time. Retraining a model every time your docs change is impractical. RAG lets you update your knowledge base and the model immediately reflects it.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;Good&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;RAG&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;use&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;cases:&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Internal&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;company&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;knowledge&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;base&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;assistant&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Customer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;support&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bot&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;with&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;evolving&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Legal&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Q&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;regulations&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;change&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;E&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;commerce&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;catalog&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Q&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;News&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;summarization&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;research&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;assistant&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3&gt;✅ You need source citations&lt;/h3&gt;
&lt;p&gt;RAG retrieves specific chunks — you always know which document the answer came from. This is essential for compliance, legal, and medical contexts where "the AI told me" isn't sufficient.&lt;/p&gt;
&lt;h3&gt;✅ You have large volumes of long-tail knowledge&lt;/h3&gt;
&lt;p&gt;A model can't memorize 50,000 support articles. RAG surfaces the right 3 at query time. Fine-tuning on 50,000 articles would require enormous training data and still wouldn't guarantee retrieval of the right fact.&lt;/p&gt;
&lt;h3&gt;✅ You're prototyping or iterating fast&lt;/h3&gt;
&lt;p&gt;Stand up a RAG pipeline in a day. Fine-tuning takes weeks of data preparation, training, and evaluation. Ship with RAG, decide later if fine-tuning adds enough value.&lt;/p&gt;
&lt;h3&gt;✅ Reducing hallucinations is the priority&lt;/h3&gt;
&lt;p&gt;By forcing the model to answer from retrieved context, RAG significantly reduces hallucinations on factual questions. It's not perfect, but it's the most reliable grounding technique available today.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;When to Choose Fine-Tuning&lt;/h2&gt;
&lt;p&gt;Fine-tuning earns its cost when RAG can't solve the problem.&lt;/p&gt;
&lt;h3&gt;✅ You need to change &lt;em&gt;how&lt;/em&gt; the model behaves, not just what it knows&lt;/h3&gt;
&lt;p&gt;RAG adds context. Fine-tuning changes behavior. If you need the model to consistently write in your brand's voice, follow a specific output schema every time, or reason like a domain expert — fine-tuning is the lever.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nv"&gt;Good&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;fine&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nv"&gt;tuning&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;use&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;cases&lt;/span&gt;:
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Consistent&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;brand&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nv"&gt;tone&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;across&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;all&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;outputs&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Domain&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nv"&gt;specific&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;reasoning&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;medical&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;diagnosis&lt;/span&gt;,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;legal&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;analysis&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Structured&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;output&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;compliance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;always&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;valid&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;JSON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;schema&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Code&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;generation&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;your&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;internal&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;framework&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nv"&gt;style&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Language&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;localization&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;dialect&lt;/span&gt;,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;formality&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;level&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3&gt;✅ You have a well-defined, stable task&lt;/h3&gt;
&lt;p&gt;Fine-tuning excels at narrow, repeated tasks with clear right answers. Classify this support ticket. Extract these fields from this document. Convert this natural language query to SQL.&lt;/p&gt;
&lt;h3&gt;✅ Latency and cost matter at scale&lt;/h3&gt;
&lt;p&gt;RAG requires an embedding call + vector search + generation. Fine-tuning requires only generation. At very high volume (millions of queries/day), that difference matters. Fine-tuned smaller models can also match GPT-4 quality on narrow tasks at a fraction of the cost.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# Fine-tuned gpt-4o-mini for SQL generation&lt;/span&gt;
&lt;span class="c1"&gt;# vs. RAG + gpt-4o for same task&lt;/span&gt;
&lt;span class="c1"&gt;# Cost difference at 1M queries/day: ~$800/day vs ~$120/day&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3&gt;✅ You have high-quality labeled examples (50+)&lt;/h3&gt;
&lt;p&gt;Fine-tuning requires (prompt, ideal_response) pairs. If you've already logged thousands of correct interactions, or have domain experts who can label examples, that's the signal you need.&lt;/p&gt;
&lt;h3&gt;✅ The task requires reasoning patterns, not facts&lt;/h3&gt;
&lt;p&gt;Teaching a model &lt;em&gt;how to think&lt;/em&gt; about a problem (legal reasoning, medical differential diagnosis, financial analysis frameworks) is better done through fine-tuning than RAG. You're not injecting facts — you're adjusting the reasoning process.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;When to Use Both&lt;/h2&gt;
&lt;p&gt;The most powerful production systems combine both. This is called &lt;strong&gt;Fine-Tuned RAG&lt;/strong&gt; or &lt;strong&gt;Retrieval-Augmented Fine-Tuning (RAFT)&lt;/strong&gt;.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;Fine-Tuning handles:          RAG handles:
- Output format               - Current facts
- Domain reasoning style      - Specific document retrieval
- Consistent tone             - Source citation
- Task-specific behavior      - Knowledge updates
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Real-world example — Cursor (AI code editor):&lt;/strong&gt;
- &lt;strong&gt;Fine-tuned&lt;/strong&gt; on code understanding, editing patterns, and diff formats
- &lt;strong&gt;RAG&lt;/strong&gt; over your local codebase for file-specific context&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Real-world example — Medical AI assistant:&lt;/strong&gt;
- &lt;strong&gt;Fine-tuned&lt;/strong&gt; on clinical reasoning patterns and medical note formats
- &lt;strong&gt;RAG&lt;/strong&gt; over current drug databases, clinical guidelines, and patient records&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Implementation pattern:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fine_tuned_rag_query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Step 1: Retrieve relevant context (RAG)&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;retrieve_context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_question&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Step 2: Query fine-tuned model with retrieved context&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;ft:gpt-4o-mini:your-org:your-model-id&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# fine-tuned model&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;role&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;system&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;content&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DOMAIN_SYSTEM_PROMPT&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;role&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;user&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;content&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Context:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;Question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;hr&gt;
&lt;h2&gt;Cost &amp;amp; Complexity Analysis&lt;/h2&gt;
&lt;h3&gt;RAG Cost Profile&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;One-time&lt;/th&gt;
&lt;th&gt;Ongoing&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Embedding documents&lt;/td&gt;
&lt;td&gt;$5–50 (1M tokens)&lt;/td&gt;
&lt;td&gt;Per update&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vector DB hosting&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;$70–700/mo (Pinecone) or free (self-hosted)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inference (per query)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;~$0.002–0.01/query&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Engineering setup&lt;/td&gt;
&lt;td&gt;2–5 days&lt;/td&gt;
&lt;td&gt;Low maintenance&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;h3&gt;Fine-Tuning Cost Profile&lt;/h3&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;One-time&lt;/th&gt;
&lt;th&gt;Ongoing&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Data preparation&lt;/td&gt;
&lt;td&gt;1–4 weeks&lt;/td&gt;
&lt;td&gt;Per retrain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Training run&lt;/td&gt;
&lt;td&gt;$50–500 (small model)&lt;/td&gt;
&lt;td&gt;Per retrain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Evaluation&lt;/td&gt;
&lt;td&gt;1–2 weeks&lt;/td&gt;
&lt;td&gt;Per retrain&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Inference (per query)&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;~30–50% cheaper than base&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Engineering setup&lt;/td&gt;
&lt;td&gt;3–8 weeks&lt;/td&gt;
&lt;td&gt;Medium maintenance&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Break-even rule of thumb:&lt;/strong&gt; Fine-tuning starts making financial sense when you have &amp;gt;500K queries/month on a well-defined task, AND the task is stable enough that you won't need frequent retraining.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Decision Framework&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nv"&gt;Is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;your&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;knowledge&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;dynamic&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;changes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;weekly&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;or&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;more&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;?
&lt;span class="w"&gt;  &lt;/span&gt;└─&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Yes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;→&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;RAG&lt;/span&gt;

&lt;span class="k"&gt;Do&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;you&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;need&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;source&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;citations&lt;/span&gt;?
&lt;span class="w"&gt;  &lt;/span&gt;└─&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Yes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;→&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;RAG&lt;/span&gt;

&lt;span class="nv"&gt;Is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;your&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;primary&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;problem&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;behavior&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nv"&gt;style&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="nv"&gt;reasoning&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;consistency&lt;/span&gt;?
&lt;span class="w"&gt;  &lt;/span&gt;└─&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Yes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;→&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Fine&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nv"&gt;Tuning&lt;/span&gt;

&lt;span class="k"&gt;Do&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;you&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;have&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;high&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nv"&gt;quality&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;labeled&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;prompt&lt;/span&gt;,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;response&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;pairs&lt;/span&gt;?
&lt;span class="w"&gt;  &lt;/span&gt;└─&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;No&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;→&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;RAG&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;you&lt;/span&gt;&lt;span class="err"&gt;&amp;#39;re not ready for fine-tuning)&lt;/span&gt;
&lt;span class="err"&gt;  └─ Yes → Fine-Tuning is viable&lt;/span&gt;

&lt;span class="err"&gt;Is latency/cost critical at &amp;gt;500K queries/month?&lt;/span&gt;
&lt;span class="err"&gt;  └─ Yes → Consider Fine-Tuning or Fine-Tuned RAG&lt;/span&gt;

&lt;span class="err"&gt;Are you still iterating on the product?&lt;/span&gt;
&lt;span class="err"&gt;  └─ Yes → RAG (faster to change)&lt;/span&gt;
&lt;span class="err"&gt;  └─ No, task is stable → Fine-Tuning&lt;/span&gt;

&lt;span class="err"&gt;Do you need both domain behavior AND current knowledge?&lt;/span&gt;
&lt;span class="err"&gt;  └─ Yes → Fine-Tuned RAG&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Default recommendation:&lt;/strong&gt; Start with RAG. It's faster, cheaper to iterate, and solves 80% of use cases. Add fine-tuning only when you have a stable task, quality training data, and a clear gap that RAG can't close.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Implementation Quickstart&lt;/h2&gt;
&lt;h3&gt;RAG in 30 minutes (Chroma + OpenAI)&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;pip&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;chromadb&lt;span class="w"&gt; &lt;/span&gt;openai
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;chromadb&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;chroma&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chromadb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;collection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chroma&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;knowledge-base&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_documents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
    &lt;span class="n"&gt;embeddings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;text-embedding-3-small&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;docs&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;ids&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;doc_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;docs&lt;/span&gt;&lt;span class="p"&gt;))]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;q_emb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;text-embedding-3-small&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_embeddings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;q_emb&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;n_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;documents&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;gpt-4o-mini&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;role&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;system&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;content&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Answer only from the context provided.&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;role&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;user&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;content&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Context: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;Q: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3&gt;Fine-Tuning Checklist&lt;/h3&gt;
&lt;ul&gt;
&lt;li&gt;[ ] Collect 50–1000 (prompt, ideal_response) examples&lt;/li&gt;
&lt;li&gt;[ ] Ensure examples cover edge cases, not just easy ones&lt;/li&gt;
&lt;li&gt;[ ] Format as JSONL with &lt;code&gt;messages&lt;/code&gt; array (system, user, assistant)&lt;/li&gt;
&lt;li&gt;[ ] Hold out 10–20% as validation set&lt;/li&gt;
&lt;li&gt;[ ] Run fine-tune job (OpenAI, Together AI, or self-hosted with Axolotl)&lt;/li&gt;
&lt;li&gt;[ ] Evaluate on validation set — compare to base model&lt;/li&gt;
&lt;li&gt;[ ] A/B test in production with 5–10% traffic split&lt;/li&gt;
&lt;li&gt;[ ] Set up retraining pipeline for when data drifts&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2&gt;Common Mistakes&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;RAG mistakes:&lt;/strong&gt;
- &lt;strong&gt;Chunks too large&lt;/strong&gt; — 500–1000 tokens per chunk is usually optimal. Larger chunks dilute relevance.
- &lt;strong&gt;No metadata filtering&lt;/strong&gt; — Always filter by date, category, or source before vector search.
- &lt;strong&gt;Skipping re-ranking&lt;/strong&gt; — Use a cross-encoder to re-rank retrieved chunks before passing to the LLM.
- &lt;strong&gt;Ignoring chunking strategy&lt;/strong&gt; — Sentence-based chunking often beats fixed-size for prose documents.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fine-tuning mistakes:&lt;/strong&gt;
- &lt;strong&gt;Too little data&lt;/strong&gt; — Under 50 examples rarely produces meaningful improvement.
- &lt;strong&gt;Low-quality examples&lt;/strong&gt; — 100 excellent examples beat 1,000 mediocre ones. Every time.
- &lt;strong&gt;Forgetting catastrophic forgetting&lt;/strong&gt; — Fine-tuning can degrade general capability. Test broadly, not just on your task.
- &lt;strong&gt;No evaluation set&lt;/strong&gt; — Without held-out validation, you can't tell if fine-tuning actually helped.
- &lt;strong&gt;Fine-tuning when prompt engineering would suffice&lt;/strong&gt; — Try a well-crafted few-shot prompt first. You might not need fine-tuning at all.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Resources&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://platform.openai.com/docs/guides/fine-tuning"&gt;OpenAI Fine-Tuning Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://python.langchain.com/docs/concepts/rag/"&gt;LangChain RAG Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2403.10131"&gt;RAFT Paper — RAG + Fine-Tuning combined&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/OpenAccess-AI-Collective/axolotl"&gt;Axolotl — Open-source fine-tuning framework&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.llamaindex.ai/"&gt;LlamaIndex — Production RAG framework&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/explodinggradients/ragas"&gt;Ragas — Evaluate your RAG pipeline&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.together.ai/"&gt;Together AI — Affordable fine-tuning API&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;Found this useful? ⭐ Star the repo and share it with your team.&lt;/em&gt;
&lt;em&gt;Have a use case or mistake I missed? Open an issue or submit a PR.&lt;/em&gt;&lt;/p&gt;</content><category term="GenAI"/><category term="fine-tuning"/><category term="rag"/><category term="llm"/><category term="retrieval-augmented-generation"/><category term="ai"/><category term="machine-learning"/><category term="gpt"/></entry><entry><title>Prompt Engineering: Techniques That Actually Matter</title><link href="https://vinayakvitthal.github.io/prompt-engineering-techniques-that-actually-matter.html" rel="alternate"/><published>2026-04-17T00:00:00+05:30</published><updated>2026-04-17T00:00:00+05:30</updated><author><name>Vinayak Vitthal Kaddi</name></author><id>tag:vinayakvitthal.github.io,2026-04-17:/prompt-engineering-techniques-that-actually-matter.html</id><summary type="html">&lt;h1&gt;Prompt Engineering: Techniques That Actually Matter&lt;/h1&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;A practical guide to getting reliable, high-quality outputs from LLMs — with real examples and patterns you can use today&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#why-prompt-engineering-still-matters"&gt;Why Prompt Engineering Still Matters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#core-techniques"&gt;Core Techniques&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#1-be-explicit-about-format--length"&gt;1. Be Explicit About Format &amp;amp; Length&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#2-role--context-framing"&gt;2. Role + Context Framing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#3-chain-of-thought-cot-prompting"&gt;3. Chain-of-Thought (CoT) Prompting&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#4-few-shot-examples"&gt;4 …&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;</summary><content type="html">&lt;h1&gt;Prompt Engineering: Techniques That Actually Matter&lt;/h1&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;A practical guide to getting reliable, high-quality outputs from LLMs — with real examples and patterns you can use today&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#why-prompt-engineering-still-matters"&gt;Why Prompt Engineering Still Matters&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#core-techniques"&gt;Core Techniques&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#1-be-explicit-about-format--length"&gt;1. Be Explicit About Format &amp;amp; Length&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#2-role--context-framing"&gt;2. Role + Context Framing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#3-chain-of-thought-cot-prompting"&gt;3. Chain-of-Thought (CoT) Prompting&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#4-few-shot-examples"&gt;4. Few-Shot Examples&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#5-output-constraints--schema-forcing"&gt;5. Output Constraints &amp;amp; Schema Forcing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#6-negative-prompting--tell-it-what-not-to-do"&gt;6. Negative Prompting — Tell It What NOT to Do&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#7-react--reasoning--acting"&gt;7. ReAct — Reasoning + Acting&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#8-self-consistency--sampling"&gt;8. Self-Consistency &amp;amp; Sampling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#system-prompt-architecture"&gt;System Prompt Architecture&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#prompt-patterns-for-common-tasks"&gt;Prompt Patterns for Common Tasks&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#what-doesnt-work-and-why"&gt;What Doesn't Work (And Why)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#evaluation-how-to-know-if-your-prompt-is-good"&gt;Evaluation: How to Know If Your Prompt Is Good&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#resources"&gt;Resources&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2&gt;Why Prompt Engineering Still Matters&lt;/h2&gt;
&lt;p&gt;With every new model release, someone declares "prompt engineering is dead." It isn't.&lt;/p&gt;
&lt;p&gt;Models are getting better at understanding intent — but the gap between a mediocre prompt and a great one still produces dramatically different results. In production systems, that gap means the difference between a feature that works reliably and one that fails 20% of the time.&lt;/p&gt;
&lt;p&gt;Prompt engineering is less about magic words and more about &lt;strong&gt;clear communication and constraint&lt;/strong&gt;. Think of it as writing a precise spec for a very capable but very literal contractor.&lt;/p&gt;
&lt;p&gt;This guide focuses on techniques that hold up across models and tasks — not tricks that worked once on GPT-3.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Core Techniques&lt;/h2&gt;
&lt;h3&gt;1. Be Explicit About Format &amp;amp; Length&lt;/h3&gt;
&lt;p&gt;The single highest-ROI change in most prompts.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;❌ Vague:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;Summarize this article.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;✅ Explicit:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nx"&gt;Summarize&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;this&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;article&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;bullet&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;points&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;each&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;under&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;words&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="nx"&gt;Focus&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;only&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;on&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;business&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;implications&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Skip&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;technical&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;details&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Why it works:&lt;/strong&gt; Models default to verbose, general outputs. Constraints force compression and prioritization — which is usually what you actually want.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Format options to specify:&lt;/strong&gt;
- Output length (&lt;code&gt;under 100 words&lt;/code&gt;, &lt;code&gt;exactly 3 items&lt;/code&gt;, &lt;code&gt;one paragraph&lt;/code&gt;)
- Structure (&lt;code&gt;bullet points&lt;/code&gt;, &lt;code&gt;numbered list&lt;/code&gt;, &lt;code&gt;JSON&lt;/code&gt;, &lt;code&gt;markdown table&lt;/code&gt;)
- Tone (&lt;code&gt;professional&lt;/code&gt;, &lt;code&gt;casual&lt;/code&gt;, &lt;code&gt;like you're explaining to a 10-year-old&lt;/code&gt;)
- What to omit (&lt;code&gt;no caveats&lt;/code&gt;, &lt;code&gt;no preamble&lt;/code&gt;, &lt;code&gt;don't repeat the question&lt;/code&gt;)&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;2. Role + Context Framing&lt;/h3&gt;
&lt;p&gt;Give the model an identity and a situation. This activates relevant "knowledge modes."&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;You&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;are&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;senior&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;backend&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;engineer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;reviewing&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;pull&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;The&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;codebase&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;uses&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Python&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;3.11&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Postgres&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;The&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;team&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;prioritizes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;readability&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;over&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;cleverness&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="n"&gt;Review&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;this&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;suggest&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;improvements&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;here&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The key components:&lt;/strong&gt;
- &lt;strong&gt;Who&lt;/strong&gt; the model is (&lt;code&gt;senior backend engineer&lt;/code&gt;)
- &lt;strong&gt;What context&lt;/strong&gt; it's operating in (&lt;code&gt;FastAPI, Postgres&lt;/code&gt;)
- &lt;strong&gt;What it values&lt;/strong&gt; (&lt;code&gt;readability over cleverness&lt;/code&gt;)&lt;/p&gt;
&lt;p&gt;Role framing is especially powerful for:
- Code review (activates senior-engineer judgment)
- Writing (activates editorial voice)
- Analysis (activates consultant framing)
- Customer support drafts (activates empathetic tone)&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# In a system prompt for a production app&lt;/span&gt;
&lt;span class="n"&gt;SYSTEM_PROMPT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
&lt;span class="s2"&gt;You are a customer support specialist for Acme SaaS.&lt;/span&gt;
&lt;span class="s2"&gt;Your tone is warm, concise, and solution-focused.&lt;/span&gt;
&lt;span class="s2"&gt;You only answer questions about our product.&lt;/span&gt;
&lt;span class="s2"&gt;If you don&amp;#39;t know something, say so and offer to escalate.&lt;/span&gt;
&lt;span class="s2"&gt;Never make up features or pricing.&lt;/span&gt;
&lt;span class="s2"&gt;&amp;quot;&amp;quot;&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;hr&gt;
&lt;h3&gt;3. Chain-of-Thought (CoT) Prompting&lt;/h3&gt;
&lt;p&gt;For reasoning tasks, asking the model to "think step by step" dramatically improves accuracy.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;❌ Direct answer prompt:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nv"&gt;A&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;store&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;sells&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;apples&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;$0&lt;/span&gt;.&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;each&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;oranges&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;$0&lt;/span&gt;.&lt;span class="mi"&gt;75&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;each&lt;/span&gt;.
&lt;span class="k"&gt;If&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;you&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;buy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;apples&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;oranges&lt;/span&gt;,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;what&lt;/span&gt;&lt;span class="err"&gt;&amp;#39;s the total?&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;(Model often rushes to an answer and miscalculates)&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;✅ Chain-of-Thought:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nv"&gt;A&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;store&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;sells&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;apples&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;$0&lt;/span&gt;.&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;each&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;oranges&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mh"&gt;$0&lt;/span&gt;.&lt;span class="mi"&gt;75&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;each&lt;/span&gt;.
&lt;span class="k"&gt;If&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;you&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;buy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;apples&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;oranges&lt;/span&gt;,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;what&lt;/span&gt;&lt;span class="err"&gt;&amp;#39;s the total?&lt;/span&gt;

&lt;span class="err"&gt;Think through this step by step before giving your final answer.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Zero-shot CoT trigger phrases:&lt;/strong&gt;
- &lt;code&gt;Think step by step.&lt;/code&gt;
- &lt;code&gt;Let's work through this carefully.&lt;/code&gt;
- &lt;code&gt;Break this down before answering.&lt;/code&gt;
- &lt;code&gt;Reason through the problem first.&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Few-shot CoT&lt;/strong&gt; (even more powerful for complex tasks):&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;Q&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;If&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;I&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;have&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;boxes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;each&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;I&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;remove&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;how&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;many&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;remain&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;
&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Let&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;me&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;work&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;through&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;by&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;boxes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;×&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;total&lt;/span&gt;
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;19&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;remain&lt;/span&gt;
&lt;span class="w"&gt;   &lt;/span&gt;&lt;span class="n"&gt;Answer&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;19&lt;/span&gt;

&lt;span class="n"&gt;Q&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Your&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;actual&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;here&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Let&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;me&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;work&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;through&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;by&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;hr&gt;
&lt;h3&gt;4. Few-Shot Examples&lt;/h3&gt;
&lt;p&gt;Show, don't just tell. Examples are often more reliable than instructions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The pattern:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;Classify customer feedback as: POSITIVE, NEGATIVE, or NEUTRAL.

Examples:
Input: &amp;quot;This product changed my life, absolutely love it!&amp;quot;
Output: POSITIVE

Input: &amp;quot;Arrived broken, terrible packaging.&amp;quot;
Output: NEGATIVE

Input: &amp;quot;It&amp;#39;s fine, does what it says.&amp;quot;
Output: NEUTRAL

Now classify:
Input: &amp;quot;Took forever to arrive but the quality is great.&amp;quot;
Output:
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Rules for good few-shot examples:&lt;/strong&gt;
- Use 2–5 examples (more isn't always better — it inflates context)
- Cover edge cases you care about
- Keep examples diverse, not repetitive
- Match the format you want in the output exactly
- Put your actual input last&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;5. Output Constraints &amp;amp; Schema Forcing&lt;/h3&gt;
&lt;p&gt;For applications that consume LLM output programmatically, JSON schema forcing is essential.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Without constraints:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;Extract&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;company&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;this&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;text&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;
&lt;span class="ss"&gt;&amp;quot;Hi, I&amp;#39;m Sarah Chen from Acme Corp. You can reach me at sarah@acme.com&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Output might be a paragraph, a list, a table — inconsistent and hard to parse.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;With schema forcing:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;Extract&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;information&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;text&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;below&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;Respond&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;ONLY&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;with&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;valid&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;JSON&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;matching&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;this&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;exact&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;schema&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;—&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;no&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;other&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nc"&gt;text&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;

&lt;span class="err"&gt;{&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="ss"&gt;&amp;quot;name&amp;quot;&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;&amp;quot;string&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="ss"&gt;&amp;quot;email&amp;quot;&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;&amp;quot;string or null&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="ss"&gt;&amp;quot;company&amp;quot;&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;&amp;quot;string or null&amp;quot;&lt;/span&gt;
&lt;span class="err"&gt;}&lt;/span&gt;

&lt;span class="nc"&gt;Text&lt;/span&gt;&lt;span class="err"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;&amp;quot;Hi, I&amp;#39;m Sarah Chen from Acme Corp. You can reach me at sarah@acme.com&amp;quot;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;In code — using structured outputs (OpenAI):&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ContactInfo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
    &lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;company&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;None&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;beta&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;gpt-4o-2024-08-06&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;role&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;user&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;content&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;Extract: Hi, I&amp;#39;m Sarah Chen from Acme Corp...&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;
    &lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ContactInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parsed&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# ContactInfo(name=&amp;#39;Sarah Chen&amp;#39;, email=&amp;#39;sarah@acme.com&amp;#39;, company=&amp;#39;Acme Corp&amp;#39;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;hr&gt;
&lt;h3&gt;6. Negative Prompting — Tell It What NOT to Do&lt;/h3&gt;
&lt;p&gt;Models respond well to explicit exclusions. Don't just describe what you want — describe what you don't want.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nv"&gt;Write&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;short&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;product&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;description&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;this&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;ergonomic&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;chair&lt;/span&gt;.

&lt;span class="k"&gt;Do&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;NOT&lt;/span&gt;:
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Use&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;word&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;revolutionary&amp;quot;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;or&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;game-changing&amp;quot;&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Make&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;unverifiable&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;health&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;claims&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Exceed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;80&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;words&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Use&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;bulleted&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;list&lt;/span&gt;

&lt;span class="k"&gt;DO&lt;/span&gt;:
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Focus&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;on&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;comfort&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;design&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Use&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;conversational&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;but&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;professional&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;tone&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Common things worth negating:&lt;/strong&gt;
- &lt;code&gt;Do not add caveats or disclaimers unless explicitly asked&lt;/code&gt;
- &lt;code&gt;Do not repeat back the question or instructions&lt;/code&gt;
- &lt;code&gt;Do not say "As an AI language model..."&lt;/code&gt;
- &lt;code&gt;Do not use filler phrases like "Certainly!" or "Great question!"&lt;/code&gt;
- &lt;code&gt;Do not invent information you're not certain about — say "I don't know" instead&lt;/code&gt;&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;7. ReAct — Reasoning + Acting&lt;/h3&gt;
&lt;p&gt;For &lt;strong&gt;agentic tasks&lt;/strong&gt; — where a model needs to take multiple steps, use tools, and observe results — the ReAct pattern structures the loop.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;You&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;have&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;access&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;these&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;returns&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;web&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;search&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;calculator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;expression&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;evaluates&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;expressions&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;returns&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;weather&lt;/span&gt;

&lt;span class="n"&gt;To&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;complete&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="mf"&gt;1.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;THINK&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;about&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;what&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;you&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;need&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;do&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;next&lt;/span&gt;
&lt;span class="mf"&gt;2.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ACT&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;call&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;tool&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;with&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;required&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;
&lt;span class="mf"&gt;3.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;OBSERVE&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;note&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;tool&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;s output&lt;/span&gt;
&lt;span class="mf"&gt;4.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Repeat&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;until&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;you&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;can&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;give&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;final&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;

&lt;span class="n"&gt;Task&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;What&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;s the temperature difference between the current temperature&lt;/span&gt;
&lt;span class="ow"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Mumbai&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ow"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;boiling&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;point&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;water&lt;/span&gt;&lt;span class="err"&gt;?&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Example model output with ReAct:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;THINK&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;I&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;need&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Mumbai&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;then&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;compare&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;water&lt;/span&gt;&lt;span class="err"&gt;&amp;#39;&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;boiling&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;point&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="err"&gt;°&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="o"&gt;).&lt;/span&gt;
&lt;span class="n"&gt;ACT&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;get_weather&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Mumbai&amp;quot;&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;OBSERVE&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Current&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;temp&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Mumbai&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;34&lt;/span&gt;&lt;span class="err"&gt;°&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;

&lt;span class="n"&gt;THINK&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Boiling&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;point&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;water&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="err"&gt;°&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Difference&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;34&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;66&lt;/span&gt;&lt;span class="err"&gt;°&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;ANSWER&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;The&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;boiling&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;point&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;water&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;66&lt;/span&gt;&lt;span class="err"&gt;°&lt;/span&gt;&lt;span class="n"&gt;C&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;higher&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;than&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Mumbai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This is the backbone of most AI agent frameworks (LangChain, LlamaIndex, AutoGPT).&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;8. Self-Consistency &amp;amp; Sampling&lt;/h3&gt;
&lt;p&gt;For high-stakes reasoning tasks, generate multiple responses and take a majority vote or the best-reasoned answer.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;openai&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Counter&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;self_consistent_answer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;responses&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;chat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;completions&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;gpt-4o&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;role&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;user&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;content&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;Think step by step.&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;temperature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;  &lt;span class="c1"&gt;# Higher temp = more diverse responses&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;choices&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Extract final answers and take majority vote&lt;/span&gt;
    &lt;span class="c1"&gt;# (In practice, parse the final answer from each response)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;responses&lt;/span&gt;  &lt;span class="c1"&gt;# Then analyze for consensus&lt;/span&gt;

&lt;span class="c1"&gt;# Useful for: classification, fact extraction, code generation&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;When to use:&lt;/strong&gt; Legal/compliance classification, medical triage, financial decisions, any task where consistency matters more than speed.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;System Prompt Architecture&lt;/h2&gt;
&lt;p&gt;For production applications, think of your system prompt as having distinct sections:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;[IDENTITY]&lt;/span&gt;
&lt;span class="na"&gt;You are [name/role], [brief description].&lt;/span&gt;

&lt;span class="k"&gt;[CONTEXT]&lt;/span&gt;
&lt;span class="na"&gt;You are operating as [product context].&lt;/span&gt;
&lt;span class="na"&gt;Users are [who they are].&lt;/span&gt;

&lt;span class="k"&gt;[CAPABILITIES]&lt;/span&gt;
&lt;span class="na"&gt;You can help with&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;[explicit list]&lt;/span&gt;
&lt;span class="na"&gt;You have access to&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;[tools/data]&lt;/span&gt;

&lt;span class="k"&gt;[CONSTRAINTS]&lt;/span&gt;
&lt;span class="na"&gt;Never&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;[hard limits]&lt;/span&gt;
&lt;span class="na"&gt;Always&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;[non-negotiables]&lt;/span&gt;
&lt;span class="na"&gt;If unsure&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;[fallback behavior]&lt;/span&gt;

&lt;span class="k"&gt;[FORMAT]&lt;/span&gt;
&lt;span class="na"&gt;Respond in [format].&lt;/span&gt;
&lt;span class="na"&gt;Keep responses [length guideline].&lt;/span&gt;
&lt;span class="na"&gt;Tone&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;[tone description].&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Real example:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nv"&gt;You&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;are&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Aria&lt;/span&gt;,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;support&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;assistant&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;CloudBase&lt;/span&gt;,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;developer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;infrastructure&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;platform&lt;/span&gt;.

&lt;span class="nv"&gt;You&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;are&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;talking&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;software&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;engineers&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;DevOps&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;professionals&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;who&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;are&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;customers&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;CloudBase&lt;/span&gt;.
&lt;span class="nv"&gt;They&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;expect&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;direct&lt;/span&gt;,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;technical&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;answers&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;without&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;hand&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nv"&gt;holding&lt;/span&gt;.

&lt;span class="nv"&gt;You&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;can&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;help&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;with&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;billing&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;questions&lt;/span&gt;,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;technical&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;troubleshooting&lt;/span&gt;,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;account&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;settings&lt;/span&gt;,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;feature&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;explanations&lt;/span&gt;.
&lt;span class="nv"&gt;You&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;have&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;access&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;our&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;knowledge&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;base&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;user&lt;/span&gt;&lt;span class="err"&gt;&amp;#39;s account information.&lt;/span&gt;

&lt;span class="nv"&gt;Never&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;make&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;up&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;pricing&lt;/span&gt;,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;promise&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;features&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;that&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;don&lt;/span&gt;&lt;span class="s1"&gt;&amp;#39;t exist, or share other customers&amp;#39;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;data&lt;/span&gt;.
&lt;span class="nv"&gt;Always&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;provide&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;documentation&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;links&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;when&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;explaining&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;technical&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;concepts&lt;/span&gt;.
&lt;span class="k"&gt;If&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;you&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;can&lt;/span&gt;&lt;span class="err"&gt;&amp;#39;t resolve an issue: offer to create a support ticket with a 4-hour SLA.&lt;/span&gt;

&lt;span class="err"&gt;Keep responses under 150 words unless a technical explanation requires more.&lt;/span&gt;
&lt;span class="err"&gt;Use markdown formatting for code. Tone: direct, friendly, technical.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;hr&gt;
&lt;h2&gt;Prompt Patterns for Common Tasks&lt;/h2&gt;
&lt;h3&gt;Summarization&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;Summarize&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;following&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;document&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;type&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;audience&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;
&lt;span class="n"&gt;Focus&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;on&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;themes&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;
&lt;span class="nl"&gt;Ignore&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;what&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;skip&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;
&lt;span class="nl"&gt;Format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;bullet&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;points&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;paragraph&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;executive&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;
&lt;span class="nl"&gt;Length&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;under&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;words&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3&gt;Classification&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nv"&gt;Classify&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;input&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;into&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;exactly&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;one&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;these&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;categories&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;[&lt;span class="nv"&gt;A&lt;/span&gt;,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;B&lt;/span&gt;,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;C&lt;/span&gt;,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;D&lt;/span&gt;].
&lt;span class="k"&gt;If&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;ambiguous&lt;/span&gt;,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;choose&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;closest&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;match&lt;/span&gt;.
&lt;span class="nv"&gt;Respond&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;with&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;only&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;category&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;—&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;no&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;explanation&lt;/span&gt;.

&lt;span class="nv"&gt;Examples&lt;/span&gt;:
[&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;labeled&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;examples&lt;/span&gt;]

&lt;span class="nv"&gt;Input&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;[&lt;span class="nv"&gt;text&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;classify&lt;/span&gt;]
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3&gt;Code Generation&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;Write&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;that&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;specific&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;behavior&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;
&lt;span class="nl"&gt;Requirements&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;requirement&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;requirement&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nl"&gt;Include&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;hints&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;A&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;docstring&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2-3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;unit&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;tests&lt;/span&gt;
&lt;span class="n"&gt;Do&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;not&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;include&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;package&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;imports&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;I&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;didn&lt;/span&gt;&lt;span class="err"&gt;&amp;#39;&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;ask&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;boilerplate&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h3&gt;Data Extraction&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;Extract&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;following&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;fields&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;from&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;below&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;If&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;field&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;missing&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;use&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;null&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;Return&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;only&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;valid&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;JSON&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;no&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;other&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="nl"&gt;Fields&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;field1&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;field2&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;type&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;field3&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;type&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nl"&gt;Text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;hr&gt;
&lt;h2&gt;What Doesn't Work (And Why)&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Technique&lt;/th&gt;
&lt;th&gt;Why It Fails&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;"Answer as best you can"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Too vague — the model already tries to do this&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;"Be creative but accurate"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Contradictory constraints confuse the model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Extremely long system prompts&lt;/td&gt;
&lt;td&gt;Critical instructions at the end get lost (recency/primacy bias)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;"Never make mistakes"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Models can't guarantee correctness — adds false confidence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Repeating the same instruction 5x&lt;/td&gt;
&lt;td&gt;Repetition ≠ emphasis; use structure instead&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;"Think outside the box"&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Generic phrase with no actionable meaning&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Over-constraining&lt;/td&gt;
&lt;td&gt;Too many "don't do X" rules creates failure modes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;The #1 failure mode:&lt;/strong&gt; Prompts that describe &lt;em&gt;what you want the output to look like&lt;/em&gt; but not &lt;em&gt;what the model should actually reason about&lt;/em&gt;. Describe the reasoning process, not just the output.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Evaluation: How to Know If Your Prompt Is Good&lt;/h2&gt;
&lt;p&gt;Gut feeling isn't good enough for production. Use these methods:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. Regression testing&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;TEST_CASES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;input&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;...&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;expected&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;POSITIVE&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;input&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;...&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;expected&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;NEGATIVE&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="c1"&gt;# 20-50 cases covering edge cases&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;eval_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;test_cases&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;correct&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;case&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;test_cases&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;output&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;call_llm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;case&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;input&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;case&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;expected&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="n"&gt;correct&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;correct&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;test_cases&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;2. LLM-as-judge&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;You&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;are&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;evaluating&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;an&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;AI&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;quality&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="nl"&gt;Criteria&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Accuracy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1-5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;information&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;correct&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Relevance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1-5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Does&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;answer&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;asked&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Conciseness&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1-5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;it&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;appropriately&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;brief&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;

&lt;span class="nl"&gt;Question&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;original&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nl"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;Score&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;each&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;criterion&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;provide&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;one&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;sentence&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;justification&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;3. A/B testing in production&lt;/strong&gt; — Route 5-10% of traffic to a new prompt variant. Measure task completion, user corrections, escalation rate.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Resources&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://platform.openai.com/docs/guides/prompt-engineering"&gt;OpenAI Prompt Engineering Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/overview"&gt;Anthropic's Prompt Engineering Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2201.11903"&gt;Chain-of-Thought Paper (Wei et al., 2022)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2210.03629"&gt;ReAct Paper (Yao et al., 2022)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arxiv.org/abs/2203.11171"&gt;Self-Consistency Paper (Wang et al., 2022)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/stanfordnlp/dspy"&gt;DSPY — Programmatic Prompt Optimization&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/promptfoo/promptfoo"&gt;PromptFoo — Prompt Testing Framework&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;Found this useful? ⭐ Star the repo and share it with your team.&lt;/em&gt;&lt;br&gt;
&lt;em&gt;Have a technique I missed? Open an issue or submit a PR.&lt;/em&gt;&lt;/p&gt;</content><category term="GenAI"/><category term="prompt-engineering"/><category term="llm"/><category term="gpt"/><category term="claude"/><category term="few-shot"/><category term="chain-of-thought"/><category term="ai"/><category term="generative-ai"/></entry><entry><title>Claude's Hidden Power: Skills, Plugins, and the .md Files That Make It Extraordinary</title><link href="https://vinayakvitthal.github.io/claude-skills-plugins-md-files.html" rel="alternate"/><published>2026-04-15T00:00:00+05:30</published><updated>2026-04-15T00:00:00+05:30</updated><author><name>Vinayak Vitthal Kaddi</name></author><id>tag:vinayakvitthal.github.io,2026-04-15:/claude-skills-plugins-md-files.html</id><summary type="html">&lt;p&gt;Claude's Hidden Power: Skills, Plugins, and the .md Files That Make It Extraordinary**&lt;/p&gt;
&lt;p&gt;Most people use Claude like a search engine. The ones getting extraordinary results have figured out something different — and it starts with a plain text file.*&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;There is a version of Claude that writes generic emails and …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Claude's Hidden Power: Skills, Plugins, and the .md Files That Make It Extraordinary**&lt;/p&gt;
&lt;p&gt;Most people use Claude like a search engine. The ones getting extraordinary results have figured out something different — and it starts with a plain text file.*&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;There is a version of Claude that writes generic emails and summarizes articles. And then there is the version that builds full PowerPoint decks, generates production-ready PDFs, writes Word documents with tables of contents, connects to your Google Calendar and drafts meeting invites — all from a single prompt.&lt;/p&gt;
&lt;p&gt;The difference is not a smarter model. It is a system called &lt;strong&gt;Skills&lt;/strong&gt; — and almost nobody talks about it.&lt;/p&gt;
&lt;p&gt;This article is your complete guide. We will cover what Skills are, how .md files give Claude expert-level instructions, and how plugins and MCP tools turn Claude into an autonomous agent that can actually get things done.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;Why Claude feels different from other AI tools&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Most AI assistants are stateless — they respond to whatever you type with their general training. Claude, when used through Anthropic's platform, is different. It has access to a structured file system of instructions, and it reads those instructions before tackling your task.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Think of Claude as a brilliant generalist who, when given a specialized task, reaches for the right training manual before starting. That manual is a Skill file — a plain Markdown document containing expert-level instructions for exactly that type of work.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;These Skill files live at paths like &lt;code&gt;/mnt/skills/public/pptx/SKILL.md&lt;/code&gt; and &lt;code&gt;/mnt/skills/public/pdf/SKILL.md&lt;/code&gt;. When you ask Claude to "create a PowerPoint," it does not just start generating slides. It first reads the PPTX Skill file, absorbs the best practices encoded there, and then executes your request with the precision of someone who has built hundreds of decks.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;What exactly is a .md file?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Markdown (the .md extension) is a lightweight text format created in 2004. You write plain text with simple symbols, and it renders as formatted content. A # becomes a heading. A &lt;strong&gt;word&lt;/strong&gt; becomes bold. A - starts a bullet list.&lt;/p&gt;
&lt;p&gt;Here is what a minimal Markdown file looks like:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="err"&gt;#&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;My&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Skill&lt;/span&gt;

&lt;span class="err"&gt;##&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Overview&lt;/span&gt;
&lt;span class="nx"&gt;This&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;skill&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;creates&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;professional&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;PDF&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;reports&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;

&lt;span class="err"&gt;##&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Steps&lt;/span&gt;
&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Install&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;required&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kn"&gt;library&lt;/span&gt;
&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Set&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;up&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;document&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;structure&lt;/span&gt;
&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Add&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;content&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;and&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;styling&lt;/span&gt;
&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Save&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;output&lt;/span&gt;

&lt;span class="err"&gt;##&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Best&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;practices&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Always&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;use&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;A4&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;page&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;size&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;business&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;docs&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Keep&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;fonts&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;to&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;families&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;maximum&lt;/span&gt;
&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Never&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;skip&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;table&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;contents&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;That is it. No code. No complex configuration. Just structured text that Claude can read and follow. This simplicity is the genius of the system — anyone can write a Skill file, and Claude can follow it perfectly.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;The anatomy of a Skill file&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A well-crafted SKILL.md typically has six parts. Understanding them will help you both use existing skills effectively and write your own.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;1. The frontmatter block&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;At the very top, a YAML block (between triple dashes) contains metadata: the skill's name, a description, and trigger phrases that tell Claude when to load it.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;---
name: pdf
description: Use this skill whenever the user wants to
             create, read, or manipulate PDF files.
             Triggers: &amp;#39;create PDF&amp;#39;, &amp;#39;pdf report&amp;#39;, &amp;#39;.pdf&amp;#39;
---
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This description is critical. It is how Claude matches your request to the right skill. A vague description means the skill never gets triggered.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Overview&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A brief explanation of what the skill does, what libraries it uses, and any important limitations to keep in mind.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. Quick Start&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The minimum viable code or steps to get something working. Claude prioritizes this when you need a fast result.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. Detailed instructions&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Step-by-step guidance for the full range of scenarios, including edge cases. This is the bulk of any serious Skill file.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;5. Best practices and warnings&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The hard-won wisdom — things that break, common mistakes, and the non-obvious rules that separate a mediocre output from a great one. For example, the ReportLab PDF skill contains an explicit warning: never use Unicode subscript characters in PDFs because the built-in fonts do not support them and they render as solid black boxes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;6. Quick reference table&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;A summary of the most common operations at a glance. Claude uses this to quickly orient itself in complex tasks.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Every time Claude uses a skill, it is following a recipe that has been tested and refined. You are not getting a one-off guess — you are getting a repeatable, high-quality process. This is why Claude can produce a 20-page formatted PDF with headers, tables, and page numbers in under 30 seconds.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;Plugins: when Claude needs to act, not just think&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Skills give Claude knowledge. Plugins give Claude hands.&lt;/p&gt;
&lt;p&gt;In Claude's ecosystem, a plugin (also called a tool or connector) is an integration that lets Claude interact with the real world. Send an email. Read a calendar. Create a file. Search the web. Run terminal commands.&lt;/p&gt;
&lt;p&gt;Claude's built-in tools include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;web_search&lt;/strong&gt; — Search the internet for current information&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;bash_tool&lt;/strong&gt; — Run actual terminal commands on a Linux machine&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;create_file&lt;/strong&gt; — Generate and save files of any type&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;view&lt;/strong&gt; — Read files and directories (including Skill files)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;image_search&lt;/strong&gt; — Find and display images from the web&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;places_search&lt;/strong&gt; — Search Google Maps for locations&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;weather_fetch&lt;/strong&gt; — Get real-time weather data&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;But the more interesting category is MCP servers.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;MCP: the protocol that connects Claude to everything&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;MCP stands for Model Context Protocol — an open standard Anthropic developed for connecting AI models to external services. Think of it as the USB standard for AI integrations. Once a service implements MCP, Claude can use it without any custom code on your end.&lt;/p&gt;
&lt;p&gt;Popular MCP connectors include Google Calendar, Gmail, Google Drive, Slack, Figma, Jira, GitHub, and dozens more. When connected, Claude does not just know about these services — it can actively use them.&lt;/p&gt;
&lt;blockquote&gt;
&lt;p&gt;Try this prompt with Google Calendar and Gmail connected: "Check what I have tomorrow, identify the longest gap in my schedule, and draft an email to my team suggesting we use that time for a sync." Claude will read your calendar, analyze it, and compose the email — all in one shot.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;How skills and plugins work together&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The real magic happens when skills and tools combine. Here is what Claude does when you ask it to "research AI funding trends and create a professional report":&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reads the docx or pdf Skill file to understand how to create a professional document&lt;/li&gt;
&lt;li&gt;Uses web_search to find recent articles and data&lt;/li&gt;
&lt;li&gt;Uses web_fetch to read full article content&lt;/li&gt;
&lt;li&gt;Uses bash_tool to install any required Python libraries&lt;/li&gt;
&lt;li&gt;Uses create_file to write and execute the document generation code&lt;/li&gt;
&lt;li&gt;Uses present_files to give you a download link&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Six tools, one prompt, zero manual steps. This is the architecture that makes Claude feel qualitatively different from a simple chatbot.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;Writing your own Skill file&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Here is the part most guides skip: you can write your own skill files and Claude will use them. If you have a task you do repeatedly — formatting a specific type of report, writing emails in a certain style, processing data in a particular way — you can encode that knowledge in a .md file.&lt;/p&gt;
&lt;p&gt;The structure is straightforward. Start with a YAML frontmatter block. Write a clear description with trigger phrases. Add an overview, quick-start section, detailed steps, and best practices. Upload it and tell Claude to use it.&lt;/p&gt;
&lt;p&gt;Three things that make a skill file great:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Write precise triggers.&lt;/strong&gt; The description field determines when Claude uses your skill. Be specific about which requests should trigger it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Include real examples.&lt;/strong&gt; Code snippets and sample outputs in your skill file give Claude concrete patterns to follow, not abstract rules.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Document edge cases.&lt;/strong&gt; The most valuable part of any skill file is the warnings section — what breaks, what to avoid, what looks right but isn't.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;The prompt patterns that unlock all of this&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Knowing that skills and tools exist changes how you should prompt Claude. Instead of describing what you want in vague terms, you can be explicit about the output and let the skill system handle the how.&lt;/p&gt;
&lt;p&gt;Some prompts that unlock the full capability:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;"Create a &lt;strong&gt;PDF report&lt;/strong&gt; on [topic] with a table of contents, charts, and page numbers."&lt;/li&gt;
&lt;li&gt;"Build a &lt;strong&gt;PowerPoint presentation&lt;/strong&gt; about [subject] with 10 slides, a consistent theme, and speaker notes."&lt;/li&gt;
&lt;li&gt;"Read my &lt;strong&gt;Google Calendar&lt;/strong&gt; for this week and create a time-blocking schedule as a Word document."&lt;/li&gt;
&lt;li&gt;"Search for the latest news on [topic], summarize the key findings, and create an email I can send to my team."&lt;/li&gt;
&lt;li&gt;"Write a &lt;strong&gt;React component&lt;/strong&gt; for a data table, then create a downloadable HTML file I can use immediately."&lt;/li&gt;
&lt;/ul&gt;
&lt;blockquote&gt;
&lt;p&gt;Specify the output format. The moment you say "PDF," "Word document," "PowerPoint," or "React component," Claude knows to load the relevant Skill file. Vague requests get vague results. Specific output formats trigger expert-level execution.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;Where to go from here&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The skills system is, at its core, a knowledge transfer mechanism. Experts encode their best practices into .md files. Claude reads those files and executes accordingly. The barrier between knowing how to do something well and actually doing it well collapses.&lt;/p&gt;
&lt;p&gt;Start by exploring what skills already exist. Ask Claude to "list the available skills" or try prompts that trigger the PDF, DOCX, or PPTX skills — notice how the output quality jumps compared to a generic request. Then, think about the repetitive tasks in your own work and consider what a Skill file for those tasks would look like.&lt;/p&gt;
&lt;p&gt;The people getting the most out of Claude are not the ones with the cleverest prompts. They are the ones who understand the architecture — and use it deliberately.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;Try it right now: Open Claude and type: "Create a professional PDF report on any topic you choose, with a cover page, table of contents, and at least 3 sections." Watch what happens when a skill kicks in.&lt;/em&gt;&lt;/p&gt;</content><category term="GenAI"/><category term="claude"/><category term="skills"/><category term="plugins"/><category term="mcp"/><category term="llm"/><category term="markdown"/><category term="anthropic"/><category term="ai-tools"/></entry><entry><title>Vector Databases &amp; Embeddings: The Engine Behind Modern AI Applications</title><link href="https://vinayakvitthal.github.io/vectordb-embeddings-engine-behind-modern-ai.html" rel="alternate"/><published>2026-04-15T00:00:00+05:30</published><updated>2026-04-15T00:00:00+05:30</updated><author><name>Vinayak Vitthal Kaddi</name></author><id>tag:vinayakvitthal.github.io,2026-04-15:/vectordb-embeddings-engine-behind-modern-ai.html</id><summary type="html">&lt;h1&gt;Vector Databases &amp;amp; Embeddings: The Engine Behind Modern AI Applications&lt;/h1&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;How the technology powering semantic search, recommendation systems, and RAG is quietly reshaping software development&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#what-are-embeddings"&gt;What Are Embeddings?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#what-is-a-vector-database"&gt;What Is a Vector Database?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#real-world-use-cases"&gt;Real-World Use Cases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#1-semantic-search"&gt;Semantic Search&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#2-retrieval-augmented-generation-rag"&gt;Retrieval-Augmented Generation (RAG)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#3-recommendation-systems"&gt;Recommendation Systems&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#4-anomaly-detection--fraud-prevention"&gt;Anomaly Detection &amp;amp; Fraud Prevention&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#5-multimodal-search"&gt;Multimodal …&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;</summary><content type="html">&lt;h1&gt;Vector Databases &amp;amp; Embeddings: The Engine Behind Modern AI Applications&lt;/h1&gt;
&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;How the technology powering semantic search, recommendation systems, and RAG is quietly reshaping software development&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;
&lt;hr&gt;
&lt;h2&gt;Table of Contents&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="#what-are-embeddings"&gt;What Are Embeddings?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#what-is-a-vector-database"&gt;What Is a Vector Database?&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#real-world-use-cases"&gt;Real-World Use Cases&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#1-semantic-search"&gt;Semantic Search&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#2-retrieval-augmented-generation-rag"&gt;Retrieval-Augmented Generation (RAG)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#3-recommendation-systems"&gt;Recommendation Systems&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#4-anomaly-detection--fraud-prevention"&gt;Anomaly Detection &amp;amp; Fraud Prevention&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#5-multimodal-search"&gt;Multimodal Search&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#6-customer-support-automation"&gt;Customer Support Automation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#popular-vector-databases-at-a-glance"&gt;Popular Vector Databases at a Glance&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#quick-start-building-a-semantic-search-app"&gt;Quick Start: Building a Semantic Search App&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#choosing-the-right-tool"&gt;Choosing the Right Tool&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="#whats-next"&gt;What's Next?&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2&gt;What Are Embeddings?&lt;/h2&gt;
&lt;p&gt;An &lt;strong&gt;embedding&lt;/strong&gt; is a numerical representation of data — text, images, audio, or video — as a list of floating-point numbers (a vector). These numbers are not arbitrary; they encode &lt;em&gt;meaning&lt;/em&gt;. Similar items end up numerically close together in this high-dimensional space.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# Example: Two semantically similar sentences map to nearby vectors&lt;/span&gt;
&lt;span class="s2"&gt;&amp;quot;The cat sat on the mat.&amp;quot;&lt;/span&gt;   &lt;span class="err"&gt;→&lt;/span&gt;  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.12&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.45&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.88&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="s2"&gt;&amp;quot;A feline rested on a rug.&amp;quot;&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt;  &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.11&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.43&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.86&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# An unrelated sentence is far away&lt;/span&gt;
&lt;span class="s2"&gt;&amp;quot;Quarterly earnings rose 12%.&amp;quot;&lt;/span&gt; &lt;span class="err"&gt;→&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mf"&gt;0.89&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;0.21&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mf"&gt;0.34&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Embeddings are generated by &lt;strong&gt;embedding models&lt;/strong&gt; — neural networks trained to understand context and semantics. Popular ones include:&lt;/p&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Provider&lt;/th&gt;
&lt;th&gt;Dimensions&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;text-embedding-3-large&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;OpenAI&lt;/td&gt;
&lt;td&gt;3,072&lt;/td&gt;
&lt;td&gt;General text&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;embed-english-v3.0&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Cohere&lt;/td&gt;
&lt;td&gt;1,024&lt;/td&gt;
&lt;td&gt;Search &amp;amp; classification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;all-MiniLM-L6-v2&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;HuggingFace&lt;/td&gt;
&lt;td&gt;384&lt;/td&gt;
&lt;td&gt;Fast, lightweight&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;nomic-embed-text&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Nomic AI&lt;/td&gt;
&lt;td&gt;768&lt;/td&gt;
&lt;td&gt;Open-source, local use&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr&gt;
&lt;h2&gt;What Is a Vector Database?&lt;/h2&gt;
&lt;p&gt;A &lt;strong&gt;vector database&lt;/strong&gt; is purpose-built to store, index, and query high-dimensional vectors at scale. Unlike traditional databases that match exact values, vector DBs find &lt;em&gt;approximate nearest neighbors (ANN)&lt;/em&gt; — items that are semantically closest to a query.&lt;/p&gt;
&lt;h3&gt;How Similarity Search Works&lt;/h3&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nv"&gt;Query&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;affordable electric cars&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;          &lt;/span&gt;↓
&lt;span class="w"&gt;  &lt;/span&gt;[&lt;span class="nv"&gt;Embed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;query&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;→&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;vector&lt;/span&gt;]
&lt;span class="w"&gt;          &lt;/span&gt;↓
&lt;span class="w"&gt;  &lt;/span&gt;[&lt;span class="nv"&gt;Search&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;vector&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;DB&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;nearest&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;neighbors&lt;/span&gt;]
&lt;span class="w"&gt;          &lt;/span&gt;↓
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nv"&gt;Returns&lt;/span&gt;:&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;best budget EVs 2024&amp;quot;&lt;/span&gt;,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;Tesla Model 3 cost breakdown&amp;quot;&lt;/span&gt;,&lt;span class="w"&gt; &lt;/span&gt;...
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The core operation is &lt;strong&gt;cosine similarity&lt;/strong&gt; or &lt;strong&gt;dot product&lt;/strong&gt; — measuring the angle between two vectors to determine how "close" they are in meaning.&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Real-World Use Cases&lt;/h2&gt;
&lt;h3&gt;1. Semantic Search&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; Traditional keyword search fails when users don't use the exact right words.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Solution:&lt;/strong&gt; Embed both documents and queries. When a user searches, find the documents whose embeddings are closest to the query's embedding.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Real Example — Notion AI Search:&lt;/strong&gt;&lt;br&gt;
Notion uses embeddings so when you search "meeting notes from last week about marketing," it finds the right page even if it's titled "Sync — Brand Strategy 03/10" with no exact keyword match.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;openai&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;pinecone&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Pinecone&lt;/span&gt;

&lt;span class="n"&gt;pc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Pinecone&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;YOUR_KEY&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pc&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;docs-index&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;semantic_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Embed the query&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;text-embedding-3-small&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;query_vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt;

    &lt;span class="c1"&gt;# Search the vector DB&lt;/span&gt;
    &lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;query_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;include_metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;matches&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Companies using this:&lt;/strong&gt; Notion, Elastic, Algolia, Confluence, GitHub Copilot&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;2. Retrieval-Augmented Generation (RAG)&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; LLMs have a knowledge cutoff and can't access your private data. Fine-tuning is expensive and slow.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Solution:&lt;/strong&gt; Store your documents as embeddings. At query time, retrieve the most relevant chunks and inject them into the LLM's prompt as context.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;User&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;asks&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;&amp;quot;What is our refund policy for enterprise clients?&amp;quot;&lt;/span&gt;
&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="err"&gt;↓&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Embed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;question&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Search&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;DB&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;→&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Retrieve&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;top&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;relevant&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;policy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="err"&gt;↓&lt;/span&gt;
&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Inject&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;chunks&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;into&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;LLM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="w"&gt;     &lt;/span&gt;&lt;span class="err"&gt;↓&lt;/span&gt;
&lt;span class="n"&gt;LLM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;answers&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;grounded&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;your&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;actual&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Real Example — Cursor (AI Code Editor):&lt;/strong&gt;&lt;br&gt;
Cursor indexes your entire codebase. When you ask "how does auth work in this project?", it retrieves relevant files and functions using embeddings, then feeds them to the LLM — giving context-aware answers without hallucination.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Architecture overview:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;[Your Documents]&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="na"&gt;↓&lt;/span&gt;
&lt;span class="k"&gt;[Chunking + Embedding]&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="na"&gt;↓&lt;/span&gt;
&lt;span class="k"&gt;[Vector DB (Pinecone / Weaviate / Chroma)]&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="na"&gt;↓ (retrieval at query time)&lt;/span&gt;
&lt;span class="k"&gt;[LLM (GPT-4, Claude, etc.)] → [Final Answer]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Companies using this:&lt;/strong&gt; Cursor, GitHub Copilot, Intercom Fin, Notion AI, Perplexity&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;3. Recommendation Systems&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; Collaborative filtering ("users like you also liked...") fails for new users and new items (cold-start problem). It also can't understand item &lt;em&gt;content&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Solution:&lt;/strong&gt; Embed items (products, movies, articles) based on their descriptions and attributes. Recommend items closest in the embedding space to what a user has interacted with.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Real Example — Spotify:&lt;/strong&gt;&lt;br&gt;
Spotify's recommendation engine embeds songs using audio features and playlist context. "Discover Weekly" works by finding songs whose vectors are close to your listening history in this embedding space.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# Simplified product recommendation&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_recommendations&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Fetch the product&amp;#39;s stored embedding&lt;/span&gt;
    &lt;span class="n"&gt;product_vector&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;vectors&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;values&lt;/span&gt;

    &lt;span class="c1"&gt;# Find similar products&lt;/span&gt;
    &lt;span class="n"&gt;similar&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;product_vector&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;top_k&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# +1 to exclude the product itself&lt;/span&gt;
        &lt;span class="nb"&gt;filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;in_stock&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;True&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;similar&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;matches&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;product_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Companies using this:&lt;/strong&gt; Spotify, Netflix, Amazon, Pinterest, Etsy&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;4. Anomaly Detection &amp;amp; Fraud Prevention&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; Fraud patterns evolve constantly. Rule-based systems become outdated quickly.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Solution:&lt;/strong&gt; Embed user behavior sequences (transactions, clicks, login patterns). Flag transactions whose vectors are &lt;em&gt;far&lt;/em&gt; from a user's historical behavior cluster.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Real Example — Stripe Radar:&lt;/strong&gt;&lt;br&gt;
Stripe embeds transaction patterns and detects anomalies by identifying transactions whose vector representations are statistical outliers compared to the merchant's and user's typical behavior.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="c1"&gt;# Flag anomalous transactions&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;is_suspicious&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transaction_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_history_embeddings&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.7&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;similarities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;cosine_similarity&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transaction_embedding&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hist_emb&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;hist_emb&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;user_history_embeddings&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;avg_similarity&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;similarities&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;similarities&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;avg_similarity&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;threshold&lt;/span&gt;  &lt;span class="c1"&gt;# Low similarity = suspicious&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Companies using this:&lt;/strong&gt; Stripe, PayPal, Mastercard, Visa, Cloudflare&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;5. Multimodal Search&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; Users want to search with images, not just text. Or find visually similar products.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Solution:&lt;/strong&gt; Use multimodal embedding models (like CLIP) that map text and images into the &lt;em&gt;same&lt;/em&gt; vector space. A text query can retrieve images, and an image query can retrieve text.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Real Example — Pinterest Visual Search:&lt;/strong&gt;&lt;br&gt;
When you tap a section of a Pinterest image to search for similar items, they're using multimodal embeddings to find visually similar content across billions of pins.&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;CLIPProcessor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;CLIPModel&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;torch&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CLIPModel&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;openai/clip-vit-base-patch32&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;processor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CLIPProcessor&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;from_pretrained&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;openai/clip-vit-base-patch32&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Text-to-image search&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;text_to_image_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text_query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;processor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;text_query&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;return_tensors&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;pt&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;text_embedding&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_text_features&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="c1"&gt;# Search image embeddings in your vector DB&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vector&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;text_embedding&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;tolist&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Companies using this:&lt;/strong&gt; Pinterest, Google Lens, Shopify, IKEA, Zalando&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;6. Customer Support Automation&lt;/h3&gt;
&lt;p&gt;&lt;strong&gt;The Problem:&lt;/strong&gt; Support tickets are repetitive. Teams waste time re-answering the same questions. Knowledge bases are hard to search.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Solution:&lt;/strong&gt; Embed your entire knowledge base and past resolved tickets. Automatically surface the most relevant article or resolution for each new ticket.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Real Example — Intercom Fin:&lt;/strong&gt;&lt;br&gt;
Intercom's AI agent uses embeddings to match incoming customer questions against a company's entire knowledge base. It handles ~70% of tickets autonomously by finding semantically relevant answers.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Ticket routing pipeline:&lt;/strong&gt;&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="k"&gt;[New Support Ticket]&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="na"&gt;↓&lt;/span&gt;
&lt;span class="k"&gt;[Embed ticket content]&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="na"&gt;↓&lt;/span&gt;
&lt;span class="k"&gt;[Query vector DB of past tickets + KB articles]&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;&lt;span class="na"&gt;↓&lt;/span&gt;
&lt;span class="na"&gt;[High similarity match] → Auto-resolve with suggested answer&lt;/span&gt;
&lt;span class="na"&gt;[Medium similarity]     → Route to correct team with context&lt;/span&gt;
&lt;span class="na"&gt;[Low similarity]        → Escalate as novel issue&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Companies using this:&lt;/strong&gt; Intercom, Zendesk, Freshdesk, Linear, Atlassian&lt;/p&gt;
&lt;hr&gt;
&lt;h2&gt;Popular Vector Databases at a Glance&lt;/h2&gt;
&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Database&lt;/th&gt;
&lt;th&gt;Best For&lt;/th&gt;
&lt;th&gt;Hosting&lt;/th&gt;
&lt;th&gt;Open Source&lt;/th&gt;
&lt;th&gt;Notable Feature&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pinecone&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Production at scale&lt;/td&gt;
&lt;td&gt;Managed cloud&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Serverless, zero-ops&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Weaviate&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Hybrid search&lt;/td&gt;
&lt;td&gt;Cloud + self-hosted&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Built-in BM25 + vector&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Qdrant&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;High performance&lt;/td&gt;
&lt;td&gt;Cloud + self-hosted&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Rust-based, fast filtering&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Chroma&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Local dev &amp;amp; prototyping&lt;/td&gt;
&lt;td&gt;Embedded/self-hosted&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Simplest to get started&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;pgvector&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Already using Postgres&lt;/td&gt;
&lt;td&gt;Self-hosted&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;No new infra needed&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Milvus&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Large-scale enterprise&lt;/td&gt;
&lt;td&gt;Cloud + self-hosted&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Handles billions of vectors&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;
&lt;hr&gt;
&lt;h2&gt;Quick Start: Building a Semantic Search App&lt;/h2&gt;
&lt;p&gt;Here's a minimal working example using &lt;strong&gt;Chroma&lt;/strong&gt; (no signup needed) and &lt;strong&gt;OpenAI embeddings&lt;/strong&gt;:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;pip&lt;span class="w"&gt; &lt;/span&gt;install&lt;span class="w"&gt; &lt;/span&gt;chromadb&lt;span class="w"&gt; &lt;/span&gt;openai
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="nn"&gt;chromadb&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="nn"&gt;openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;

&lt;span class="n"&gt;openai_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;OpenAI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;YOUR_OPENAI_KEY&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;chroma_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chromadb&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;collection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chroma_client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create_collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;my_docs&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Step 1: Add documents&lt;/span&gt;
&lt;span class="n"&gt;documents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;Our return policy allows returns within 30 days of purchase.&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;We offer free shipping on orders over $50.&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;Customer support is available 24/7 via chat and email.&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="s2"&gt;&amp;quot;Enterprise plans include dedicated account management.&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;openai_client&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;text-embedding-3-small&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;texts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;embedding&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;embeddings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;ids&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;doc_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nb"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="p"&gt;))]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Step 2: Query&lt;/span&gt;
&lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;&amp;quot;How do I send something back?&amp;quot;&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;query_embeddings&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;embed&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
    &lt;span class="n"&gt;n_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nb"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;&amp;quot;documents&amp;quot;&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="c1"&gt;# → [&amp;#39;Our return policy allows returns within 30 days of purchase.&amp;#39;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;hr&gt;
&lt;h2&gt;Choosing the Right Tool&lt;/h2&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="nv"&gt;Are&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;you&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;prototyping&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;building&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;locally&lt;/span&gt;?
&lt;span class="w"&gt;  &lt;/span&gt;└─&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Yes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;→&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Chroma&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;or&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;pgvector&lt;/span&gt;

&lt;span class="nv"&gt;Are&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;you&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;already&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;using&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Postgres&lt;/span&gt;?
&lt;span class="w"&gt;  &lt;/span&gt;└─&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Yes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;→&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;pgvector&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;zero&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;new&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;infra&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;Do&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;you&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;need&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;hybrid&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;search&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="ss"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;keyword&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;semantic&lt;/span&gt;&lt;span class="ss"&gt;)&lt;/span&gt;?
&lt;span class="w"&gt;  &lt;/span&gt;└─&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Yes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;→&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Weaviate&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;or&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Elasticsearch&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;with&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;vectors&lt;/span&gt;

&lt;span class="k"&gt;Do&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;you&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;need&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;maximum&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;performance&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;with&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;complex&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;filters&lt;/span&gt;?
&lt;span class="w"&gt;  &lt;/span&gt;└─&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Yes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;→&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Qdrant&lt;/span&gt;

&lt;span class="k"&gt;Do&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;you&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;want&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;fully&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;managed&lt;/span&gt;,&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;zero&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="nv"&gt;ops&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;production&lt;/span&gt;?
&lt;span class="w"&gt;  &lt;/span&gt;└─&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Yes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;→&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Pinecone&lt;/span&gt;

&lt;span class="nv"&gt;Handling&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;billions&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;of&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;vectors&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;at&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;enterprise&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;scale&lt;/span&gt;?
&lt;span class="w"&gt;  &lt;/span&gt;└─&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Yes&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;→&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;Milvus&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;hr&gt;
&lt;h2&gt;What's Next?&lt;/h2&gt;
&lt;p&gt;The vector database space is evolving fast:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Multimodal embeddings&lt;/strong&gt; — unified search across text, image, audio, and video&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sparse + dense hybrid search&lt;/strong&gt; — combining keyword precision with semantic understanding&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Streaming vector updates&lt;/strong&gt; — real-time embedding pipelines for live data&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;On-device embeddings&lt;/strong&gt; — privacy-preserving local search on mobile/edge devices&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Graph + vector hybrid stores&lt;/strong&gt; — combining relationship graphs with semantic similarity&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;h2&gt;Resources&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;a href="https://platform.openai.com/docs/guides/embeddings"&gt;OpenAI Embeddings Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.pinecone.io/learn/"&gt;Pinecone Learning Center&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://weaviate.io/developers/weaviate"&gt;Weaviate Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.trychroma.com/getting-started"&gt;Chroma Getting Started&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://qdrant.tech/documentation/"&gt;Qdrant Documentation&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/pgvector/pgvector"&gt;pgvector GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/beir-cellar/beir"&gt;BEIR Benchmark — Evaluate embedding models&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;hr&gt;
&lt;p&gt;&lt;em&gt;Found this useful? ⭐ Star the repo and share it with your team.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;&lt;em&gt;Have a use case I missed? Open an issue or submit a PR.&lt;/em&gt;&lt;/p&gt;</content><category term="GenAI"/><category term="vector-database"/><category term="embeddings"/><category term="rag"/><category term="semantic-search"/><category term="llm"/><category term="pinecone"/><category term="chroma"/><category term="weaviate"/><category term="ai"/></entry><entry><title>Agentic System Design Concepts - Patterns Every AI Engineer Should Know</title><link href="https://vinayakvitthal.github.io/agentic-system-design-concepts-patterns-every-ai-engineer-should-know.html" rel="alternate"/><published>2026-04-11T00:00:00+05:30</published><updated>2026-04-11T00:00:00+05:30</updated><author><name>Vinayak Vitthal Kaddi</name></author><id>tag:vinayakvitthal.github.io,2026-04-11:/agentic-system-design-concepts-patterns-every-ai-engineer-should-know.html</id><summary type="html">&lt;p&gt;Building reliable AI agents isn't just about picking the right model — it's about the patterns you wire around it. Here's a concise reference of 15 agentic system design concepts worth knowing. Two lines each — just enough to understand what they do and why they matter.&lt;/p&gt;
&lt;h2&gt;Resilience &amp;amp; Failure Isolation&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Agent Circuit …&lt;/strong&gt;&lt;/p&gt;</summary><content type="html">&lt;p&gt;Building reliable AI agents isn't just about picking the right model — it's about the patterns you wire around it. Here's a concise reference of 15 agentic system design concepts worth knowing. Two lines each — just enough to understand what they do and why they matter.&lt;/p&gt;
&lt;h2&gt;Resilience &amp;amp; Failure Isolation&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Agent Circuit Breaker&lt;/strong&gt; — Prevents cascading failures by halting agent execution when downstream services or tools are repeatedly failing. Borrowed from distributed systems engineering, it stops a single broken tool from dragging the entire agent pipeline down.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Blast Radius Limiter&lt;/strong&gt; — Restricts the impact of an agent failure to a defined scope so it can't propagate across the system. Think of it as a blast door: when something goes wrong, the damage stays local.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Dead Letter Queue for Agents&lt;/strong&gt; — A holding area where failed or unprocessable agent tasks are parked for later inspection instead of silently dropped. It gives you a recoverable audit trail when tasks fall through the cracks at runtime.&lt;/p&gt;
&lt;h2&gt;Control Flow &amp;amp; Decision Quality&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Orchestrator vs Choreography&lt;/strong&gt; — Defines whether agent interactions are centrally directed (orchestrator controls all moves) or emergent (agents react to events and coordinate peer-to-peer). The choice shapes coupling, debuggability, and how gracefully the system degrades.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Confidence Threshold Gate&lt;/strong&gt; — Ensures an agent only takes action when its internal confidence in a decision clears a defined threshold. A simple but powerful reliability lever: low-confidence branches pause for human review rather than guessing forward.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Replanning Loop&lt;/strong&gt; — Allows agents to re-evaluate their plan mid-execution when context changes or a step fails, rather than continuing blindly on a stale plan. Essential for long-horizon tasks where the environment isn't static.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Human Escalation Protocol&lt;/strong&gt; — Provides a structured mechanism for agents to hand off to a human when they're stuck, uncertain, or handling high-stakes decisions. It's not a failure mode — it's a designed off-ramp.&lt;/p&gt;
&lt;h2&gt;Tool Invocation Reliability&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Idempotent Tool Calls&lt;/strong&gt; — Ensures that a tool can be called multiple times with the same inputs without producing unintended side effects. Critical in agentic pipelines where retries happen frequently due to timeouts or partial failures.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tool Invocation Timeout&lt;/strong&gt; — Prevents agents from blocking indefinitely on a tool that is slow or unresponsive, forcing a graceful fallback or retry. Without this, a single flaky API can freeze an entire agent run.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Context Window Checkpointing&lt;/strong&gt; — Periodically saves the agent's progress so it can resume from a known-good state rather than restarting from scratch after a context overflow or crash. Especially important for long-running, multi-step tasks.&lt;/p&gt;
&lt;h2&gt;Infrastructure &amp;amp; Routing&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;LLM Gateway Pattern&lt;/strong&gt; — A single abstraction layer that manages all LLM API calls, handling routing, rate limiting, retries, and observability in one place. It decouples agent logic from model-specific SDKs, making provider swaps painless.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Semantic Caching&lt;/strong&gt; — Stores LLM responses keyed on semantic meaning rather than exact input strings, so similar queries hit the cache even when phrased differently. Reduces latency and cost without sacrificing answer quality.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Multi-Agent State Sync&lt;/strong&gt; — Maintains a consistent shared state across multiple agents working in parallel or in sequence. Without it, agents operating on stale or divergent state produce contradictory or redundant outputs.&lt;/p&gt;
&lt;h2&gt;Observability &amp;amp; Deployment&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Agentic Observability Tracing&lt;/strong&gt; — Tracks every decision, tool call, handoff, and LLM interaction across an agent run, producing a full execution trace for debugging and performance analysis. The difference between guessing why something failed and knowing.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Canary Agent Deployment&lt;/strong&gt; — Rolls out a new agent version to a small slice of production traffic before full release, allowing you to compare behavior and catch regressions with limited blast radius. Applies standard software deployment discipline to the agent layer.&lt;/p&gt;</content><category term="GenAI"/><category term="GenAI"/><category term="AI-agents"/><category term="LLM"/><category term="agentic-systems"/><category term="design-patterns"/><category term="reliability"/></entry><entry><title>Every Claude Code Concept You Need to Know</title><link href="https://vinayakvitthal.github.io/every-claude-code-concept-you-need-to-know.html" rel="alternate"/><published>2026-04-11T00:00:00+05:30</published><updated>2026-04-11T00:00:00+05:30</updated><author><name>Vinayak Vitthal Kaddi</name></author><id>tag:vinayakvitthal.github.io,2026-04-11:/every-claude-code-concept-you-need-to-know.html</id><summary type="html">&lt;p&gt;Claude Code is not a chatbot. It lives in your terminal, reads your actual files, writes code, runs commands, and executes multi-step workflows — all with your permission. Here are 30 concepts you need to understand it properly. No fluff, no hand-holding.&lt;/p&gt;
&lt;h2&gt;The 30 Concepts&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;1. The Terminal&lt;/strong&gt; — Claude Code doesn't …&lt;/p&gt;</summary><content type="html">&lt;p&gt;Claude Code is not a chatbot. It lives in your terminal, reads your actual files, writes code, runs commands, and executes multi-step workflows — all with your permission. Here are 30 concepts you need to understand it properly. No fluff, no hand-holding.&lt;/p&gt;
&lt;h2&gt;The 30 Concepts&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;1. The Terminal&lt;/strong&gt; — Claude Code doesn't run in a browser. It runs in the terminal, the same text-based interface developers use daily. If you've never opened a terminal before, that's your first homework assignment.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;2. Installation + Pricing&lt;/strong&gt; — Install with a single command via npm. Pricing is token-based through your Anthropic account. There's no flat monthly fee tied to a UI — you pay for what you use, which means costs scale with how hard you push it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;3. File Access&lt;/strong&gt; — Claude Code reads and edits files directly on your machine, with your permission. Not "paste your doc into a chat window." It opens the actual file, modifies it in-place, and saves it. This is the concept that makes it useful.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;4. Image + PDF Reading&lt;/strong&gt; — Claude Code can ingest images and PDFs as inputs. Point it at a PDF proposal or a screenshot and it processes the content directly — no manual copy-paste required.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;5. Tool Use&lt;/strong&gt; — Claude Code has built-in tools: file reading, file writing, shell execution, and more. These are the primitives it uses to act on your computer. You see each tool call as it happens in real time.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;6. Prompting Techniques&lt;/strong&gt; — Vague prompts produce garbage results. "Help me with my marketing" is useless. "Write a 3-email welcome sequence for my dog walking business targeting first-time pet owners, 150 words each" is not. Specificity is the skill.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;7. CLAUDE.md&lt;/strong&gt; — A markdown file you create in your project directory that tells Claude Code the rules, context, and conventions for that project. Think of it as a standing system prompt that persists across sessions. Every serious Claude Code user has one.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;8. Plan Mode&lt;/strong&gt; — Before Claude Code executes anything, you can ask it to plan first. It outputs what it intends to do, step by step, and waits for your approval. Run in plan mode for anything non-trivial. Review before you let it touch anything.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;9. Context Window&lt;/strong&gt; — The amount of text Claude can "hold in mind" at once during a session. Long conversations, large files, and extensive histories eat into it. When context fills up, older information gets dropped. This affects result quality.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;10. Tokens + Costs&lt;/strong&gt; — Everything processed by Claude Code — your prompts, the files it reads, its responses — is measured in tokens. Tokens drive cost. Reading a 50-page PDF burns tokens. Keep context lean and targeted to control spend.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;11. Model Selection&lt;/strong&gt; — You can choose which Claude model backs your session. Faster, cheaper models work for routine tasks. Heavier models are worth it for complex reasoning or production-grade code. Pick the right tool for the job.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;12. /compact&lt;/strong&gt; — A slash command that compresses your current conversation history into a shorter summary, freeing up context window space without wiping the session. Use it mid-task when context gets bloated.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;13. /clear&lt;/strong&gt; — Wipes the entire conversation and starts fresh. Every new task should start with a clean context. Don't carry leftover noise from a previous task into the next one. Use this more than you think you need to.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;14. Session Management&lt;/strong&gt; — Claude Code has no persistent memory between sessions by default. Start each session with your CLAUDE.md re-read to restore project context. Design your workflow around this statelessness rather than fighting it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;15. Permission Modes&lt;/strong&gt; — By default, Claude Code asks for approval before running any shell command. This gets tedious fast. You can pre-approve safe, non-destructive commands (ls, cat, grep, mkdir, git status) in your settings.local.json. Destructive operations should always require explicit confirmation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;16. Effort Levels&lt;/strong&gt; — You can signal how much effort you want Claude to apply. Quick answers for exploration, thorough analysis for production decisions. Matching effort level to task type saves time and tokens.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;17. Interrupt + Redirect&lt;/strong&gt; — While Claude Code is running a task, you can interrupt it mid-execution and redirect it. If it starts going down the wrong path, stop it early. Don't let it burn tokens on a wrong approach when you can see it happening.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;18. Visual Studio Code&lt;/strong&gt; — Claude Code integrates directly with VS Code. You can run it inside the VS Code terminal and see file changes reflected in your editor in real time. If you're not a terminal-native developer, this is the recommended setup.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;19. Memory&lt;/strong&gt; — Claude Code supports memory files that persist across sessions. Unlike CLAUDE.md (project-specific), memory files can store user-level preferences and context. Useful for encoding your personal conventions once and never repeating them.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;20. Project vs Global&lt;/strong&gt; — Configuration can be scoped at the project level (CLAUDE.md, settings.local.json) or at the global level (applies to all Claude Code sessions on your machine). Know which scope a setting lives in before you modify it.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;21. Slash Commands&lt;/strong&gt; — Built-in commands prefixed with &lt;code&gt;/&lt;/code&gt; that control Claude Code's behavior: /clear, /compact, /help, and more. You can also define custom slash commands (skills) that map to your own workflows.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;22. Skills&lt;/strong&gt; — Custom slash commands you define once and reuse indefinitely. A skill is a markdown file that describes a reusable workflow. You build it once, invoke it with &lt;code&gt;/skill-name&lt;/code&gt;, and Claude follows the instructions every time. Hundreds of community-built skills already exist on GitHub in repos like &lt;code&gt;anthropics/skills&lt;/code&gt; and &lt;code&gt;hesreallyhim/awesome-claude-code&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;23. Hooks&lt;/strong&gt; — Scripts that run automatically before or after Claude Code actions. Quality gate hooks, for example, can intercept Claude's output before it's committed and check it against defined standards. Hooks are how you enforce consistency without relying on Claude to self-police.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;24. Web Browsing&lt;/strong&gt; — Claude Code can browse the web when given the appropriate tool access. It can fetch pages, read documentation, and pull in live information as part of a task — not just work from static local files.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;25. MCP Servers&lt;/strong&gt; — Model Context Protocol servers extend Claude Code's tool access to external services: Airtable, Google Drive, Slack, GitHub, and more. Tools handle what Claude does on your computer. MCP extends that to the internet and third-party APIs. This is the integration layer.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;26. Perplexity MCP&lt;/strong&gt; — A specific MCP integration that gives Claude Code access to Perplexity's search capabilities. Useful when a task requires real-time research as part of a larger automated workflow.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;27. Subagents&lt;/strong&gt; — Multiple Claude Code instances running simultaneously, each handling a distinct subtask. Instead of processing platforms one at a time, you spin up parallel agents and run them concurrently. Subagents are how you turn Claude Code from a sequential tool into a parallel workflow engine.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;28. Remote Control&lt;/strong&gt; — Claude Code can be configured for remote access, meaning you can trigger and manage sessions from another machine or interface. Relevant for server automation and scheduled background tasks.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;29. Scheduled Tasks&lt;/strong&gt; — Claude Code workflows can be scheduled to run automatically at defined intervals. Combine this with skills and hooks and you have a self-operating workflow system that runs without manual invocation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;30. Git Version Control&lt;/strong&gt; — Claude Code integrates with git. Every change it makes can be committed, branched, and rolled back through standard git workflows. This is your undo button. Always have Claude Code working inside a git-tracked project. Before: changes happen and you hope nothing breaks. After: every change is versioned, documented, and reversible.&lt;/p&gt;
&lt;h2&gt;The One Rule That Matters&lt;/h2&gt;
&lt;p&gt;Master five concepts before you touch the next five. The shiny object trap — jumping from MCP to subagents to hooks before understanding CLAUDE.md and context windows — is the single biggest waste of time. The gap between people getting real results and people falling behind is not talent. It is reps. Start with file access, prompting, CLAUDE.md, plan mode, and /clear. Everything else builds on those five.&lt;/p&gt;</content><category term="GenAI"/><category term="GenAI"/><category term="Claude-Code"/><category term="LLM"/><category term="agents"/><category term="developer-tools"/><category term="local-AI"/></entry><entry><title>Missing ZIP Option in Windows Right-Click Menu — Here's How to Fix It</title><link href="https://vinayakvitthal.github.io/missing-zip-option-windows-right-click-menu.html" rel="alternate"/><published>2026-04-11T00:00:00+05:30</published><updated>2026-04-11T00:00:00+05:30</updated><author><name>Vinayak Vitthal Kaddi</name></author><id>tag:vinayakvitthal.github.io,2026-04-11:/missing-zip-option-windows-right-click-menu.html</id><summary type="html">&lt;p&gt;The classic "Send to → Compressed (zipped) folder" option sometimes disappears from the Windows right-click context menu. Here's what causes it and how to get it back in under two minutes.&lt;/p&gt;
&lt;h2&gt;What Happened&lt;/h2&gt;
&lt;p&gt;Windows ships with a built-in ZIP shell extension handled by &lt;code&gt;zipfldr.dll&lt;/code&gt;. When third-party tools like Git, VLC …&lt;/p&gt;</summary><content type="html">&lt;p&gt;The classic "Send to → Compressed (zipped) folder" option sometimes disappears from the Windows right-click context menu. Here's what causes it and how to get it back in under two minutes.&lt;/p&gt;
&lt;h2&gt;What Happened&lt;/h2&gt;
&lt;p&gt;Windows ships with a built-in ZIP shell extension handled by &lt;code&gt;zipfldr.dll&lt;/code&gt;. When third-party tools like Git, VLC, or OneDrive add their own context menu entries, they can displace or corrupt the ZIP handler registration — leaving you with a bloated menu but no ZIP option.&lt;/p&gt;
&lt;h2&gt;Fix 1 — Check the Send to Submenu&lt;/h2&gt;
&lt;p&gt;Before anything else, right-click your folder or file and hover over &lt;strong&gt;Send to →&lt;/strong&gt;. The "Compressed (zipped) folder" option is sometimes hiding in the submenu even when it's not visible at the top level.&lt;/p&gt;
&lt;h2&gt;Fix 2 — Re-register the ZIP Shell Extension&lt;/h2&gt;
&lt;p&gt;Open Command Prompt as Administrator and run:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;span class="n"&gt;regsvr32&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;zipfldr&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;dll&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This re-registers the native ZIP handler with Windows Shell. Restart Explorer or reboot after running it.&lt;/p&gt;
&lt;h2&gt;Fix 3 — Restart Windows Explorer&lt;/h2&gt;
&lt;p&gt;Sometimes a stale shell session is all that's causing the issue. Run this in CMD:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;taskkill /f /im explorer.exe
start explorer.exe
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;h2&gt;Fix 4 — Verify the Registry Key&lt;/h2&gt;
&lt;p&gt;Press &lt;code&gt;Win + R&lt;/code&gt;, type &lt;code&gt;regedit&lt;/code&gt;, and navigate to:&lt;/p&gt;
&lt;div class="highlight"&gt;&lt;pre&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;HKEY_CLASSES_ROOT\CompressedFolder
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;If this key is missing or corrupted, the ZIP option will not appear anywhere in the context menu. You may need to restore it from another machine or via a &lt;code&gt;.reg&lt;/code&gt; export.&lt;/p&gt;
&lt;h2&gt;Root Cause&lt;/h2&gt;
&lt;p&gt;Heavy context menu contributors — Git Bash, Git GUI, VLC, SkyDrive Pro — are visible in the screenshot. Any one of them can push a bad shell extension that breaks ZIP registration as a side effect. Fix 2 resolves this in most cases.&lt;/p&gt;</content><category term="Windows"/><category term="Windows"/><category term="tips"/><category term="context-menu"/><category term="troubleshooting"/><category term="productivity"/></entry><entry><title>AI Agent Directory - Few Shots LLM Models</title><link href="https://vinayakvitthal.github.io/ai-agent-directory-few-shots-llm-models.html" rel="alternate"/><published>2026-04-10T00:00:00+05:30</published><updated>2026-04-10T00:00:00+05:30</updated><author><name>Vinayak Vitthal Kaddi</name></author><id>tag:vinayakvitthal.github.io,2026-04-10:/ai-agent-directory-few-shots-llm-models.html</id><summary type="html">&lt;p&gt;The AI agent ecosystem is growing fast. Here's a quick directory of notable AI startups and a couple of few-shot LLM models worth knowing about. Two lines each — just enough to know what they do and why they matter.&lt;/p&gt;
&lt;h2&gt;AI Agent Directory&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Can of Soup&lt;/strong&gt; — An AI-powered app that lets …&lt;/p&gt;</summary><content type="html">&lt;p&gt;The AI agent ecosystem is growing fast. Here's a quick directory of notable AI startups and a couple of few-shot LLM models worth knowing about. Two lines each — just enough to know what they do and why they matter.&lt;/p&gt;
&lt;h2&gt;AI Agent Directory&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Can of Soup&lt;/strong&gt; — An AI-powered app that lets you create fictional photos of you and your friends in imaginary scenarios. Built during Y Combinator, it uses generative AI to place people into any meme, outfit, or movie scene.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Deepgram&lt;/strong&gt; — A foundational voice AI platform offering speech-to-text, text-to-speech, and voice agent APIs. Their Nova models deliver high accuracy and low latency, supporting 30+ languages for real-time transcription.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Diffuse Bio&lt;/strong&gt; — Building generative AI for protein design, using diffusion models to engineer new proteins with control and accuracy. Their foundation model DSG-1 can generate 3D protein structures and design binders from user prompts.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Draftaid&lt;/strong&gt; — An AI-powered CAD tool that converts 3D models into precise 2D manufacturing drawings automatically. It reduces manual drafting time by up to 90%, acting like a copilot for mechanical engineers.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Edgetrace&lt;/strong&gt; — A YC-backed AI video analytics platform that lets users search camera networks using natural language. Primarily used by law enforcement and transportation for real-time threat detection and suspect identification.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;EzDubz&lt;/strong&gt; — A real-time AI dubbing tool that translates videos, livestreams, and phone calls while preserving the original speaker's voice. Their proprietary models clone voices on the fly and even replicate emotions across 20+ languages.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Exa&lt;/strong&gt; — An AI-powered search engine and API built for developers and AI agents. Unlike traditional keyword search, Exa uses neural embeddings for semantic understanding, powering tools like Cursor and Lovable.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Guide Labs&lt;/strong&gt; — Building interpretable AI foundation models that can explain their reasoning and are easy to audit. Their open-source Steerling-8B is an 8-billion-parameter LLM designed for transparency and debuggability.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Infinity AI&lt;/strong&gt; — Now known as Lemon Slice, they build a video foundation model for human motion and emotion. Their tech generates expressive, talking characters across styles from photorealistic to cartoon.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;K-Scale&lt;/strong&gt; — Building open-source humanoid robots for developers, with models starting at $999. Their integrated software, hardware, and ML stack lets developers focus on building applications for embodied AI.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sevn&lt;/strong&gt; — A generative design startup using AI to automate and optimize the creative design process. Users define parameters and constraints, and Sevn generates a range of design options to explore.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Linux Inc&lt;/strong&gt; — An AI startup focused on bringing intelligent tooling to the Linux ecosystem. They aim to simplify Linux administration and development workflows through AI-powered automation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Metalware&lt;/strong&gt; — A copilot for firmware engineers that automates low-level programming for embedded systems. Their binary analysis tool fuzzes ARM-based software to detect defects earlier in the development lifecycle.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Naiver AI&lt;/strong&gt; — Navier AI provides a web-based platform for running CFD (computational fluid dynamics) simulations at scale. Their AI agents handle geometry cleanup, meshing, solver configuration, and cloud resource management autonomously.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Osium AI&lt;/strong&gt; — An AI-powered platform that accelerates materials and chemicals R&amp;amp;D for industry leaders. Their software helps engineers design new materials faster, spanning alloys, polymers, textiles, and bio-based materials.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Phind&lt;/strong&gt; — An AI search engine purpose-built for developers that generates direct, code-inclusive answers to technical questions. It combines real-time web search with specialized models trained on programming languages and frameworks.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Piramidal&lt;/strong&gt; — Building a foundation model for the brain, trained on a massive corpus of EEG brainwave data. Their AI interprets neural signals for neurological diagnostics, already being deployed in ICU settings.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Playground&lt;/strong&gt; — A browser-based AI image generation and design platform used by over 9 million users. It combines text-to-image generation with a full graphic design suite for logos, social media posts, and more.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;PlayHT&lt;/strong&gt; — An AI voice generation platform that offered ultra-realistic text-to-speech with 900+ voices in 142 languages. Known for voice cloning and custom voice creation through deep learning algorithms.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Sonauto&lt;/strong&gt; — An AI music editor that turns prompts, lyrics, or melodies into full songs in any style. It supports thousands of styles with full-length songs up to 4.5 minutes, complete with vocals and instrumentation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Tavus&lt;/strong&gt; — An AI video personalization platform that creates hyper-personalized videos at scale from a single recording. It uses deep learning for voice synthesis and face cloning to generate thousands of unique video variations.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;YonduAI&lt;/strong&gt; — Building the robotic workforce of the future, starting with logistics automation in warehouses. They deploy humanoid robots with remote teleoperation that gradually transitions to full AI-driven automation.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Yoneda Labs&lt;/strong&gt; — Building a foundation model for chemical reactions to help chemists optimize drug discovery. Their AI defines parameters like temperature, concentration, and catalyst to make synthesis faster and cheaper.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;SyncLabs&lt;/strong&gt; — An AI lip-sync video generator that creates perfectly synchronized mouth movements from any audio track. Their zero-shot model handles any face in any video context without prior training on specific individuals.&lt;/p&gt;
&lt;h2&gt;Few-Shot LLM Models&lt;/h2&gt;
&lt;p&gt;&lt;strong&gt;Llama 3.1&lt;/strong&gt; — Meta's open-source large language model available in 8B, 70B, and 405B parameter sizes. It supports 128K context length and multilingual capabilities, making it one of the most versatile open-weight models for fine-tuning and deployment.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Mixtral&lt;/strong&gt; — Mistral AI's open-source mixture-of-experts (MoE) model that activates only a subset of parameters per token for efficient inference. It delivers performance comparable to much larger dense models while being significantly faster and more cost-effective to run.&lt;/p&gt;</content><category term="GenAI"/><category term="GenAI"/><category term="AI-agents"/><category term="LLM"/><category term="startups"/><category term="directory"/></entry><entry><title>My GenAI Blogs</title><link href="https://vinayakvitthal.github.io/my-genai-blogs.html" rel="alternate"/><published>2026-01-10T00:00:00+05:30</published><updated>2026-01-10T00:00:00+05:30</updated><author><name>Vinayak Vitthal Kaddi</name></author><id>tag:vinayakvitthal.github.io,2026-01-10:/my-genai-blogs.html</id><summary type="html">&lt;h2&gt;Why GenAI?&lt;/h2&gt;
&lt;p&gt;Generative AI has completely changed how I think about software, creativity, and problem-solving. Over the past year, I've gone deep into the world of large language models, prompt engineering, retrieval-augmented generation, fine-tuning, and AI agents. The pace of change is incredible, and I wanted a place to document …&lt;/p&gt;</summary><content type="html">&lt;h2&gt;Why GenAI?&lt;/h2&gt;
&lt;p&gt;Generative AI has completely changed how I think about software, creativity, and problem-solving. Over the past year, I've gone deep into the world of large language models, prompt engineering, retrieval-augmented generation, fine-tuning, and AI agents. The pace of change is incredible, and I wanted a place to document what I'm learning as I go.&lt;/p&gt;
&lt;p&gt;This blog is that place. I'll be writing about my hands-on experiences with GenAI, the tools I'm experimenting with, things that worked, things that didn't, and the lessons I've picked up along the way.&lt;/p&gt;
&lt;h2&gt;What I've Been Exploring&lt;/h2&gt;
&lt;p&gt;My GenAI journey started with using ChatGPT and Claude for day-to-day coding tasks. That quickly evolved into deeper exploration:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Prompt engineering&lt;/strong&gt; — learning how to get consistent, high-quality outputs from LLMs by structuring prompts effectively.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;RAG (Retrieval-Augmented Generation)&lt;/strong&gt; — building pipelines that ground LLM responses in real data using vector databases and embeddings.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Fine-tuning&lt;/strong&gt; — adapting pre-trained models for specific tasks and domains.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;AI agents&lt;/strong&gt; — creating autonomous workflows where LLMs can use tools, reason through multi-step problems, and take actions.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Local models&lt;/strong&gt; — running open-source models like LLaMA and Mistral locally to understand how they work under the hood.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I'm not just reading about these topics. I'm building with them, breaking things, and learning from the results.&lt;/p&gt;
&lt;h2&gt;What to Expect&lt;/h2&gt;
&lt;p&gt;I plan to post at least one article a week covering topics like:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Practical tutorials on building GenAI applications&lt;/li&gt;
&lt;li&gt;Comparisons of different models and frameworks&lt;/li&gt;
&lt;li&gt;Deep dives into concepts like embeddings, tokenization, and attention mechanisms&lt;/li&gt;
&lt;li&gt;Real-world use cases and project walkthroughs&lt;/li&gt;
&lt;li&gt;Opinions on where GenAI is heading and what matters for developers&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Some posts will be short and focused, others will be longer walkthroughs. The goal is to share useful, honest content from a developer's perspective.&lt;/p&gt;
&lt;h2&gt;Let's Go&lt;/h2&gt;
&lt;p&gt;I'm excited to start writing and sharing. GenAI is moving fast, and the best way to keep up is to build, experiment, and document. That's exactly what this blog is for.&lt;/p&gt;</content><category term="Announcement"/><category term="GenAI"/><category term="LLM"/><category term="machine-learning"/><category term="deep-learning"/><category term="announcement"/></entry></feed>