Vinayak

Claude Code Project Structure: Every File and Folder Explained

2026-04-19T00:00:00+05:30


Most Claude Code projects start with a CLAUDE.md and nothing else. Here's the full structure that turns Claude from a coding assistant into an engineering partner.

Why Structure Matters
The Complete Directory
File-by-File Breakdown
1. CLAUDE.md — The Session Brain
2. CLAUDE.local.md — Your Personal Overrides
3. .mcp.json — External Tool Connections
4. .claude/settings.json — Permissions & Model Control
5. .claude/rules/ — Contextual Coding Standards
6. .claude/commands/ — Repeatable Slash Workflows
7. .claude/skills/ — Context-Aware Capability Packs
8. .claude/agents/ — Specialized Sub-Agents
9. .claude/hooks/ — Automated Guardrails
Putting It All Together
Starter Template
Key Principles

Why Structure Matters

The quality of Claude Code's output depends heavily on the quality of your project structure.

A raw Claude Code session is powerful. But without structure, every session starts from zero — you re-explain conventions, re-define standards, re-specify what "done" looks like. That cognitive overhead compounds over weeks and across teams.

A well-structured project means:
- Claude understands your codebase architecture from session start
- Coding standards are enforced automatically, not re-stated in every prompt
- Workflows run with a single slash command instead of multi-paragraph instructions
- Sub-agents handle specialized tasks without polluting the main context
- Hooks catch unsafe operations before they run

Invest in the structure once. Every session benefits.

The Complete Directory

your-project/
├── CLAUDE.md                        # Session context — loaded at start
├── CLAUDE.local.md                  # Personal overrides (gitignored)
├── .mcp.json                        # MCP tool integrations (shared via git)
└── .claude/
    ├── settings.json                # Permissions, model, hooks config
    ├── settings.local.json          # Personal settings overrides (gitignored)
    ├── rules/
    │   ├── code-style.md            # Code formatting & style standards
    │   ├── testing.md               # Testing patterns & requirements
    │   └── api-conventions.md       # API design rules
    ├── commands/
    │   ├── review.md                # /project:review — full code review workflow
    │   └── fix-issue.md             # /project:fix-issue — issue resolution steps
    ├── skills/
    │   └── deploy/
    │       ├── SKILL.md             # Deployment procedures
    │       └── deploy-config.md     # Environment & config details
    ├── agents/
    │   ├── code-reviewer.md         # Dedicated review agent
    │   └── security-auditor.md      # Security-focused analysis agent
    └── hooks/
        └── validate-bash.sh         # Pre-execution bash validation

File-by-File Breakdown

1. CLAUDE.md — The Session Brain

Loaded automatically at every session start.

This is the single most important file in your project. It gives Claude the context it needs to be immediately useful — without you having to re-explain anything.

A well-written CLAUDE.md looks something like this:

# Project: Acme API

## Overview
A REST API for the Acme SaaS platform. Handles auth, billing, and user management.

## Tech Stack
- Runtime: Node.js 20, TypeScript 5.3
- Framework: Fastify
- Database: PostgreSQL 15 + Prisma ORM
- Auth: JWT + refresh tokens
- Testing: Vitest + Supertest
- CI: GitHub Actions

## Architecture
- src/routes/       — Route handlers (thin, delegate to services)
- src/services/     — Business logic
- src/repositories/ — Database layer (Prisma calls only here)
- src/middleware/   — Auth, validation, error handling

## Key Commands
- `npm run dev`     — Start dev server (port 3000)
- `npm run test`    — Run full test suite
- `npm run lint`    — ESLint + Prettier check
- `npm run migrate` — Run pending DB migrations

## Conventions
- All routes must have Zod input validation
- Services never import from other services directly
- Every new endpoint requires an integration test
- No raw SQL — use Prisma query builder

## Current Focus
Refactoring the billing module to support usage-based pricing.

What goes in CLAUDE.md:
- Project overview and purpose
- Tech stack with specific versions
- Directory structure and architectural decisions
- Key commands (dev, test, build, deploy)
- Non-obvious conventions the model shouldn't have to guess
- Current work context / active focus area

What doesn't go in CLAUDE.md:
- Personal preferences (use CLAUDE.local.md)
- Secrets or credentials (never)
- Verbose documentation that belongs in your actual docs

2. CLAUDE.local.md — Your Personal Overrides

Gitignored. Never committed. Just for you.

This file lets individual developers customize Claude's behavior without affecting teammates. It overrides or extends CLAUDE.md for a single person's environment.

# Local Overrides — Sandipan

## Personal Preferences
- I prefer concise responses without extensive explanation unless asked
- When suggesting refactors, show the diff format, not just the new code
- My local DB runs on port 5433 (not the default 5432)

## Dev Environment
- Using Cursor as my editor — optimize suggestions for Cursor workflows
- Node version: 20.11.0 via nvm

## Current Task
Working on the PaymentWebhookHandler — focus suggestions here.

A teammate with different preferences runs the same project with their own CLAUDE.local.md. No conflicts, no git noise.

3. .mcp.json — External Tool Connections

Shared via git. Controls every external tool your agent can reach.

MCP (Model Context Protocol) is how Claude Code connects to external services — GitHub, JIRA, Slack, databases, and more. Your .mcp.json defines those connections in one place.

{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"
      }
    },
    "postgres": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-postgres", "${DATABASE_URL}"]
    },
    "slack": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-slack"],
      "env": {
        "SLACK_BOT_TOKEN": "${SLACK_BOT_TOKEN}",
        "SLACK_TEAM_ID": "${SLACK_TEAM_ID}"
      }
    }
  }
}

With this in place, Claude can:
- Read and create GitHub issues and PRs
- Run read-only queries against your database
- Post updates to Slack channels

Commit .mcp.json to git. Your whole team gets the same integrations. Store secrets in environment variables, never in the file itself.
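To keep the "secrets stay in environment variables" rule from eroding over time, a small check can flag any env value in .mcp.json that is not a ${VAR} reference. This is a minimal sketch; the function names are hypothetical, and the pattern also tolerates a ${VAR:-default} form.

```python
import json
import re

# A value counts as safe only if it is entirely an env-var reference,
# e.g. "${GITHUB_TOKEN}" (a "${VAR:-default}" form is also tolerated).
ENV_REF = re.compile(r"^\$\{[A-Za-z_][A-Za-z0-9_]*(:-[^}]*)?\}$")


def find_literal_secrets(mcp_config: dict) -> list[str]:
    """Return 'server.KEY' for every env value that is not a ${VAR} reference."""
    problems = []
    for name, server in mcp_config.get("mcpServers", {}).items():
        for key, value in server.get("env", {}).items():
            if not (isinstance(value, str) and ENV_REF.match(value)):
                problems.append(f"{name}.{key}")
    return problems


def check_mcp_file(path: str = ".mcp.json") -> list[str]:
    """Load an .mcp.json file and report hard-coded env values."""
    with open(path) as f:
        return find_literal_secrets(json.load(f))
```

Running check_mcp_file() in CI or a pre-commit step, and failing on a non-empty result, keeps literal tokens out of git.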

4. .claude/settings.json — Permissions & Model Control

Controls what Claude is allowed to do. Defaults to safe.

{
  "model": "claude-opus-4-5",
  "permissions": {
    "allow": [
      "Bash(npm run:*)",
      "Bash(git status)",
      "Bash(git diff:*)",
      "Bash(git log:*)",
      "Read",
      "Edit(src/**)",
      "Edit(tests/**)"
    ],
    "deny": [
      "Bash(git push:*)",
      "Bash(rm -rf:*)",
      "Bash(curl:*)",
      "Edit(.env*)"
    ]
  },
  "hooks": {
    "PreToolUse": [
      {
        "matcher": "Bash",
        "hooks": [
          { "type": "command", "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/validate-bash.sh" }
        ]
      }
    ]
  }
}

Key things to configure:
- model — specify which Claude model handles this project
- allow — explicit list of permitted tool uses
- deny — hard blocks (push to git, delete files, write to env)
- hooks — which scripts run before/after tool use

settings.local.json (gitignored) lets individual devs override their own model preference or permissions without touching the shared config.
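Claude Code checks deny rules before allow rules, and a tool use that matches neither list falls back to asking you. The toy matcher below models that precedence for rule strings like Bash(git diff:*) and Edit(src/**); it is an illustration with hypothetical function names, not Claude Code's actual matching logic.

```python
from fnmatch import fnmatchcase


def rule_matches(rule: str, tool: str, arg: str) -> bool:
    """Toy matcher for rules like 'Read', 'Bash(git status)', 'Bash(git diff:*)', 'Edit(src/**)'."""
    if rule == tool:  # a bare tool name covers every use of that tool
        return True
    if not (rule.startswith(tool + "(") and rule.endswith(")")):
        return False
    pattern = rule[len(tool) + 1:-1]
    if tool == "Bash":
        if pattern.endswith(":*"):  # prefix rule
            return arg.startswith(pattern[:-2])
        return arg == pattern  # exact command
    return fnmatchcase(arg, pattern)  # path glob for file tools


def decide(settings: dict, tool: str, arg: str) -> str:
    """Deny wins over allow; anything unmatched falls through to asking the user."""
    perms = settings.get("permissions", {})
    if any(rule_matches(r, tool, arg) for r in perms.get("deny", [])):
        return "deny"
    if any(rule_matches(r, tool, arg) for r in perms.get("allow", [])):
        return "allow"
    return "ask"
```

The practical upshot: a broad allow rule such as Bash(git:*) cannot accidentally re-enable git push while Bash(git push:*) sits in deny.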

5. .claude/rules/ — Contextual Coding Standards

Modular. Targeted. Loaded only when relevant.

Instead of dumping all your standards into CLAUDE.md, rules files let you organize conventions by topic and have Claude load them contextually — code-style.md when writing code, testing.md when generating tests.

.claude/rules/code-style.md

# Code Style Rules

## TypeScript
- Explicit return types on all exported functions
- No `any` types — use `unknown` and narrow
- Prefer `const` over `let`; never use `var`
- Interfaces over type aliases for object shapes

## Naming
- Files: kebab-case (`user-service.ts`)
- Classes: PascalCase (`UserService`)
- Functions/variables: camelCase (`getUserById`)
- Constants: SCREAMING_SNAKE_CASE (`MAX_RETRY_COUNT`)

## Imports
- Group: external libs → internal modules → types
- No barrel imports from index files in the same module
- Absolute paths only (`@/services/user` not `../../services/user`)

## Error Handling
- Always use custom error classes from `src/errors/`
- Never swallow errors silently — log or rethrow
- Async functions must handle rejection

.claude/rules/testing.md

# Testing Standards

## Requirements
- Every new route: at minimum one integration test
- Every service function: unit test with mocked dependencies
- Test files colocated with source: `user-service.test.ts`

## Patterns
- Use `describe` blocks matching the function/class name
- Test names: "should [behavior] when [condition]"
- No test should depend on another test's state
- Use factories from `tests/factories/` for test data

## Coverage
- Services: 90% line coverage minimum
- Routes: 80% minimum
- Utils: 100% — they're pure functions

.claude/rules/api-conventions.md

# API Conventions

## Endpoints
- RESTful naming: /users, /users/:id, /users/:id/orders
- Versioned: all endpoints under /api/v1/
- Plural resource names always

## Request/Response
- All inputs validated with Zod schemas
- Success: { data: T, meta?: PaginationMeta }
- Error: { error: { code: string, message: string, details?: unknown } }
- HTTP 422 for validation errors, 409 for conflicts, 404 for not found

## Auth
- JWT in Authorization: Bearer header
- Refresh token in httpOnly cookie
- All endpoints authenticated unless marked @public

Rules files can also use glob patterns to target specific paths — Claude can be told to only apply api-conventions.md when working in src/routes/.
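Conceptually, path-scoped rules come down to matching the file being edited against each rule's globs, with unscoped rules always applying. A sketch under a hypothetical mapping (RULE_SCOPES is illustrative, not a Claude Code file format):

```python
from fnmatch import fnmatchcase

# Hypothetical mapping: rule file -> path globs it applies to.
# An empty list means the rule applies everywhere.
RULE_SCOPES = {
    "code-style.md": [],
    "testing.md": ["**/*.test.ts", "tests/**"],
    "api-conventions.md": ["src/routes/**"],
}


def rules_for(path: str, scopes: dict[str, list[str]] = RULE_SCOPES) -> list[str]:
    """Return the rule files that apply to the given project-relative path."""
    return [
        rule for rule, globs in scopes.items()
        if not globs or any(fnmatchcase(path, g) for g in globs)
    ]
```

Editing src/routes/users.ts would pull in code-style.md and api-conventions.md but leave testing.md out of the context.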

6. .claude/commands/ — Repeatable Slash Workflows

Type /project:review. Claude runs your entire code review process.

Commands let you encode multi-step workflows as slash commands — callable with a single /project:name invocation.

.claude/commands/review.md

# Code Review Workflow

You are performing a thorough code review. Follow these steps in order:

1. **Understand the change**
   - Run `git diff main` to see all changes
   - Identify what problem this change solves

2. **Check correctness**
   - Does the logic handle edge cases?
   - Are error cases handled explicitly?
   - Any off-by-one errors or null pointer risks?

3. **Check standards compliance**
   - Apply rules from .claude/rules/code-style.md
   - Apply rules from .claude/rules/testing.md
   - Are all new endpoints following api-conventions.md?

4. **Check test coverage**
   - Run `npm test -- --coverage` and report gaps
   - Are happy path AND error paths tested?

5. **Security check**
   - Any user input used without sanitization?
   - Any secrets or credentials hardcoded?
   - Auth checks present on all new endpoints?

6. **Output**
   Provide a structured review with sections:
   - ✅ What's good
   - ⚠️ Minor issues (suggestions)
   - ❌ Must fix before merge

.claude/commands/fix-issue.md

# Fix GitHub Issue Workflow

Given an issue number, follow these steps:

1. Fetch the issue: use GitHub MCP to get issue #$ARGUMENTS
2. Understand the bug — read related source files
3. Reproduce: identify the code path causing the issue
4. Fix: make the minimal change that resolves the issue
5. Test: write a test that would have caught this bug
6. Commit: `git commit -m "fix: [issue title] (#$ARGUMENTS)"`
7. Summary: explain what was wrong and what you changed

Invoke with /project:fix-issue 247 — Claude fetches issue #247, diagnoses it, fixes it, and tests it.
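Claude Code performs the substitution itself, but the mechanics are simple: everything typed after the command name replaces $ARGUMENTS in the template. A toy model with hypothetical helper names, using an in-memory dict of templates instead of files:

```python
def parse_invocation(line: str) -> tuple[str, str]:
    """Split '/project:fix-issue 247' into ('fix-issue', '247')."""
    head, _, args = line.strip().partition(" ")
    name = head.removeprefix("/").removeprefix("project:")
    return name, args.strip()


def render_command(invocation: str, templates: dict[str, str]) -> str:
    """Look up the command's template and substitute every $ARGUMENTS."""
    name, args = parse_invocation(invocation)
    return templates[name].replace("$ARGUMENTS", args)
```

Because every occurrence is replaced, fix-issue.md can use the issue number both when fetching the issue and in the commit message.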

7. .claude/skills/ — Context-Aware Capability Packs

Auto-triggered based on task context. Loads only when needed.

Skills are task-specific knowledge bundles that activate when Claude detects a relevant context. They keep CLAUDE.md lean while making specialized knowledge available on demand.

.claude/skills/deploy/SKILL.md

---
name: deploy
description: Deployment, release, and rollback procedures. Use when the user mentions deploy, deployment, release, production, staging, or rollback.
---

# Deployment Skill

## When to activate
Load this skill when the user mentions: deploy, deployment, release, production, staging, rollback.

## Deployment Process

### Pre-deploy checklist
- [ ] All tests passing (`npm test`)
- [ ] No TypeScript errors (`npm run type-check`)
- [ ] Migrations reviewed and tested
- [ ] Feature flags configured for gradual rollout

### Deploy to staging
gh workflow run deploy.yml -f environment=staging -f version=$(git rev-parse HEAD)

### Deploy to production
Production deploys require two approvals in GitHub. Never deploy directly.
gh workflow run deploy.yml -f environment=production -f version=$(git rev-parse HEAD)

### Rollback procedure
git log --oneline -10                                   # find the previous stable version
gh workflow run rollback.yml -f version=<previous-sha>   # trigger the rollback

### Post-deploy verification
- Check error rate in Datadog (should be <0.1%)
- Verify key user flows in staging mirror
- Monitor p95 latency for 10 minutes

Claude won't load deployment procedures when you're writing unit tests. It loads them when context signals a deployment task. This keeps the context window efficient.
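The "When to activate" list is effectively a set of trigger words. In Claude Code the model decides relevance from the skill's description rather than running a keyword filter, so the check below is only a toy model of the idea, with hypothetical names:

```python
import re

# Trigger words copied from the skill's "When to activate" section.
DEPLOY_TRIGGERS = {"deploy", "deployment", "release", "production", "staging", "rollback"}


def should_load_skill(message: str, triggers: set[str] = DEPLOY_TRIGGERS) -> bool:
    """True if any whole word in the message is a trigger word (case-insensitive)."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    return not words.isdisjoint(triggers)
```

A request to write unit tests shares no words with the trigger set, so the deployment text never enters the context.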

8. .claude/agents/ — Specialized Sub-Agents

Isolated context. Custom tools. Specific roles.

Agents are specialized Claude instances with their own context windows, system prompts, and tool access. They handle focused tasks without polluting the main conversation.

.claude/agents/code-reviewer.md

---
name: code-reviewer
description: Performs thorough code reviews. Invoke when reviewing PRs or checking code quality.
model: opus
tools: Read, Bash
---

You are a senior engineer specializing in code review.

Your focus areas:
- Correctness and edge case handling
- Security vulnerabilities (injection, auth bypass, data exposure)
- Performance implications (N+1 queries, unnecessary allocations)
- Maintainability (naming, complexity, coupling)
- Test coverage and test quality

You are direct. You flag real issues clearly. You don't pad reviews with excessive praise.
Format: use ✅ ⚠️ ❌ to categorize findings.

.claude/agents/security-auditor.md

---
name: security-auditor
description: Security-focused code analysis. Invoke for security reviews before major releases.
model: opus
tools: Read, Bash
---

You are a security engineer performing a threat-focused audit.

Check for:
- Injection vulnerabilities (SQL, command, LDAP)
- Authentication and authorization flaws
- Insecure data exposure (logging PII, unencrypted storage)
- Dependency vulnerabilities (`npm audit`)
- Secrets in code or git history
- OWASP Top 10 issues

Report severity: CRITICAL / HIGH / MEDIUM / LOW.
For each issue: describe the vulnerability, show the affected code, explain the attack vector, recommend the fix.

Each agent works in its own context: the security auditor's file reads and intermediate reasoning don't bleed into your main coding session, and only its final report comes back. You get focused, expert-mode output from a clean context.

9. .claude/hooks/ — Automated Guardrails

Event-driven. Runs before or after Claude takes action.

Hooks are shell scripts that execute automatically at defined trigger points. They're your last line of defense before Claude does something irreversible.

.claude/hooks/validate-bash.sh

#!/bin/bash
# PreToolUse hook: validates bash commands before Claude runs them.
# Claude Code sends the pending tool call as JSON on stdin (requires jq).

COMMAND=$(jq -r '.tool_input.command // empty')

# Exit code 2 blocks the tool call; the stderr message is shown to Claude.
block() {
  echo "BLOCKED: $1" >&2
  exit 2
}

# Block destructive git operations (also covers git push --force)
if echo "$COMMAND" | grep -qE "git push|git reset --hard"; then
  block "Direct git push/force operations require manual execution."
fi

# Block production environment access
if echo "$COMMAND" | grep -qE "NODE_ENV=production|--env production"; then
  block "Production commands must be run through CI/CD pipeline."
fi

# Block recursive deletion
if echo "$COMMAND" | grep -qE "rm -rf|rmdir /s"; then
  block "Recursive deletion requires manual confirmation."
fi

# Block direct database mutations in production
if echo "$COMMAND" | grep -qE "psql.*production|prisma migrate.*production"; then
  block "Production database operations require DBA approval."
fi

# Allow everything else
exit 0

Hook trigger points:
- PreToolUse: runs before a tool executes and can block it (most common)
- PostToolUse: runs after a tool completes (for logging, formatting)
- File writes have no separate event of their own: target them with a matcher such as "Edit|Write", for example to auto-lint or format after Claude writes code

Practical hook uses:
- Auto-run Prettier after every file write
- Block writes to .env files
- Log all bash commands to an audit trail
- Run ESLint on modified files and report errors back to Claude
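A hook can be any executable that reads the JSON payload Claude Code passes on stdin and signals through its exit code (2 blocks the call). A sketch of the same PreToolUse checks in Python, which parses the payload itself instead of relying on jq; the function names are hypothetical and the patterns are copied from validate-bash.sh.

```python
#!/usr/bin/env python3
"""PreToolUse hook sketch: the same checks as validate-bash.sh, without jq."""
import json
import re
import sys
from typing import Optional

# (regex, reason) pairs mirroring the shell version
BLOCKED = [
    (r"git push|git reset --hard", "Direct git push/force operations require manual execution."),
    (r"NODE_ENV=production|--env production", "Production commands must be run through CI/CD pipeline."),
    (r"rm -rf|rmdir /s", "Recursive deletion requires manual confirmation."),
    (r"psql.*production|prisma migrate.*production", "Production database operations require DBA approval."),
]


def check_command(command: str) -> Optional[str]:
    """Return the reason a command is blocked, or None if it may run."""
    for pattern, reason in BLOCKED:
        if re.search(pattern, command):
            return reason
    return None


def main(stdin_text: str) -> int:
    """Parse the hook payload and return the exit code Claude Code should see."""
    payload = json.loads(stdin_text or "{}")
    command = payload.get("tool_input", {}).get("command", "")
    reason = check_command(command)
    if reason:
        print(f"BLOCKED: {reason}", file=sys.stderr)
        return 2  # exit code 2 blocks the call; stderr goes back to Claude
    return 0
```

To use it as the actual hook, save it under .claude/hooks/ (for example as validate_bash.py, a hypothetical name), end the file with `sys.exit(main(sys.stdin.read()))`, and point the PreToolUse command at it.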

Putting It All Together

Here's how a real session plays out with a fully structured project:

Developer: "Review the PR for the new payment webhook handler"

Claude:
1. Loads CLAUDE.md → understands it's a Fastify/Prisma project
2. Loads CLAUDE.local.md → knows to show diff format, use port 5433
3. MCP GitHub connection → fetches the PR diff automatically
4. Loads .claude/rules/code-style.md → applies TypeScript standards
5. Loads .claude/rules/api-conventions.md → checks endpoint structure
6. Invokes code-reviewer agent → isolated, focused review context
7. Hook: validate-bash.sh → validates any commands before running
8. Output: structured review with ✅ ⚠️ ❌ findings

Total prompting required from developer: one sentence.

Without structure, that same task requires several paragraphs of context, repeated for every session.

Starter Template

Run these commands from your project root to scaffold a minimal structure:

mkdir -p .claude/{rules,commands,skills,agents,hooks}

# Create the essential files
touch CLAUDE.md
touch CLAUDE.local.md
echo "CLAUDE.local.md" >> .gitignore
echo ".claude/settings.local.json" >> .gitignore

touch .mcp.json
touch .claude/settings.json
touch .claude/rules/code-style.md
touch .claude/rules/testing.md
touch .claude/commands/review.md
touch .claude/hooks/validate-bash.sh
chmod +x .claude/hooks/validate-bash.sh

Then fill in CLAUDE.md with your project context. That's the highest-leverage starting point — everything else builds on it.

Key Principles

1. CLAUDE.md is infrastructure, not documentation. Write it for Claude, not for humans. It should enable immediate, correct action — not explain things a human already knows from the codebase.

2. Separate shared from personal. .md → committed. .local.md → gitignored. Team standards stay consistent. Individual preferences stay personal.

3. Keep context lean. Rules and skills load contextually for a reason. Don't dump everything into CLAUDE.md. A bloated context window dilutes attention on what matters.

4. Hooks are guardrails, not restrictions. They block irreversible operations and enforce automation — they don't limit what Claude can help you think through. Block the action, not the thinking.

5. Agents for isolation, commands for workflow. Complex multi-step workflows → commands. Tasks requiring focused, expert-mode reasoning → agents. Both beat re-explaining the same thing every session.

Resources

Credit: Project structure diagram by Sandipan Bhaumik — Data & AI Leader at agentbuild.ai

Found this useful? ⭐ Star the repo and share it with your team. Have additions or corrections? Open an issue or submit a PR.

LLM Fine-Tuning vs RAG: When to Use Which

2026-04-18T00:00:00+05:30


A practical decision framework for teams building with LLMs — with real trade-offs, cost analysis, and when to combine both

The Core Question
What Is RAG?
What Is Fine-Tuning?
Head-to-Head Comparison
When to Choose RAG
When to Choose Fine-Tuning
When to Use Both
Cost & Complexity Analysis
Decision Framework
Implementation Quickstart
Common Mistakes
Resources

The Core Question

You're building an AI product. Your LLM doesn't know your data, your domain, or your tone. How do you fix that?

Two approaches dominate:

RAG (Retrieval-Augmented Generation): Give the model relevant information at query time by retrieving it from a knowledge base.
Fine-Tuning: Re-train the model on your data so the knowledge is baked into the weights.

Both work. Both have real trade-offs. Picking the wrong one can cost months of work and thousands of dollars. This guide gives you a clear framework for deciding.

What Is RAG?

RAG keeps the base model frozen and dynamically injects relevant context at inference time.

User Query
    ↓
[Embed query] → [Search vector DB] → [Retrieve top-k chunks]
    ↓
[Augmented prompt: retrieved chunks + original query]
    ↓
LLM generates answer grounded in retrieved context

The pipeline:

from openai import OpenAI
from pinecone import Pinecone

client = OpenAI()
pc = Pinecone(api_key="YOUR_KEY")
index = pc.Index("knowledge-base")

def rag_query(user_question: str) -> str:
    # 1. Embed the question
    embedding = client.embeddings.create(
        model="text-embedding-3-small",
        input=user_question
    ).data[0].embedding

    # 2. Retrieve relevant chunks
    results = index.query(vector=embedding, top_k=5, include_metadata=True)
    context = "\n\n".join([r.metadata["text"] for r in results.matches])

    # 3. Generate grounded answer
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer using only the provided context. If the answer isn't in the context, say so."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {user_question}"}
        ]
    )
    return response.choices[0].message.content

What Is Fine-Tuning?

Fine-tuning continues training a pre-trained model on your dataset, updating its weights to encode new knowledge, style, or behavior.

Base Model (frozen knowledge)
    ↓
[Your training data: (prompt, ideal_response) pairs]
    ↓
[Gradient updates via supervised learning]
    ↓
Fine-Tuned Model (knowledge baked into weights)

Training data format (OpenAI JSONL):

{"messages": [{"role": "system", "content": "You are a support agent for Acme SaaS."}, {"role": "user", "content": "How do I reset my API key?"}, {"role": "assistant", "content": "To reset your API key: go to Settings → API → Regenerate Key. Your old key is immediately invalidated."}]}
{"messages": [{"role": "system", "content": "You are a support agent for Acme SaaS."}, {"role": "user", "content": "What's the rate limit on the free plan?"}, {"role": "assistant", "content": "Free plan: 100 requests/minute, 10,000 requests/month. Upgrade to Pro for 1,000 req/min."}]}
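Malformed lines are a common reason an upload gets rejected, so it pays to check the file before starting a job. A minimal sketch that checks the shape shown above (a messages list of system/user/assistant entries ending with the assistant's ideal response); it is a simplified pre-check with a hypothetical function name, and the provider's own validation remains authoritative.

```python
import json

VALID_ROLES = {"system", "user", "assistant"}


def validate_jsonl(text: str) -> list[str]:
    """Return a list of problems in chat-format fine-tuning data (empty if none)."""
    errors = []
    for n, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {n}: invalid JSON ({exc.msg})")
            continue
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages:
            errors.append(f"line {n}: missing 'messages' list")
            continue
        for m in messages:
            if not isinstance(m, dict) or m.get("role") not in VALID_ROLES \
                    or not isinstance(m.get("content"), str):
                errors.append(f"line {n}: bad message {m!r}")
                break
        else:
            # Only reached if every message passed the checks above
            if messages[-1]["role"] != "assistant":
                errors.append(f"line {n}: last message should be the assistant's ideal response")
    return errors
```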

Launching a fine-tune (OpenAI):

from openai import OpenAI

client = OpenAI()

# Upload training file
with open("training_data.jsonl", "rb") as f:
    file = client.files.create(file=f, purpose="fine-tune")

# Start fine-tune job
job = client.fine_tuning.jobs.create(
    training_file=file.id,
    model="gpt-4o-mini-2024-07-18"
)

print(f"Fine-tune job started: {job.id}")
# Monitor: client.fine_tuning.jobs.retrieve(job.id)
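Instead of checking by hand, the job can be polled until it reaches a terminal status. A minimal sketch built on client.fine_tuning.jobs.retrieve; the helper name and the 60-second default interval are assumptions.

```python
import time

TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_fine_tune(client, job_id: str, poll_seconds: float = 60.0) -> str:
    """Poll a fine-tuning job until it finishes; return the fine-tuned model name."""
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        if job.status in TERMINAL_STATUSES:
            break
        time.sleep(poll_seconds)  # still validating, queued, or running
    if job.status != "succeeded":
        raise RuntimeError(f"Fine-tune {job_id} ended with status {job.status!r}")
    return job.fine_tuned_model
```

Calling wait_for_fine_tune(client, job.id) after the job is created returns the ft:... model name to pass as model in chat.completions.create.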

Head-to-Head Comparison

Dimension	RAG	Fine-Tuning
Knowledge updates	Real-time — just update the DB	Requires retraining (hours/days)
Data freshness	Always current	Stale until retrained
Setup complexity	Medium (pipeline + vector DB)	High (data prep + training loop)
Cost to update	Low (upsert new docs)	High (full training run)
Inference cost	Higher (embedding + retrieval + generation)	Lower (just generation)
Handles new facts	✅ Excellent	❌ Needs retraining
Changes model behavior/style	❌ Limited	✅ Excellent
Reduces hallucination	✅ Strong (grounded in retrieved text)	⚠️ Moderate
Data requirements	Documents/chunks	50–1000+ (prompt, response) pairs
Transparency	High (can cite sources)	Low (black box)
Privacy	Corpus stays in your DB (retrieved chunks are still sent to the LLM at query time)	Training data sent to the training provider (unless self-hosted)
Time to production	Days	Weeks

When to Choose RAG

RAG is the right default for most teams. Choose it when:

✅ Your knowledge changes frequently

News, product documentation, pricing, inventory, policy — anything that updates weekly, daily, or in real time. Retraining a model every time your docs change is impractical. RAG lets you update your knowledge base and the model immediately reflects it.

Good RAG use cases:
- Internal company knowledge base assistant
- Customer support bot with evolving product docs
- Legal document Q&A (regulations change)
- E-commerce catalog search & Q&A
- News summarization / research assistant

✅ You need source citations

RAG retrieves specific chunks — you always know which document the answer came from. This is essential for compliance, legal, and medical contexts where "the AI told me" isn't sufficient.

✅ You have large volumes of long-tail knowledge

A model can't memorize 50,000 support articles. RAG surfaces the right 3 at query time. Fine-tuning on 50,000 articles would require enormous training data and still wouldn't guarantee retrieval of the right fact.

✅ You're prototyping or iterating fast

Stand up a RAG pipeline in a day. Fine-tuning takes weeks of data preparation, training, and evaluation. Ship with RAG, decide later if fine-tuning adds enough value.

✅ Reducing hallucinations is the priority

By forcing the model to answer from retrieved context, RAG significantly reduces hallucinations on factual questions. It's not perfect, but it's the most reliable grounding technique available today.

When to Choose Fine-Tuning

Fine-tuning earns its cost when RAG can't solve the problem.

✅ You need to change how the model behaves, not just what it knows

RAG adds context. Fine-tuning changes behavior. If you need the model to consistently write in your brand's voice, follow a specific output schema every time, or reason like a domain expert — fine-tuning is the lever.

Good fine-tuning use cases:
- Consistent brand/tone across all outputs
- Domain-specific reasoning (medical diagnosis, legal analysis)
- Structured output compliance (always return valid JSON schema)
- Code generation in your internal framework/style
- Language localization (dialect, formality level)

✅ You have a well-defined, stable task

Fine-tuning excels at narrow, repeated tasks with clear right answers. Classify this support ticket. Extract these fields from this document. Convert this natural language query to SQL.

✅ Latency and cost matter at scale

RAG requires an embedding call + vector search + generation. Fine-tuning requires only generation. At very high volume (millions of queries/day), that difference matters. Fine-tuned smaller models can also match GPT-4 quality on narrow tasks at a fraction of the cost.

# Fine-tuned gpt-4o-mini for SQL generation
# vs. RAG + gpt-4o for same task
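# Illustrative at 1M queries/day: ~$120/day (fine-tuned gpt-4o-mini) vs ~$800/day (RAG + gpt-4o);
# actual figures depend on tokens per query and current pricing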
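Most of that gap comes from prompt length: a RAG prompt carries the retrieved chunks, while a fine-tuned model can often work from the bare question. A small helper makes it easy to redo the arithmetic with your own token counts and current per-token prices; the function is illustrative and deliberately has no prices built in.

```python
def daily_cost(queries_per_day: int, input_tokens: int, output_tokens: int,
               input_price_per_m: float, output_price_per_m: float) -> float:
    """Estimated dollars per day, given tokens per query and prices in $ per 1M tokens."""
    per_query = (input_tokens * input_price_per_m
                 + output_tokens * output_price_per_m) / 1_000_000
    return queries_per_day * per_query
```

Run it once per setup, using each model's own rates: fine-tuned models can be billed at a higher per-token price than their base model, so the shorter prompt has to outweigh that.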

✅ You have high-quality labeled examples (50+)

Fine-tuning requires (prompt, ideal_response) pairs. If you've already logged thousands of correct interactions, or have domain experts who can label examples, that's the signal you need.

✅ The task requires reasoning patterns, not facts

Teaching a model how to think about a problem (legal reasoning, medical differential diagnosis, financial analysis frameworks) is better done through fine-tuning than RAG. You're not injecting facts — you're adjusting the reasoning process.

When to Use Both

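Many of the strongest production systems combine both. This is often called Fine-Tuned RAG. A closely related technique, Retrieval-Augmented Fine-Tuning (RAFT), fine-tunes the model on examples that include retrieved documents (some of them irrelevant distractors) so it learns to answer from retrieved context.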

Fine-Tuning handles:          RAG handles:
- Output format               - Current facts
- Domain reasoning style      - Specific document retrieval
- Consistent tone             - Source citation
- Task-specific behavior      - Knowledge updates

Real-world example — Cursor (AI code editor):
- Fine-tuned on code understanding, editing patterns, and diff formats
- RAG over your local codebase for file-specific context

Real-world example — Medical AI assistant:
- Fine-tuned on clinical reasoning patterns and medical note formats
- RAG over current drug databases, clinical guidelines, and patient records

Implementation pattern:

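from openai import OpenAI

client = OpenAI()

# Placeholder: replace with your own domain instructions.
DOMAIN_SYSTEM_PROMPT = "You are a domain expert. Answer using only the provided context."

def retrieve_context(user_question: str) -> str:
    # Placeholder: plug in the retrieval step from rag_query (embed + vector search).
    raise NotImplementedError

def fine_tuned_rag_query(user_question: str) -> str:
    # Step 1: Retrieve relevant context (RAG)
    context = retrieve_context(user_question)

    # Step 2: Query fine-tuned model with retrieved context
    response = client.chat.completions.create(
        model="ft:gpt-4o-mini:your-org:your-model-id",  # fine-tuned model
        messages=[
            {"role": "system", "content": DOMAIN_SYSTEM_PROMPT},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {user_question}"}
        ]
    )
    return response.choices[0].message.content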

Cost & Complexity Analysis

RAG Cost Profile

Component	One-time	Ongoing
Embedding documents	~$0.02–0.13 per 1M tokens (text-embedding-3-small / -large)	Per update
Vector DB hosting	—	$70–700/mo (Pinecone) or free (self-hosted)
Inference (per query)	—	~$0.002–0.01/query
Engineering setup	2–5 days	Low maintenance

Fine-Tuning Cost Profile

Component	One-time	Ongoing
Data preparation	1–4 weeks	Per retrain
Training run	$50–500 (small model)	Per retrain
Evaluation	1–2 weeks	Per retrain
Inference (per query)	—	Often below RAG (shorter prompts), though fine-tuned models can cost more per token than their base model
Engineering setup	3–8 weeks	Medium maintenance

Break-even rule of thumb: Fine-tuning starts making financial sense when you have >500K queries/month on a well-defined task, AND the task is stable enough that you won't need frequent retraining.
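The rule of thumb is easy to check against your own numbers: divide the one-time cost of fine-tuning by what it saves each month. A minimal sketch; every input (upfront cost including engineering time, per-query saving, monthly upkeep) is an estimate you supply, and the function name is hypothetical.

```python
import math


def breakeven_months(upfront_cost: float, monthly_queries: int,
                     saving_per_query: float, monthly_upkeep: float = 0.0) -> float:
    """Months until per-query savings repay the one-time cost of fine-tuning.

    monthly_upkeep covers recurring costs such as amortized retraining.
    Returns math.inf if the savings never cover the upkeep.
    """
    monthly_saving = monthly_queries * saving_per_query - monthly_upkeep
    if monthly_saving <= 0:
        return math.inf
    return upfront_cost / monthly_saving
```

A result measured in years, or infinity, is a strong hint that the task should stay on RAG.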

Decision Framework

Is your knowledge dynamic (changes weekly or more)?
  └─ Yes → RAG

Do you need source citations?
  └─ Yes → RAG

Is your primary problem behavior/style/reasoning consistency?
  └─ Yes → Fine-Tuning

Do you have 50+ high-quality labeled (prompt, response) pairs?
  └─ No → RAG (you're not ready for fine-tuning)
  └─ Yes → Fine-Tuning is viable

Is latency/cost critical at >500K queries/month?
  └─ Yes → Consider Fine-Tuning or Fine-Tuned RAG

Are you still iterating on the product?
  └─ Yes → RAG (faster to change)
  └─ No, task is stable → Fine-Tuning

Do you need both domain behavior AND current knowledge?
  └─ Yes → Fine-Tuned RAG
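Encoding the questions as a function can help a team make its criteria explicit. This is one reading of the tree, with the question order and the "default to RAG" fallback as assumptions; the Situation fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Situation:
    knowledge_changes_weekly: bool
    needs_citations: bool
    needs_behavior_change: bool  # style, format, or reasoning consistency
    labeled_pairs: int           # high-quality (prompt, response) pairs
    monthly_queries: int
    still_iterating: bool


def recommend(s: Situation) -> str:
    """Apply the decision questions in order; default to RAG when in doubt."""
    needs_fresh_knowledge = s.knowledge_changes_weekly or s.needs_citations
    ready_to_fine_tune = s.labeled_pairs >= 50 and not s.still_iterating
    if s.needs_behavior_change and needs_fresh_knowledge and ready_to_fine_tune:
        return "Fine-Tuned RAG"
    if needs_fresh_knowledge or not ready_to_fine_tune:
        return "RAG"
    if s.needs_behavior_change or s.monthly_queries > 500_000:
        return "Fine-Tuning"
    return "RAG"
```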

Default recommendation: Start with RAG. It's faster, cheaper to iterate, and solves most use cases. Add fine-tuning only when you have a stable task, quality training data, and a clear gap that RAG can't close.

Implementation Quickstart

RAG in 30 minutes (Chroma + OpenAI)

pip install chromadb openai

import chromadb
from openai import OpenAI

client = OpenAI()
chroma = chromadb.Client()
collection = chroma.create_collection("knowledge-base")

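def add_documents(docs: list[str]):
    # One batched embeddings request; response order matches input order
    response = client.embeddings.create(model="text-embedding-3-small", input=docs)
    embeddings = [item.embedding for item in response.data]
    # Offset IDs by the current count so repeated calls don't reuse IDs
    start = collection.count()
    collection.add(
        documents=docs,
        embeddings=embeddings,
        ids=[f"doc_{start + i}" for i in range(len(docs))]
    )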

def query(question: str) -> str:
    q_emb = client.embeddings.create(
        model="text-embedding-3-small", input=question
    ).data[0].embedding
    results = collection.query(query_embeddings=[q_emb], n_results=3)
    context = "\n".join(results["documents"][0])
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer only from the context provided."},
            {"role": "user", "content": f"Context: {context}\n\nQ: {question}"}
        ]
    )
    return resp.choices[0].message.content

Fine-Tuning Checklist

[ ] Collect 50–1000 (prompt, ideal_response) examples
[ ] Ensure examples cover edge cases, not just easy ones
[ ] Format as JSONL with messages array (system, user, assistant)
[ ] Hold out 10–20% as validation set
[ ] Run fine-tune job (OpenAI, Together AI, or self-hosted with Axolotl)
[ ] Evaluate on validation set — compare to base model
[ ] A/B test in production with 5–10% traffic split
[ ] Set up retraining pipeline for when data drifts
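Holding out 10–20% of examples only needs a seeded shuffle so the split is reproducible across runs. A minimal sketch; the function name and the 15% default are assumptions. The two lists can be written to separate JSONL files, with the validation file passed to client.fine_tuning.jobs.create through its validation_file parameter.

```python
import random


def split_examples(lines: list[str], val_fraction: float = 0.15,
                   seed: int = 42) -> tuple[list[str], list[str]]:
    """Shuffle JSONL lines with a fixed seed and hold out val_fraction for validation."""
    if not 0 < val_fraction < 1:
        raise ValueError("val_fraction must be between 0 and 1")
    shuffled = lines[:]  # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]  # (train, validation)
```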

Common Mistakes

RAG mistakes: - Chunks too large — 500–1000 tokens per chunk is usually optimal. Larger chunks dilute relevance. - No metadata filtering — Always filter by date, category, or source before vector search. - Skipping re-ranking — Use a cross-encoder to re-rank retrieved chunks before passing to the LLM. - Ignoring chunking strategy — Sentence-based chunking often beats fixed-size for prose documents.

Fine-tuning mistakes:
- Too little data: under 50 examples rarely produces meaningful improvement.
- Low-quality examples: 100 excellent examples usually beat 1,000 mediocre ones.
- Forgetting catastrophic forgetting: fine-tuning can degrade general capability. Test broadly, not just on your task.
- No evaluation set: without held-out validation, you can't tell whether fine-tuning actually helped.
- Fine-tuning when prompt engineering would suffice: try a well-crafted few-shot prompt first. You might not need fine-tuning at all.

Resources

Found this useful? ⭐ Star the repo and share it with your team. Have a use case or mistake I missed? Open an issue or submit a PR.

Prompt Engineering: Techniques That Actually Matter

2026-04-17T00:00:00+05:30


A practical guide to getting reliable, high-quality outputs from LLMs — with real examples and patterns you can use today

Why Prompt Engineering Still Matters
Core Techniques
1. Be Explicit About Format & Length
2. Role + Context Framing
3. Chain-of-Thought (CoT) Prompting
4. Few-Shot Examples
5. Output Constraints & Schema Forcing
6. Negative Prompting — Tell It What NOT to Do
7. ReAct — Reasoning + Acting
8. Self-Consistency & Sampling
System Prompt Architecture
Prompt Patterns for Common Tasks
What Doesn't Work (And Why)
Evaluation: How to Know If Your Prompt Is Good
Resources

Why Prompt Engineering Still Matters

With every new model release, someone declares "prompt engineering is dead." It isn't.

Models are getting better at understanding intent — but the gap between a mediocre prompt and a great one still produces dramatically different results. In production systems, that gap means the difference between a feature that works reliably and one that fails 20% of the time.

Prompt engineering is less about magic words and more about clear communication and constraint. Think of it as writing a precise spec for a very capable but very literal contractor.

This guide focuses on techniques that hold up across models and tasks — not tricks that worked once on GPT-3.

Core Techniques

1. Be Explicit About Format & Length

The single highest-ROI change in most prompts.

❌ Vague:

Summarize this article.

✅ Explicit:

Summarize this article in 3 bullet points, each under 20 words.
Focus only on the business implications. Skip technical details.

Why it works: Models default to verbose, general outputs. Constraints force compression and prioritization — which is usually what you actually want.

Format options to specify:
- Output length (under 100 words, exactly 3 items, one paragraph)
- Structure (bullet points, numbered list, JSON, markdown table)
- Tone (professional, casual, like you're explaining to a 10-year-old)
- What to omit (no caveats, no preamble, don't repeat the question)
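Explicit constraints also make outputs checkable before you use them. A minimal sketch, assuming the model writes bullets as lines starting with `-`, `*`, or `•`; `check_bullet_summary` is a hypothetical helper that tests the "3 bullet points, each under 20 words" constraint from the example above.

```python
def check_bullet_summary(output: str, n_bullets: int = 3, max_words: int = 20) -> list[str]:
    """Hypothetical validator: return constraint violations (empty list = pass)."""
    lines = [ln.strip() for ln in output.strip().splitlines() if ln.strip()]
    bullets = [ln for ln in lines if ln[0] in "-*•"]
    problems = []
    if len(bullets) != len(lines):
        problems.append("output contains non-bullet lines (preamble or commentary)")
    if len(bullets) != n_bullets:
        problems.append(f"expected {n_bullets} bullets, got {len(bullets)}")
    for i, bullet in enumerate(bullets, 1):
        words = len(bullet.lstrip("-*• ").split())
        if words >= max_words:  # "under 20 words" means 19 or fewer
            problems.append(f"bullet {i} has {words} words (limit: under {max_words})")
    return problems
```

A non-empty result is a signal to retry with the violations appended to the prompt, or to fall back to a default.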

2. Role + Context Framing

Give the model an identity and a situation. This steers it toward the vocabulary, priorities, and judgment associated with that role.

You are a senior backend engineer reviewing a pull request.
The codebase uses Python 3.11, FastAPI, and Postgres.
The team prioritizes readability over cleverness.

Review this function and suggest improvements:
[code here]

The key components:
- Who the model is (senior backend engineer)
- What context it's operating in (Python 3.11, FastAPI, Postgres)
- What it values (readability over cleverness)

Role framing is especially powerful for:
- Code review (activates senior-engineer judgment)
- Writing (activates editorial voice)
- Analysis (activates consultant framing)
- Customer support drafts (activates empathetic tone)

# In a system prompt for a production app
SYSTEM_PROMPT = """
You are a customer support specialist for Acme SaaS.
Your tone is warm, concise, and solution-focused.
You only answer questions about our product.
If you don't know something, say so and offer to escalate.
Never make up features or pricing.
"""

3. Chain-of-Thought (CoT) Prompting

For reasoning tasks, asking the model to "think step by step" often improves accuracy, sometimes dramatically, especially on multi-step math and logic problems.

❌ Direct answer prompt:

A store sells apples for $0.50 each and oranges for $0.75 each.
If you buy 4 apples and 3 oranges, what's the total?

(Answering directly, a model may jump straight to a number; on multi-step problems, that is where arithmetic slips creep in.)

✅ Chain-of-Thought:

A store sells apples for $0.50 each and oranges for $0.75 each.
If you buy 4 apples and 3 oranges, what's the total?

Think through this step by step before giving your final answer.

Zero-shot CoT trigger phrases:
- Think step by step.
- Let's work through this carefully.
- Break this down before answering.
- Reason through the problem first.

Few-shot CoT (even more powerful for complex tasks):

Q: If I have 3 boxes with 8 items each, and I remove 5 items total, how many remain?
A: Let me work through this step by step.
   - 3 boxes × 8 items = 24 items total
   - 24 - 5 = 19 items remain
   Answer: 19

Q: [Your actual question here]
A: Let me work through this step by step.

4. Few-Shot Examples

Show, don't just tell. Examples are often more reliable than instructions.

The pattern:

Classify customer feedback as: POSITIVE, NEGATIVE, or NEUTRAL.

Examples:
Input: "This product changed my life, absolutely love it!"
Output: POSITIVE

Input: "Arrived broken, terrible packaging."
Output: NEGATIVE

Input: "It's fine, does what it says."
Output: NEUTRAL

Now classify:
Input: "Took forever to arrive but the quality is great."
Output:

Rules for good few-shot examples:
- Use 2–5 examples (more isn't always better, and each one inflates the context)
- Cover the edge cases you care about
- Keep examples diverse, not repetitive
- Match the format you want in the output exactly
- Put your actual input last
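In code, it helps to keep examples as data and assemble the prompt from them, so you can version, swap, or extend examples without hand-editing prompt text. A minimal sketch; `build_few_shot_prompt` is a hypothetical helper that reproduces the classification prompt above and always puts the real input last.

```python
def build_few_shot_prompt(instruction: str, examples: list[tuple[str, str]], new_input: str) -> str:
    """Hypothetical helper: instruction, labeled examples, then the real input (always last)."""
    parts = [instruction, "", "Examples:"]
    for text, label in examples:
        parts += [f'Input: "{text}"', f"Output: {label}", ""]
    parts += ["Now classify:", f'Input: "{new_input}"', "Output:"]
    return "\n".join(parts)

EXAMPLES = [
    ("This product changed my life, absolutely love it!", "POSITIVE"),
    ("Arrived broken, terrible packaging.", "NEGATIVE"),
    ("It's fine, does what it says.", "NEUTRAL"),
]

prompt = build_few_shot_prompt(
    "Classify customer feedback as: POSITIVE, NEGATIVE, or NEUTRAL.",
    EXAMPLES,
    "Took forever to arrive but the quality is great.",
)
print(prompt)
```

Ending on a bare `Output:` line nudges the model to answer in exactly the format the examples use.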

5. Output Constraints & Schema Forcing

For applications that consume LLM output programmatically, JSON schema forcing is essential.

Without constraints:

Extract the name, email, and company from this text:
"Hi, I'm Sarah Chen from Acme Corp. You can reach me at sarah@acme.com"

Output might be a paragraph, a list, a table — inconsistent and hard to parse.

With schema forcing:

Extract information from the text below.
Respond ONLY with valid JSON matching this exact schema — no other text:

{
  "name": "string",
  "email": "string or null",
  "company": "string or null"
}

Text: "Hi, I'm Sarah Chen from Acme Corp. You can reach me at sarah@acme.com"

In code — using structured outputs (OpenAI):

from openai import OpenAI
from pydantic import BaseModel

class ContactInfo(BaseModel):
    name: str
    email: str | None
    company: str | None

client = OpenAI()
response = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Extract: Hi, I'm Sarah Chen from Acme Corp..."}],
    response_format=ContactInfo,
)
print(response.choices[0].message.parsed)
# ContactInfo(name='Sarah Chen', email='sarah@acme.com', company='Acme Corp')
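For providers or models without native structured outputs, a common fallback is to parse and validate the reply yourself, then retry or reject on failure. A minimal standard-library sketch; `parse_contact` and `EXPECTED` are hypothetical names mirroring the `ContactInfo` schema above, and slicing from the first `{` to the last `}` is one simple way (an assumption, not a guarantee) to tolerate stray text around the JSON.

```python
import json

# Hypothetical schema mirroring ContactInfo: field name -> allowed Python type(s)
EXPECTED = {"name": str, "email": (str, type(None)), "company": (str, type(None))}

def parse_contact(raw: str) -> dict:
    """Parse a model reply into the contact schema, raising ValueError if it doesn't match."""
    # Tolerate preamble or trailing text by slicing from the first '{' to the last '}'
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found")
    data = json.loads(raw[start:end + 1])  # json.JSONDecodeError subclasses ValueError
    if set(data) != set(EXPECTED):
        raise ValueError(f"unexpected keys: {sorted(data)}")
    for key, allowed in EXPECTED.items():
        if not isinstance(data[key], allowed):
            raise ValueError(f"{key!r} has wrong type: {type(data[key]).__name__}")
    return data
```

Raising a single exception type keeps the retry logic simple: catch `ValueError`, re-prompt with the error message, and give up after a fixed number of attempts.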

6. Negative Prompting — Tell It What NOT to Do

Models respond well to explicit exclusions. Don't just describe what you want — describe what you don't want.

Write a short product description for this ergonomic chair.

Do NOT:
- Use the word "revolutionary" or "game-changing"
- Make unverifiable health claims
- Exceed 80 words
- Use a bulleted list

DO:
- Focus on comfort and design
- Use a conversational but professional tone

Common things worth negating:
- Do not add caveats or disclaimers unless explicitly asked
- Do not repeat back the question or instructions
- Do not say "As an AI language model..."
- Do not use filler phrases like "Certainly!" or "Great question!"
- Do not invent information you're not certain about; say "I don't know" instead
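Negative instructions reduce unwanted patterns but don't guarantee their absence, so a cheap post-generation check is a useful backstop. A minimal sketch; `BANNED_PATTERNS` and `find_violations` are hypothetical names, and the patterns are drawn from the "Do NOT" rules above.

```python
import re

# Hypothetical list drawn from the "Do NOT" rules above; extend per product
BANNED_PATTERNS = [
    r"\bas an ai language model\b",
    r"^\s*(certainly|great question)!",  # filler openers at the start of a line
    r"\brevolutionary\b",
    r"\bgame-changing\b",
]

def find_violations(text: str) -> list[str]:
    """Return the banned patterns found in a model output (case-insensitive)."""
    return [
        p for p in BANNED_PATTERNS
        if re.search(p, text, flags=re.IGNORECASE | re.MULTILINE)
    ]
```

Outputs with violations can be regenerated or flagged for review instead of being shipped.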

7. ReAct — Reasoning + Acting

For agentic tasks — where a model needs to take multiple steps, use tools, and observe results — the ReAct pattern structures the loop.

You have access to these tools:
- search(query) → returns web search results
- calculator(expression) → evaluates math expressions
- get_weather(city) → returns current weather

To complete a task:
1. THINK: reason about what you need to do next
2. ACT: call a tool with the required input
3. OBSERVE: note the tool's output
4. Repeat until you can give a final answer

Task: What's the temperature difference between the current temperature
in Mumbai and the boiling point of water?

Example model output with ReAct:

THINK: I need the current temperature in Mumbai, then compare to water's boiling point (100°C).
ACT: get_weather("Mumbai")
OBSERVE: Current temp in Mumbai: 34°C

THINK: Boiling point of water is 100°C. Difference = 100 - 34 = 66°C.
ANSWER: The boiling point of water is 66°C higher than the current temperature in Mumbai.

This think/act/observe loop is the backbone of many AI agent frameworks (LangChain, LlamaIndex, AutoGPT).
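A minimal ReAct loop might look like the sketch below. The model is replaced by a scripted stand-in so the control flow runs offline; the `THINK`/`ACT`/`ANSWER` text protocol, the `ACTION_RE` regex, and the stub `get_weather` are assumptions mirroring the prompt above, not any framework's API. Native tool-calling APIs pass structured tool calls instead of parsing text like this.

```python
import ast
import operator
import re

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
        ast.Div: operator.truediv, ast.USub: operator.neg}

def calculator(expression: str) -> str:
    """Evaluate + - * / arithmetic by walking the AST (no eval)."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError("unsupported expression")
    return str(ev(ast.parse(expression, mode="eval").body))

def get_weather(city: str) -> str:
    return f"Current temp in {city}: 34°C"  # hypothetical stub; a real tool would call an API

TOOLS = {"calculator": calculator, "get_weather": get_weather}
ACTION_RE = re.compile(r'ACT:\s*(\w+)\("(.*)"\)')  # matches e.g. ACT: get_weather("Mumbai")

def react_loop(llm, task: str, max_steps: int = 5) -> str:
    """Alternate model steps and tool calls until the model emits ANSWER."""
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        step = llm(transcript)  # model emits THINK + ACT, or THINK + ANSWER
        transcript += step + "\n"
        if "ANSWER:" in step:
            return step.split("ANSWER:", 1)[1].strip()
        match = ACTION_RE.search(step)
        if not match:
            transcript += "OBSERVE: could not parse an action\n"
            continue
        name, arg = match.groups()
        tool = TOOLS.get(name)
        observation = tool(arg) if tool else f"unknown tool {name}"
        transcript += f"OBSERVE: {observation}\n"  # fed back to the model on the next turn
    return "No answer within step limit"

# Scripted stand-in for the model, so the loop can run without an API
SCRIPT = iter([
    'THINK: I need the current temperature in Mumbai.\nACT: get_weather("Mumbai")',
    'THINK: Water boils at 100°C, so compute the difference.\nACT: calculator("100 - 34")',
    'THINK: The difference is 66°C.\nANSWER: Water boils 66°C above the current Mumbai temperature.',
])
print(react_loop(lambda transcript: next(SCRIPT), "Temperature gap: Mumbai now vs boiling water?"))
# Water boils 66°C above the current Mumbai temperature.
```

The `max_steps` cap matters in practice: without it, a model that never emits `ANSWER` would loop forever.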

8. Self-Consistency & Sampling

For high-stakes reasoning tasks, generate multiple responses and take a majority vote or the best-reasoned answer.

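import openai
import re
from collections import Counter

def extract_answer(text: str) -> str | None:
    # Assumes the prompt asks the model to end with a line "Answer: <value>"
    matches = re.findall(r"^[ \t]*Answer:[ \t]*(.+?)[ \t]*$", text, flags=re.MULTILINE)
    return matches[-1] if matches else None

def self_consistent_answer(question: str, n: int = 5) -> str | None:
    answers = []
    for _ in range(n):
        response = openai.chat.completions.create(
            model="gpt-4o",
            messages=[
                {"role": "user", "content": f"{question}\nThink step by step, then end with a line 'Answer: <your answer>'."}
            ],
            temperature=0.7  # Higher temp = more diverse reasoning paths
        )
        answer = extract_answer(response.choices[0].message.content)
        if answer is not None:
            answers.append(answer)

    # Majority vote over the extracted final answers
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]

# Useful for: classification, fact extraction, code generation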

When to use: Legal/compliance classification, medical triage, financial decisions, any task where consistency matters more than speed.

System Prompt Architecture

For production applications, think of your system prompt as having distinct sections:

[IDENTITY]
You are [name/role], [brief description].

[CONTEXT]
You are operating as [product context].
Users are [who they are].

[CAPABILITIES]
You can help with: [explicit list]
You have access to: [tools/data]

[CONSTRAINTS]
Never: [hard limits]
Always: [non-negotiables]
If unsure: [fallback behavior]

[FORMAT]
Respond in [format].
Keep responses [length guideline].
Tone: [tone description].
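Keeping each section as a separate string lets teams review and version them independently, and makes a missing section a loud error rather than a silent gap. A minimal sketch; `SECTION_ORDER` and `build_system_prompt` are hypothetical names, and keeping the bracketed labels in the final prompt is a style choice (some teams strip them).

```python
# Hypothetical section order matching the template above
SECTION_ORDER = ["IDENTITY", "CONTEXT", "CAPABILITIES", "CONSTRAINTS", "FORMAT"]

def build_system_prompt(sections: dict[str, str]) -> str:
    """Join sections in a fixed order, failing loudly if any is missing or empty."""
    missing = [name for name in SECTION_ORDER if not sections.get(name, "").strip()]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    return "\n\n".join(f"[{name}]\n{sections[name].strip()}" for name in SECTION_ORDER)
```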

Real example:

You are Aria, a support assistant for CloudBase, a developer infrastructure platform.

You are talking to software engineers and DevOps professionals who are customers of CloudBase.
They expect direct, technical answers without hand-holding.

You can help with: billing questions, technical troubleshooting, account settings, and feature explanations.
You have access to our knowledge base and the user's account information.

Never: make up pricing, promise features that don't exist, or share other customers' data.
Always: provide documentation links when explaining technical concepts.
If you can't resolve an issue: offer to create a support ticket with a 4-hour SLA.

Keep responses under 150 words unless a technical explanation requires more.
Use markdown formatting for code. Tone: direct, friendly, technical.

Prompt Patterns for Common Tasks

Summarization

Summarize the following [document type] for a [target audience].
Focus on: [key themes].
Ignore: [what to skip].
Format: [3 bullet points / 1 paragraph / executive summary].
Length: under [N] words.

Classification

Classify the input into exactly one of these categories: [A, B, C, D].
If ambiguous, choose the closest match.
Respond with only the category name — no explanation.

Examples:
[2-3 labeled examples]

Input: [text to classify]

Code Generation

Write a [language] function that [specific behavior].
Requirements:
- [requirement 1]
- [requirement 2]
Include:
- Type hints
- A docstring
- 2-3 unit tests
Do not include: package imports I didn't ask for, main() boilerplate.

Data Extraction

Extract the following fields from the text below.
If a field is missing, use null.
Return only valid JSON, no other text.

Fields: {field1: type, field2: type, field3: type}

Text: [input text]

What Doesn't Work (And Why)

Technique	Why It Fails
`"Answer as best you can"`	Too vague — the model already tries to do this
`"Be creative but accurate"`	Contradictory constraints confuse the model
Extremely long system prompts	Instructions buried mid-prompt are easily missed; models weight the start and end most (primacy/recency bias)
`"Never make mistakes"`	Models can't guarantee correctness — adds false confidence
Repeating the same instruction 5x	Repetition ≠ emphasis; use structure instead
`"Think outside the box"`	Generic phrase with no actionable meaning
Over-constraining	Too many "don't do X" rules conflict with one another and create new failure modes

The #1 failure mode: Prompts that describe what you want the output to look like but not what the model should actually reason about. Describe the reasoning process, not just the output.

Evaluation: How to Know If Your Prompt Is Good

Gut feeling isn't good enough for production. Use these methods:

1. Regression testing

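TEST_CASES = [
    {"input": "...", "expected": "POSITIVE"},
    {"input": "...", "expected": "NEGATIVE"},
    # 20-50 cases covering edge cases
]

def eval_prompt(prompt, test_cases, call_llm):
    """Return the fraction of test cases whose output exactly matches `expected`.

    `call_llm(prompt, user_input) -> str` is your own wrapper around the model API.
    """
    correct = 0
    for case in test_cases:
        output = call_llm(prompt, case["input"])
        if output.strip() == case["expected"]:
            correct += 1
    return correct / len(test_cases)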

2. LLM-as-judge

You are evaluating an AI response for quality.

Criteria:
- Accuracy (1-5): Is the information correct?
- Relevance (1-5): Does it answer the question asked?
- Conciseness (1-5): Is it appropriately brief?

Question: [original question]
Response: [model output]

Score each criterion and provide a one-sentence justification.
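To aggregate judge results across a test set, the scores need to be machine-readable. A minimal sketch, assuming the judge writes each score as `Criterion: N` (optionally `Criterion (1-5): N`); `parse_judge_scores` and `average_scores` are hypothetical helpers, and asking the judge for JSON would be a sturdier alternative.

```python
import re

CRITERIA = ("Accuracy", "Relevance", "Conciseness")

def parse_judge_scores(judge_output: str) -> dict[str, int]:
    """Extract 1-5 scores like 'Accuracy: 4' from a judge reply; raise if any is missing."""
    scores = {}
    for name in CRITERIA:
        m = re.search(rf"{name}\s*(?:\(1-5\))?\s*:\s*([1-5])\b", judge_output, re.IGNORECASE)
        if not m:
            raise ValueError(f"no score found for {name}")
        scores[name] = int(m.group(1))
    return scores

def average_scores(all_scores: list[dict[str, int]]) -> dict[str, float]:
    """Mean score per criterion across many judged responses."""
    return {c: sum(s[c] for s in all_scores) / len(all_scores) for c in CRITERIA}
```

Comparing these averages between two prompt variants on the same question set gives a simple, repeatable quality signal.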

3. A/B testing in production — Route 5-10% of traffic to a new prompt variant. Measure task completion, user corrections, escalation rate.
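For the traffic split, assignment should be sticky: the same user should see the same prompt variant on every request. One common approach, sketched here with a hypothetical `assign_variant` helper, hashes the experiment name together with the user ID into a bucket from 0 to 9,999.

```python
import hashlib

def assign_variant(user_id: str, experiment: str, treatment_pct: float = 10.0) -> str:
    """Hypothetical helper: deterministically assign a user to 'control' or 'treatment'.

    Hashing (experiment, user_id) keeps each user on the same variant across requests,
    while different experiments get independent splits.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # 0..9999, i.e. 0.01% granularity
    return "treatment" if bucket < treatment_pct * 100 else "control"
```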

Resources

Found this useful? ⭐ Star the repo and share it with your team.
Have a technique I missed? Open an issue or submit a PR.

Claude's Hidden Power: Skills, Plugins, and the .md Files That Make It Extraordinary

2026-04-15T00:00:00+05:30


Most people use Claude like a search engine. The ones getting extraordinary results have figured out something different, and it starts with a plain text file.

There is a version of Claude that writes generic emails and summarizes articles. And then there is the version that builds full PowerPoint decks, generates production-ready PDFs, writes Word documents with tables of contents, connects to your Google Calendar and drafts meeting invites — all from a single prompt.

The difference is not a smarter model. It is a system called Skills — and almost nobody talks about it.

This article is your complete guide. We will cover what Skills are, how .md files give Claude expert-level instructions, and how plugins and MCP tools turn Claude into an autonomous agent that can actually get things done.

Why Claude feels different from other AI tools

Most AI assistants are stateless — they respond to whatever you type with their general training. Claude, when used through Anthropic's platform, is different. It has access to a structured file system of instructions, and it reads those instructions before tackling your task.

Think of Claude as a brilliant generalist who, when given a specialized task, reaches for the right training manual before starting. That manual is a Skill file — a plain Markdown document containing expert-level instructions for exactly that type of work.

These Skill files live at paths like /mnt/skills/public/pptx/SKILL.md and /mnt/skills/public/pdf/SKILL.md. When you ask Claude to "create a PowerPoint," it does not just start generating slides. It first reads the PPTX Skill file, absorbs the best practices encoded there, and then executes your request with the precision of someone who has built hundreds of decks.

What exactly is a .md file?

Markdown (the .md extension) is a lightweight text format created in 2004. You write plain text with simple symbols, and it renders as formatted content. A # becomes a heading. Wrapping a word in double asterisks (**word**) makes it bold. A - starts a bullet list.

Here is what a minimal Markdown file looks like:

# My Skill

## Overview
This skill creates professional PDF reports.

## Steps
1. Install the required library
2. Set up the document structure
3. Add content and styling
4. Save the output

## Best practices
- Always use A4 page size for business docs
- Keep fonts to 2 families maximum
- Never skip the table of contents

That is it. No code. No complex configuration. Just structured text that Claude can read and follow. This simplicity is the genius of the system: anyone can write a Skill file, and Claude can follow it reliably.

The anatomy of a Skill file

A well-crafted SKILL.md typically has six parts. Understanding them will help you both use existing skills effectively and write your own.

1. The frontmatter block

At the very top, a YAML block (between triple dashes) contains metadata: the skill's name and a description, which often lists trigger phrases that tell Claude when to load it.

---
name: pdf
description: >
  Use this skill whenever the user wants to
  create, read, or manipulate PDF files.
  Triggers: 'create PDF', 'pdf report', '.pdf'
---

This description is critical. It is how Claude matches your request to the right skill. A vague description means the skill may never get triggered.

2. Overview

A brief explanation of what the skill does, what libraries it uses, and any important limitations to keep in mind.

3. Quick Start

The minimum viable code or steps to get something working. Claude prioritizes this when you need a fast result.

4. Detailed instructions

Step-by-step guidance for the full range of scenarios, including edge cases. This is the bulk of any serious Skill file.

5. Best practices and warnings

The hard-won wisdom — things that break, common mistakes, and the non-obvious rules that separate a mediocre output from a great one. For example, the ReportLab PDF skill contains an explicit warning: never use Unicode subscript characters in PDFs because the built-in fonts do not support them and they render as solid black boxes.

6. Quick reference table

A summary of the most common operations at a glance. Claude uses this to quickly orient itself in complex tasks.

Every time Claude uses a skill, it is following a recipe that has been tested and refined. You are not getting a one-off guess; you are getting a repeatable, high-quality process. This is why Claude can produce a 20-page formatted PDF with headers, tables, and page numbers from a single request.

Plugins: when Claude needs to act, not just think

Skills give Claude knowledge. Plugins give Claude hands.

In Claude's ecosystem, a plugin (also called a tool or connector) is an integration that lets Claude interact with the real world. Send an email. Read a calendar. Create a file. Search the web. Run terminal commands.

Claude's built-in tools include:

web_search — Search the internet for current information
bash_tool — Run actual terminal commands on a Linux machine
create_file — Generate and save files of any type
view — Read files and directories (including Skill files)
image_search — Find and display images from the web
places_search — Search Google Maps for locations
weather_fetch — Get real-time weather data

But the more interesting category is MCP servers.

MCP: the protocol that connects Claude to everything

MCP stands for Model Context Protocol — an open standard Anthropic developed for connecting AI models to external services. Think of it as the USB standard for AI integrations. Once a service implements MCP, Claude can use it without any custom code on your end.

Popular MCP connectors include Google Calendar, Gmail, Google Drive, Slack, Figma, Jira, GitHub, and dozens more. When connected, Claude does not just know about these services — it can actively use them.

Try this prompt with Google Calendar and Gmail connected: "Check what I have tomorrow, identify the longest gap in my schedule, and draft an email to my team suggesting we use that time for a sync." Claude will read your calendar, analyze it, and compose the email — all in one shot.

How skills and plugins work together

The real magic happens when skills and tools combine. Here is what Claude does when you ask it to "research AI funding trends and create a professional report":

Uses view to read the docx or pdf Skill file and understand how to create a professional document
Uses web_search to find recent articles and data
Uses web_fetch to read full article content
Uses bash_tool to install any required Python libraries
Uses create_file to write the document generation code, then bash_tool to run it
Uses present_files to give you a download link

Six tools, one prompt, zero manual steps. This is the architecture that makes Claude feel qualitatively different from a simple chatbot.

Writing your own Skill file

Here is the part most guides skip: you can write your own skill files and Claude will use them. If you have a task you do repeatedly — formatting a specific type of report, writing emails in a certain style, processing data in a particular way — you can encode that knowledge in a .md file.

The structure is straightforward. Start with a YAML frontmatter block. Write a clear description with trigger phrases. Add an overview, quick-start section, detailed steps, and best practices. Upload it and tell Claude to use it.

Three things that make a skill file great:

Write precise triggers. The description field determines when Claude uses your skill. Be specific about which requests should trigger it.

Include real examples. Code snippets and sample outputs in your skill file give Claude concrete patterns to follow, not abstract rules.

Document edge cases. The most valuable part of any skill file is the warnings section — what breaks, what to avoid, what looks right but isn't.

The prompt patterns that unlock all of this

Knowing that skills and tools exist changes how you should prompt Claude. Instead of describing what you want in vague terms, you can be explicit about the output and let the skill system handle the how.

Some prompts that unlock the full capability:

"Create a PDF report on [topic] with a table of contents, charts, and page numbers."
"Build a PowerPoint presentation about [subject] with 10 slides, a consistent theme, and speaker notes."
"Read my Google Calendar for this week and create a time-blocking schedule as a Word document."
"Search for the latest news on [topic], summarize the key findings, and create an email I can send to my team."
"Write a React component for a data table, then create a downloadable HTML file I can use immediately."

Specify the output format. The moment you say "PDF," "Word document," "PowerPoint," or "React component," Claude knows to load the relevant Skill file. Vague requests get vague results. Specific output formats trigger expert-level execution.

Where to go from here

The skills system is, at its core, a knowledge transfer mechanism. Experts encode their best practices into .md files. Claude reads those files and executes accordingly. The barrier between knowing how to do something well and actually doing it well collapses.

Start by exploring what skills already exist. Ask Claude to "list the available skills" or try prompts that trigger the PDF, DOCX, or PPTX skills — notice how the output quality jumps compared to a generic request. Then, think about the repetitive tasks in your own work and consider what a Skill file for those tasks would look like.

The people getting the most out of Claude are not the ones with the cleverest prompts. They are the ones who understand the architecture — and use it deliberately.

Try it right now: Open Claude and type: "Create a professional PDF report on any topic you choose, with a cover page, table of contents, and at least 3 sections." Watch what happens when a skill kicks in.

Vector Databases & Embeddings: The Engine Behind Modern AI Applications

2026-04-15T00:00:00+05:30


How the technology powering semantic search, recommendation systems, and RAG is quietly reshaping software development

What Are Embeddings?
What Is a Vector Database?
Real-World Use Cases
Semantic Search
Retrieval-Augmented Generation (RAG)
Recommendation Systems
Anomaly Detection & Fraud Prevention
Multimodal Search
Customer Support Automation
Popular Vector Databases at a Glance
Quick Start: Building a Semantic Search App
Choosing the Right Tool
What's Next?

What Are Embeddings?

An embedding is a numerical representation of data — text, images, audio, or video — as a list of floating-point numbers (a vector). These numbers are not arbitrary; they encode meaning. Similar items end up numerically close together in this high-dimensional space.

# Example: Two semantically similar sentences map to nearby vectors
"The cat sat on the mat."   →  [0.12, -0.45, 0.88, ...]
"A feline rested on a rug." →  [0.11, -0.43, 0.86, ...]

# An unrelated sentence is far away
"Quarterly earnings rose 12%." → [0.89, 0.21, -0.34, ...]

Embeddings are generated by embedding models — neural networks trained to understand context and semantics. Popular ones include:

Model	Provider	Dimensions	Best For
`text-embedding-3-large`	OpenAI	3,072	General text
`embed-english-v3.0`	Cohere	1,024	Search & classification
`all-MiniLM-L6-v2`	HuggingFace	384	Fast, lightweight
`nomic-embed-text`	Nomic AI	768	Open-source, local use

What Is a Vector Database?

A vector database is purpose-built to store, index, and query high-dimensional vectors at scale. Unlike traditional databases that match exact values, vector DBs find approximate nearest neighbors (ANN) — items that are semantically closest to a query.

How Similarity Search Works

Query: "affordable electric cars"
          ↓
  [Embed query → vector]
          ↓
  [Search vector DB for nearest neighbors]
          ↓
  Returns: "best budget EVs 2024", "Tesla Model 3 cost breakdown", ...

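The core operation is a similarity measure: usually cosine similarity, which measures the angle between two vectors, or dot product, which equals cosine similarity when the vectors are normalized to unit length. Many databases also support Euclidean (L2) distance.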
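To make the similarity computation concrete, here is a small pure-Python sketch of cosine similarity and exact (brute-force) nearest-neighbour search over the toy vectors from the example above, truncated to three dimensions. `cosine_similarity` and `nearest` are illustrative helpers; vector databases replace the full scan with approximate indexes such as HNSW so they don't have to score every stored vector.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (a . b) / (|a| |b|): 1 = same direction, 0 = orthogonal, -1 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def nearest(query: list[float], items: dict[str, list[float]], k: int = 2) -> list[str]:
    """Exact k-nearest neighbours by cosine similarity (scores every item)."""
    ranked = sorted(items, key=lambda key: cosine_similarity(query, items[key]), reverse=True)
    return ranked[:k]

# Toy 3-dimensional vectors; real embeddings have hundreds or thousands of dimensions
vectors = {
    "cat on mat": [0.12, -0.45, 0.88],
    "feline on rug": [0.11, -0.43, 0.86],
    "earnings rose": [0.89, 0.21, -0.34],
}
print(nearest([0.12, -0.45, 0.88], vectors, k=2))  # ['cat on mat', 'feline on rug']
```

The two cat sentences score close to 1.0 against each other, while the earnings sentence scores below zero, which is the "numerically close together" property described above.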

Real-World Use Cases

1. Semantic Search

The Problem: Traditional keyword search fails when users don't use the exact right words.

The Solution: Embed both documents and queries. When a user searches, find the documents whose embeddings are closest to the query's embedding.

Real Example — Notion AI Search:
Notion uses embeddings so when you search "meeting notes from last week about marketing," it finds the right page even if it's titled "Sync — Brand Strategy 03/10" with no exact keyword match.

import openai
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_KEY")
index = pc.Index("docs-index")

def semantic_search(query: str, top_k: int = 5):
    # Embed the query
    response = openai.embeddings.create(
        model="text-embedding-3-small",
        input=query
    )
    query_vector = response.data[0].embedding

    # Search the vector DB
    results = index.query(vector=query_vector, top_k=top_k, include_metadata=True)
    return results.matches

Companies using this: Notion, Elastic, Algolia, Confluence, GitHub Copilot

2. Retrieval-Augmented Generation (RAG)

The Problem: LLMs have a knowledge cutoff and can't access your private data. Fine-tuning is expensive and slow.

The Solution: Store your documents as embeddings. At query time, retrieve the most relevant chunks and inject them into the LLM's prompt as context.

User asks: "What is our refund policy for enterprise clients?"
     ↓
[Embed question] → [Search vector DB] → [Retrieve top 3 relevant policy chunks]
     ↓
[Inject chunks into LLM prompt]
     ↓
LLM answers grounded in your actual documents

Real Example — Cursor (AI Code Editor):
Cursor indexes your entire codebase. When you ask "how does auth work in this project?", it retrieves relevant files and functions using embeddings, then feeds them to the LLM, giving context-aware answers grounded in your actual code (which reduces, but does not eliminate, hallucination).

Architecture overview:

[Your Documents]
      ↓
[Chunking + Embedding]
      ↓
[Vector DB (Pinecone / Weaviate / Chroma)]
      ↓ (retrieval at query time)
[LLM (GPT-4, Claude, etc.)] → [Final Answer]

Companies using this: Cursor, GitHub Copilot, Intercom Fin, Notion AI, Perplexity

3. Recommendation Systems

The Problem: Collaborative filtering ("users like you also liked...") fails for new users and new items (cold-start problem). It also can't understand item content.

The Solution: Embed items (products, movies, articles) based on their descriptions and attributes. Recommend items closest in the embedding space to what a user has interacted with.

Real Example — Spotify:
Spotify's recommendation engine embeds songs using audio features and playlist context. In simplified terms, "Discover Weekly" finds songs whose vectors are close to your listening history in this embedding space.

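# Simplified product recommendation
# `index` is the Pinecone index from the semantic search example above
def get_recommendations(product_id: str, top_k: int = 10):
    # Fetch the product's stored embedding
    product_vector = index.fetch(ids=[product_id]).vectors[product_id].values

    # Find similar products that are in stock
    similar = index.query(
        vector=product_vector,
        top_k=top_k + 1,  # +1 because the product itself is usually the closest match
        filter={"in_stock": True}
    )
    # Drop the product itself, then trim in case it was filtered out (e.g. out of stock)
    return [m for m in similar.matches if m.id != product_id][:top_k]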

Companies using this: Spotify, Netflix, Amazon, Pinterest, Etsy

4. Anomaly Detection & Fraud Prevention

The Problem: Fraud patterns evolve constantly. Rule-based systems become outdated quickly.

The Solution: Embed user behavior sequences (transactions, clicks, login patterns). Flag transactions whose vectors are far from a user's historical behavior cluster.

Real Example — Stripe Radar:
Stripe embeds transaction patterns and detects anomalies by identifying transactions whose vector representations are statistical outliers compared to the merchant's and user's typical behavior.

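import numpy as np

def cosine_similarity(a, b) -> float:
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Flag anomalous transactions
def is_suspicious(transaction_embedding, user_history_embeddings, threshold=0.7):
    if not user_history_embeddings:
        return True  # assumption: no history to compare against, so send for review
    similarities = [
        cosine_similarity(transaction_embedding, hist_emb)
        for hist_emb in user_history_embeddings
    ]
    avg_similarity = sum(similarities) / len(similarities)
    return avg_similarity < threshold  # Low similarity = suspicious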

Companies using this: Stripe, PayPal, Mastercard, Visa, Cloudflare

5. Multimodal Search

The Problem: Users want to search with images, not just text. Or find visually similar products.

The Solution: Use multimodal embedding models (like CLIP) that map text and images into the same vector space. A text query can retrieve images, and an image query can retrieve text.

Real Example — Pinterest Visual Search:
When you tap a section of a Pinterest image to search for similar items, they're using multimodal embeddings to find visually similar content across billions of pins.

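from transformers import CLIPProcessor, CLIPModel
import torch

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Text-to-image search
# Assumes `index` holds CLIP image embeddings (512-dim for this model),
# computed with model.get_image_features(...) at indexing time
def text_to_image_search(text_query: str, top_k: int = 10):
    inputs = processor(text=[text_query], return_tensors="pt", padding=True)
    with torch.no_grad():  # inference only, no gradients needed
        text_embedding = model.get_text_features(**inputs)
    # L2-normalize so dot-product search behaves like cosine similarity
    text_embedding = text_embedding / text_embedding.norm(dim=-1, keepdim=True)
    return index.query(vector=text_embedding[0].tolist(), top_k=top_k)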

Companies using this: Pinterest, Google Lens, Shopify, IKEA, Zalando

6. Customer Support Automation

The Problem: Support tickets are repetitive. Teams waste time re-answering the same questions. Knowledge bases are hard to search.

The Solution: Embed your entire knowledge base and past resolved tickets. Automatically surface the most relevant article or resolution for each new ticket.

Real Example — Intercom Fin:
Intercom's AI agent uses embeddings to match incoming customer questions against a company's entire knowledge base. It handles ~70% of tickets autonomously by finding semantically relevant answers.

Ticket routing pipeline:

[New Support Ticket]
        ↓
[Embed ticket content]
        ↓
[Query vector DB of past tickets + KB articles]
        ↓
[High similarity match] → Auto-resolve with suggested answer
[Medium similarity]     → Route to correct team with context
[Low similarity]        → Escalate as novel issue
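
The three-way branch at the end of the pipeline amounts to comparing the best match's similarity with two cut-offs. The sketch below assumes the vector DB has already returned its top match as a cosine similarity; the threshold values and the `Match` fields are hypothetical and would need tuning on your own resolved tickets.

```python
from dataclasses import dataclass

# Hypothetical cut-offs; calibrate them against labelled historical tickets
AUTO_RESOLVE_THRESHOLD = 0.90
ROUTE_THRESHOLD = 0.75


@dataclass
class Match:
    doc_id: str        # best-matching KB article or past ticket
    team: str          # team that owns that article / resolved that ticket
    similarity: float  # cosine similarity reported by the vector DB


def route_ticket(best_match: Match) -> str:
    """Map the top vector-DB match for a new ticket to one of three actions."""
    if best_match.similarity >= AUTO_RESOLVE_THRESHOLD:
        return f"auto_resolve:{best_match.doc_id}"
    if best_match.similarity >= ROUTE_THRESHOLD:
        return f"route:{best_match.team}"
    return "escalate:novel_issue"
```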

Companies using this: Intercom, Zendesk, Freshdesk, Linear, Atlassian

Popular Vector Databases at a Glance

Database	Best For	Hosting	Open Source	Notable Feature
Pinecone	Production at scale	Managed cloud	❌	Serverless, zero-ops
Weaviate	Hybrid search	Cloud + self-hosted	✅	Built-in BM25 + vector
Qdrant	High performance	Cloud + self-hosted	✅	Rust-based, fast filtering
Chroma	Local dev & prototyping	Embedded/self-hosted	✅	Simplest to get started
pgvector	Already using Postgres	Self-hosted	✅	No new infra needed
Milvus	Large-scale enterprise	Cloud + self-hosted	✅	Handles billions of vectors

Quick Start: Building a Semantic Search App

Here's a minimal working example using Chroma (no signup needed) and OpenAI embeddings:

pip install chromadb openai

import chromadb
from openai import OpenAI

openai_client = OpenAI()  # reads the OPENAI_API_KEY environment variable
chroma_client = chromadb.Client()
collection = chroma_client.create_collection("my_docs")

# Step 1: Add documents
documents = [
    "Our return policy allows returns within 30 days of purchase.",
    "We offer free shipping on orders over $50.",
    "Customer support is available 24/7 via chat and email.",
    "Enterprise plans include dedicated account management.",
]

def embed(texts):
    res = openai_client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [r.embedding for r in res.data]

collection.add(
    documents=documents,
    embeddings=embed(documents),
    ids=[f"doc_{i}" for i in range(len(documents))]
)

# Step 2: Query
query = "How do I send something back?"
results = collection.query(
    query_embeddings=embed([query]),
    n_results=2
)

print(results["documents"][0])
# → the 2 closest documents for the query, ideally with the return-policy sentence first

Choosing the Right Tool

Are you prototyping / building locally?
  └─ Yes → Chroma or pgvector

Are you already using Postgres?
  └─ Yes → pgvector (zero new infra)

Do you need hybrid search (keyword + semantic)?
  └─ Yes → Weaviate or Elasticsearch with vectors

Do you need maximum performance with complex filters?
  └─ Yes → Qdrant

Do you want fully managed, zero-ops production?
  └─ Yes → Pinecone

Handling billions of vectors at enterprise scale?
  └─ Yes → Milvus

What's Next?

The vector database space is evolving fast:

Multimodal embeddings — unified search across text, image, audio, and video
Sparse + dense hybrid search — combining keyword precision with semantic understanding
Streaming vector updates — real-time embedding pipelines for live data
On-device embeddings — privacy-preserving local search on mobile/edge devices
Graph + vector hybrid stores — combining relationship graphs with semantic similarity
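
To make the sparse + dense idea concrete: one common way to merge a keyword ranking (e.g. BM25) with a vector ranking is reciprocal rank fusion (RRF). It is shown here as one option, not as what any particular database does internally. Each document's fused score is the sum of 1 / (k + rank) over the lists it appears in, with k = 60 being the constant usually quoted.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc ids into one list, best first.

    rankings: e.g. [keyword_results, vector_results], each ordered best-first.
    A document scores sum(1 / (k + rank)) across the lists it appears in (rank starts at 1).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Documents that rank reasonably well in both lists float to the top, which is the "keyword precision plus semantic understanding" effect.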


Agentic System Design Concepts - Patterns Every AI Engineer Should Know

2026-04-11T00:00:00+05:30

Building reliable AI agents isn't just about picking the right model — it's about the patterns you wire around it. Here's a concise reference of 15 agentic system design concepts worth knowing. Two lines each — just enough to understand what they do and why they matter.

Resilience & Failure Isolation

Agent Circuit Breaker — Prevents cascading failures by halting agent execution when downstream services or tools are repeatedly failing. Borrowed from distributed systems engineering, it stops a single broken tool from dragging the entire agent pipeline down.
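
A minimal sketch of the pattern around a single tool, assuming tools are plain Python callables. After `failure_threshold` consecutive failures the breaker opens and rejects calls immediately. Once `reset_timeout` seconds pass, it lets one trial call through (the "half-open" state). The class and parameter names are illustrative, not from a specific framework.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised while the breaker is open and calls are being short-circuited."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, tool, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("tool disabled after repeated failures")
            # Half-open: allow one trial call; a single failure reopens the circuit
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = tool(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0  # any success closes the circuit fully
        return result
```

The agent loop can catch `CircuitOpenError` and switch to a fallback tool or report the outage, instead of hammering a broken dependency.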

Blast Radius Limiter — Restricts the impact of an agent failure to a defined scope so it can't propagate across the system. Think of it as a blast door: when something goes wrong, the damage stays local.

Dead Letter Queue for Agents — A holding area where failed or unprocessable agent tasks are parked for later inspection instead of silently dropped. It gives you a recoverable audit trail when tasks fall through the cracks at runtime.
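
A minimal in-process sketch: a task is retried a few times, and if it still fails it is parked with its last error instead of being dropped. The in-memory list stands in for a durable queue (a database table or message-broker DLQ), which is an assumption here.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class DeadLetter:
    task: Any
    error: str
    attempts: int


def process_with_dlq(task, handler: Callable, dead_letters: list, max_attempts: int = 3):
    """Run handler(task); after max_attempts failures, park the task for later inspection."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return handler(task)
        except Exception as exc:  # broad on purpose: any failure counts toward the limit
            last_error = exc
    dead_letters.append(DeadLetter(task=task, error=repr(last_error), attempts=max_attempts))
    return None
```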

Control Flow & Decision Quality

Orchestrator vs Choreography — Defines whether agent interactions are centrally directed (orchestrator controls all moves) or emergent (agents react to events and coordinate peer-to-peer). The choice shapes coupling, debuggability, and how gracefully the system degrades.

Confidence Threshold Gate — Ensures an agent only takes action when its internal confidence in a decision clears a defined threshold. A simple but powerful reliability lever: low-confidence branches pause for human review rather than guessing forward.
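
The gate itself is only a comparison. The hard part is producing the confidence score, and that is assumed here: a self-rated score, token log-probabilities, or agreement across several samples are common options. Anything below the threshold is routed to a human queue, which is also where the Human Escalation Protocol plugs in.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str
    confidence: float  # 0.0 to 1.0, however your agent estimates it (assumed)


def gate(decision: Decision, threshold: float = 0.8) -> str:
    """Execute confident decisions; send everything else to human review."""
    if decision.confidence >= threshold:
        return f"execute:{decision.action}"
    return f"needs_human_review:{decision.action}"
```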

Replanning Loop — Allows agents to re-evaluate their plan mid-execution when context changes or a step fails, rather than continuing blindly on a stale plan. Essential for long-horizon tasks where the environment isn't static.

Human Escalation Protocol — Provides a structured mechanism for agents to hand off to a human when they're stuck, uncertain, or handling high-stakes decisions. It's not a failure mode — it's a designed off-ramp.

Tool Invocation Reliability

Idempotent Tool Calls — Ensures that a tool can be called multiple times with the same inputs without producing unintended side effects. Critical in agentic pipelines where retries happen frequently due to timeouts or partial failures.
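
One way to get idempotency on the agent side is sketched below. The wrapper derives a key from the tool name and its arguments, and a retry with the same arguments returns the stored result instead of repeating the side effect. The `idempotent` decorator is hypothetical. Its cache lives in process memory, so it does not survive a crash. Real systems usually persist the key, or pass it to the downstream API when that API supports idempotency keys.

```python
import functools
import hashlib
import json


def idempotent(tool):
    """Cache results by (tool name, arguments) so retries don't repeat side effects."""
    results = {}

    @functools.wraps(tool)
    def wrapper(**kwargs):
        payload = json.dumps([tool.__name__, kwargs], sort_keys=True)
        key = hashlib.sha256(payload.encode()).hexdigest()
        if key not in results:
            results[key] = tool(**kwargs)  # first call performs the real action
        return results[key]

    return wrapper
```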

Tool Invocation Timeout — Prevents agents from blocking indefinitely on a tool that is slow or unresponsive, forcing a graceful fallback or retry. Without this, a single flaky API can freeze an entire agent run.
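
A standard-library sketch, assuming the tool is a blocking Python callable. The call runs on a worker thread, and the agent waits at most `timeout_s` before taking a fallback value. One caveat: Python cannot kill a running thread, so the slow call keeps running in the background. A hard stop needs a subprocess or a client library with its own timeout.

```python
import concurrent.futures

_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def call_with_timeout(tool, *args, timeout_s=5.0, fallback=None, **kwargs):
    """Wait at most timeout_s for tool(*args, **kwargs); return fallback on timeout."""
    future = _executor.submit(tool, *args, **kwargs)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        future.cancel()  # only helps if the call hasn't started yet
        return fallback
```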

Context Window Checkpointing — Periodically saves the agent's progress so it can resume from a known-good state rather than restarting from scratch after a context overflow or crash. Especially important for long-running, multi-step tasks.
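
A minimal sketch, assuming an agent run can be split into named steps whose results are JSON-serializable. Progress is written after every step. The file is replaced atomically so a crash never leaves a half-written checkpoint, and a rerun skips steps that are already done. The function and file layout are hypothetical.

```python
import json
import os


def run_with_checkpoints(steps, checkpoint_path):
    """Run (name, step) pairs in order; each step receives the results so far."""
    state = {"completed": [], "results": {}}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)  # resume from the last known-good state

    for name, step in steps:
        if name in state["completed"]:
            continue  # finished in a previous run
        state["results"][name] = step(state["results"])
        state["completed"].append(name)
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)  # atomic swap
    return state["results"]
```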

Infrastructure & Routing

LLM Gateway Pattern — A single abstraction layer that manages all LLM API calls, handling routing, rate limiting, retries, and observability in one place. It decouples agent logic from model-specific SDKs, making provider swaps painless.

Semantic Caching — Stores LLM responses keyed on semantic meaning rather than exact input strings, so similar queries hit the cache even when phrased differently. It reduces latency and cost, as long as the similarity threshold is strict enough that a similar-but-different question doesn't receive a wrong cached answer.
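
A minimal sketch of a semantic cache. The `embed` function is an assumption: in practice it would call an embedding model. A lookup counts as a hit when the best stored prompt's cosine similarity clears `threshold`. The linear scan is for clarity only; a vector index would replace it at scale.

```python
import math
from typing import Callable, List, Optional


class SemanticCache:
    """Cache LLM answers keyed by embedding similarity instead of exact prompt text."""

    def __init__(self, embed: Callable[[str], List[float]], threshold: float = 0.9):
        self.embed = embed          # text -> vector (assumed, e.g. an embeddings API)
        self.threshold = threshold  # minimum cosine similarity for a cache hit
        self._entries = []          # list of (vector, answer)

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        return dot / norm if norm else 0.0

    def get(self, prompt: str) -> Optional[str]:
        vec = self.embed(prompt)
        best = max(self._entries, key=lambda e: self._cosine(vec, e[0]), default=None)
        if best is not None and self._cosine(vec, best[0]) >= self.threshold:
            return best[1]
        return None  # miss: call the LLM, then put() the answer

    def put(self, prompt: str, answer: str) -> None:
        self._entries.append((self.embed(prompt), answer))
```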

Multi-Agent State Sync — Maintains a consistent shared state across multiple agents working in parallel or in sequence. Without it, agents operating on stale or divergent state produce contradictory or redundant outputs.

Observability & Deployment

Agentic Observability Tracing — Tracks every decision, tool call, handoff, and LLM interaction across an agent run, producing a full execution trace for debugging and performance analysis. The difference between guessing why something failed and knowing.

Canary Agent Deployment — Rolls out a new agent version to a small slice of production traffic before full release, allowing you to compare behavior and catch regressions with limited blast radius. Applies standard software deployment discipline to the agent layer.
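
A common way to pick the canary slice is to hash a stable identifier, so the same user always sees the same agent version and comparisons stay clean. The sketch below maps a SHA-256 prefix of the user id to a number in [0, 1). The 6-byte prefix keeps that number exact as a float. The function name and the version labels are hypothetical.

```python
import hashlib


def pick_agent_version(user_id: str, canary_fraction: float = 0.05) -> str:
    """Deterministically send a stable slice of users to the canary agent version."""
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:6], "big") / 2**48  # roughly uniform in [0, 1)
    return "canary" if bucket < canary_fraction else "stable"
```

Raising `canary_fraction` step by step (5%, 25%, 100%) widens the rollout, and users already in the canary stay there.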

Every Claude Code Concept You Need to Know

2026-04-11T00:00:00+05:30

Claude Code is not a chatbot. It lives in your terminal, reads your actual files, writes code, runs commands, and executes multi-step workflows — all with your permission. Here are 30 concepts you need to understand it properly. No fluff, no hand-holding.

The 30 Concepts

1. The Terminal — Claude Code doesn't run in a browser. It runs in the terminal, the same text-based interface developers use daily. If you've never opened a terminal before, that's your first homework assignment.

2. Installation + Pricing — Install with a single command (for example via npm). You can pay per token through an Anthropic API account, where costs scale with how hard you push it, or sign in with a Claude Pro or Max subscription, which has a flat monthly fee and usage limits.

3. File Access — Claude Code reads and edits files directly on your machine, with your permission. Not "paste your doc into a chat window." It opens the actual file, modifies it in-place, and saves it. This is the concept that makes it useful.

4. Image + PDF Reading — Claude Code can ingest images and PDFs as inputs. Point it at a PDF proposal or a screenshot and it processes the content directly — no manual copy-paste required.

5. Tool Use — Claude Code has built-in tools: file reading, file writing, shell execution, and more. These are the primitives it uses to act on your computer. You see each tool call as it happens in real time.

6. Prompting Techniques — Vague prompts produce garbage results. "Help me with my marketing" is useless. "Write a 3-email welcome sequence for my dog walking business targeting first-time pet owners, 150 words each" is not. Specificity is the skill.

7. CLAUDE.md — A markdown file you create in your project directory that tells Claude Code the rules, context, and conventions for that project. Think of it as a standing system prompt that persists across sessions. Every serious Claude Code user has one.

8. Plan Mode — Before Claude Code executes anything, you can ask it to plan first. It outputs what it intends to do, step by step, and waits for your approval. Run in plan mode for anything non-trivial. Review before you let it touch anything.

9. Context Window — The amount of text Claude can "hold in mind" at once during a session. Long conversations, large files, and extensive histories eat into it. When it fills up, Claude Code compacts the conversation into a summary, so details from earlier in the session can be lost. This affects result quality.

10. Tokens + Costs — Everything processed by Claude Code — your prompts, the files it reads, its responses — is measured in tokens. Tokens drive cost. Reading a 50-page PDF burns tokens. Keep context lean and targeted to control spend.
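
A back-of-the-envelope estimate helps here. The sketch assumes the common rule of thumb of about 4 characters per token for English text. It takes the price per million input tokens as a parameter, and no real rate is implied. For exact numbers, rely on the usage your account reports. For example, a 50-page PDF at an assumed ~3,000 characters per page is about 150,000 characters, or roughly 37,500 tokens.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count; ~4 characters per token is only a heuristic for English."""
    return round(len(text) / chars_per_token)


def estimate_input_cost_usd(text: str, price_per_million_tokens: float,
                            chars_per_token: float = 4.0) -> float:
    """Rough cost of sending `text` as input at the given (assumed) price."""
    approx_tokens = len(text) / chars_per_token
    return approx_tokens / 1_000_000 * price_per_million_tokens
```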

11. Model Selection — You can choose which Claude model backs your session. Faster, cheaper models work for routine tasks. Heavier models are worth it for complex reasoning or production-grade code. Pick the right tool for the job.

12. /compact — A slash command that compresses your current conversation history into a shorter summary, freeing up context window space without wiping the session. Use it mid-task when context gets bloated.

13. /clear — Wipes the entire conversation and starts fresh. Every new task should start with a clean context. Don't carry leftover noise from a previous task into the next one. Use this more than you think you need to.

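14. Session Management — A new Claude Code session does not carry over the previous conversation by default, though you can pick an earlier one back up with claude --continue or claude --resume. Your CLAUDE.md is loaded automatically at the start of every session, so that is where durable project context belongs. Design your workflow around this statelessness rather than fighting it.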

15. Permission Modes — By default, Claude Code asks for approval before running shell commands. This gets tedious fast. You can pre-approve safe, non-destructive commands (ls, cat, grep, mkdir, git status) as allow rules in .claude/settings.json (shared with your team) or .claude/settings.local.json (just for you). Destructive operations should always require explicit confirmation.

16. Effort Levels — You can signal how much effort you want Claude to apply. Quick answers for exploration, thorough analysis for production decisions. Matching effort level to task type saves time and tokens.

17. Interrupt + Redirect — While Claude Code is running a task, you can interrupt it mid-execution and redirect it. If it starts going down the wrong path, stop it early. Don't let it burn tokens on a wrong approach when you can see it happening.

18. Visual Studio Code — Claude Code integrates directly with VS Code. You can run it inside the VS Code terminal and see file changes reflected in your editor in real time. If you're not a terminal-native developer, this is the recommended setup.

19. Memory — Claude Code's memory is built on CLAUDE.md files at different scopes, and they persist across sessions. Besides the project's CLAUDE.md, a user-level ~/.claude/CLAUDE.md holds personal preferences and context that apply to every project. Useful for encoding your personal conventions once and never repeating them.

20. Project vs Global — Configuration can be scoped at the project level (CLAUDE.md, .claude/settings.json, .claude/settings.local.json inside the repo) or at the global user level in ~/.claude/, which applies to all Claude Code sessions on your machine. Know which scope a setting lives in before you modify it.

21. Slash Commands — Built-in commands prefixed with / that control Claude Code's behavior: /clear, /compact, /help, and more. You can also define custom slash commands (skills) that map to your own workflows.

22. Skills — Custom slash commands you define once and reuse indefinitely. A skill is a markdown file that describes a reusable workflow. You build it once, invoke it with /skill-name, and Claude follows the instructions every time. Hundreds of community-built skills already exist on GitHub in repos like anthropics/skills and hesreallyhim/awesome-claude-code.

23. Hooks — Scripts that run automatically before or after Claude Code actions. Quality gate hooks, for example, can intercept Claude's output before it's committed and check it against defined standards. Hooks are how you enforce consistency without relying on Claude to self-police.
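
As an illustration, here is a minimal guardrail hook. It is based on Claude Code's hook contract as I understand it: a PreToolUse hook receives a JSON payload on stdin with `tool_name` and `tool_input`, and exiting with code 2 blocks the tool call and feeds stderr back to Claude. Check the hooks reference before relying on it. The blocked patterns and the file name `.claude/hooks/block_destructive.py` are hypothetical. You would register the script in `.claude/settings.json` as a PreToolUse hook matching `Bash`, with a command such as `python3 .claude/hooks/block_destructive.py --hook`. The `--hook` flag is this sketch's own convention, so the file can be imported and tested without waiting on stdin.

```python
import json
import re
import sys

# Illustrative patterns only; extend for your project
BLOCKED_PATTERNS = [r"\brm\s+-rf\b", r"\bgit\s+push\s+--force\b", r"\bDROP\s+TABLE\b"]


def check_bash_command(payload: dict):
    """Return (exit_code, message) for a PreToolUse payload; 2 means block."""
    if payload.get("tool_name") != "Bash":
        return 0, ""
    command = payload.get("tool_input", {}).get("command", "")
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, command, flags=re.IGNORECASE):
            return 2, f"Blocked by hook: command matches {pattern!r}"
    return 0, ""


if __name__ == "__main__" and "--hook" in sys.argv[1:]:
    code, message = check_bash_command(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)  # shown to Claude when the call is blocked
    sys.exit(code)
```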

24. Web Browsing — Claude Code can browse the web when given the appropriate tool access. It can fetch pages, read documentation, and pull in live information as part of a task — not just work from static local files.

25. MCP Servers — Model Context Protocol servers extend Claude Code's tool access to external services: Airtable, Google Drive, Slack, GitHub, and more. Tools handle what Claude does on your computer. MCP extends that to the internet and third-party APIs. This is the integration layer.

26. Perplexity MCP — A specific MCP integration that gives Claude Code access to Perplexity's search capabilities. Useful when a task requires real-time research as part of a larger automated workflow.

27. Subagents — Specialized Claude instances that the main session delegates distinct subtasks to, each working in its own context window and able to run in parallel. Instead of processing items one at a time, you hand them to several subagents that run concurrently. Subagents are how you turn Claude Code from a sequential tool into a parallel workflow engine.

28. Remote Control — Claude Code can be configured for remote access, meaning you can trigger and manage sessions from another machine or interface. Relevant for server automation and scheduled background tasks.

29. Scheduled Tasks — Claude Code workflows can be scheduled to run automatically at defined intervals. Combine this with skills and hooks and you have a self-operating workflow system that runs without manual invocation.

30. Git Version Control — Claude Code integrates with git. Every change it makes can be committed, branched, and rolled back through standard git workflows. This is your undo button. Always have Claude Code working inside a git-tracked project. Before: changes happen and you hope nothing breaks. After: every change is versioned, documented, and reversible.

The One Rule That Matters

Master five concepts before you touch the next five. The shiny object trap — jumping from MCP to subagents to hooks before understanding CLAUDE.md and context windows — is the single biggest waste of time. The gap between people getting real results and people falling behind is not talent. It is reps. Start with file access, prompting, CLAUDE.md, plan mode, and /clear. Everything else builds on those five.

Missing ZIP Option in Windows Right-Click Menu — Here's How to Fix It

2026-04-11T00:00:00+05:30

The classic "Send to → Compressed (zipped) folder" option sometimes disappears from the Windows right-click context menu. Here's what causes it and how to get it back in under two minutes.

What Happened

Windows ships with a built-in ZIP shell extension handled by zipfldr.dll. When third-party tools like Git, VLC, or OneDrive add their own context menu entries, they can displace or corrupt the ZIP handler registration — leaving you with a bloated menu but no ZIP option.

Fix 1 — Check the Send to Submenu

Before anything else, right-click your folder or file and hover over Send to →. The "Compressed (zipped) folder" option is sometimes hiding in the submenu even when it's not visible at the top level.

Fix 2 — Re-register the ZIP Shell Extension

Open Command Prompt as Administrator and run:

regsvr32 zipfldr.dll

This re-registers the native ZIP handler with Windows Shell. Restart Explorer or reboot after running it.

Fix 3 — Restart Windows Explorer

Sometimes a stale shell session is all that's causing the issue. Run this in CMD:

taskkill /f /im explorer.exe
start explorer.exe

Fix 4 — Verify the Registry Key

Press Win + R, type regedit, and navigate to:

HKEY_CLASSES_ROOT\CompressedFolder

If this key is missing or corrupted, the ZIP option will not appear anywhere in the context menu. You may need to restore it from another machine or via a .reg export.

Root Cause

Heavy context menu contributors — Git Bash, Git GUI, VLC, SkyDrive Pro — are visible in the screenshot. Any one of them can push a bad shell extension that breaks ZIP registration as a side effect. Fix 2 resolves this in most cases.

AI Agent Directory - Few-Shot LLM Models

2026-04-10T00:00:00+05:30

The AI agent ecosystem is growing fast. Here's a quick directory of notable AI startups and a couple of few-shot LLM models worth knowing about. Two lines each — just enough to know what they do and why they matter.

AI Agent Directory

Can of Soup — An AI-powered app that lets you create fictional photos of you and your friends in imaginary scenarios. Built during Y Combinator, it uses generative AI to place people into any meme, outfit, or movie scene.

Deepgram — A foundational voice AI platform offering speech-to-text, text-to-speech, and voice agent APIs. Their Nova models deliver high accuracy and low latency, supporting 30+ languages for real-time transcription.

Diffuse Bio — Building generative AI for protein design, using diffusion models to engineer new proteins with control and accuracy. Their foundation model DSG-1 can generate 3D protein structures and design binders from user prompts.

Draftaid — An AI-powered CAD tool that converts 3D models into precise 2D manufacturing drawings automatically. It reduces manual drafting time by up to 90%, acting like a copilot for mechanical engineers.

Edgetrace — A YC-backed AI video analytics platform that lets users search camera networks using natural language. Primarily used by law enforcement and transportation for real-time threat detection and suspect identification.

EzDubz — A real-time AI dubbing tool that translates videos, livestreams, and phone calls while preserving the original speaker's voice. Their proprietary models clone voices on the fly and even replicate emotions across 20+ languages.

Exa — An AI-powered search engine and API built for developers and AI agents. Unlike traditional keyword search, Exa uses neural embeddings for semantic understanding, powering tools like Cursor and Lovable.

Guide Labs — Building interpretable AI foundation models that can explain their reasoning and are easy to audit. Their open-source Steerling-8B is an 8-billion-parameter LLM designed for transparency and debuggability.

Infinity AI — Now known as Lemon Slice, they build a video foundation model for human motion and emotion. Their tech generates expressive, talking characters across styles from photorealistic to cartoon.

K-Scale — Building open-source humanoid robots for developers, with models starting at $999. Their integrated software, hardware, and ML stack lets developers focus on building applications for embodied AI.

Sevn — A generative design startup using AI to automate and optimize the creative design process. Users define parameters and constraints, and Sevn generates a range of design options to explore.

Linux Inc — An AI startup focused on bringing intelligent tooling to the Linux ecosystem. They aim to simplify Linux administration and development workflows through AI-powered automation.

Metalware — A copilot for firmware engineers that automates low-level programming for embedded systems. Their binary analysis tool fuzzes ARM-based software to detect defects earlier in the development lifecycle.

Navier AI — A web-based platform for running CFD (computational fluid dynamics) simulations at scale. Its AI agents handle geometry cleanup, meshing, solver configuration, and cloud resource management autonomously.

Osium AI — An AI-powered platform that accelerates materials and chemicals R&D for industry leaders. Their software helps engineers design new materials faster, spanning alloys, polymers, textiles, and bio-based materials.

Phind — An AI search engine purpose-built for developers that generates direct, code-inclusive answers to technical questions. It combines real-time web search with specialized models trained on programming languages and frameworks.

Piramidal — Building a foundation model for the brain, trained on a massive corpus of EEG brainwave data. Their AI interprets neural signals for neurological diagnostics, already being deployed in ICU settings.

Playground — A browser-based AI image generation and design platform used by over 9 million users. It combines text-to-image generation with a full graphic design suite for logos, social media posts, and more.

PlayHT — An AI voice generation platform that offered ultra-realistic text-to-speech with 900+ voices in 142 languages. Known for voice cloning and custom voice creation through deep learning algorithms.

Sonauto — An AI music editor that turns prompts, lyrics, or melodies into full songs in any style. It supports thousands of styles with full-length songs up to 4.5 minutes, complete with vocals and instrumentation.

Tavus — An AI video personalization platform that creates hyper-personalized videos at scale from a single recording. It uses deep learning for voice synthesis and face cloning to generate thousands of unique video variations.

YonduAI — Building the robotic workforce of the future, starting with logistics automation in warehouses. They deploy humanoid robots with remote teleoperation that gradually transitions to full AI-driven automation.

Yoneda Labs — Building a foundation model for chemical reactions to help chemists speed up drug discovery. Their AI recommends reaction conditions such as temperature, concentration, and catalyst to make synthesis faster and cheaper.

SyncLabs — An AI lip-sync video generator that creates perfectly synchronized mouth movements from any audio track. Their zero-shot model handles any face in any video context without prior training on specific individuals.

Few-Shot LLM Models

Llama 3.1 — Meta's open-source large language model available in 8B, 70B, and 405B parameter sizes. It supports 128K context length and multilingual capabilities, making it one of the most versatile open-weight models for fine-tuning and deployment.

Mixtral — Mistral AI's open-source mixture-of-experts (MoE) model that activates only a subset of parameters per token for efficient inference. It delivers performance comparable to much larger dense models while being significantly faster and more cost-effective to run.

My GenAI Blogs

2026-01-10T00:00:00+05:30

Why GenAI?

Generative AI has completely changed how I think about software, creativity, and problem-solving. Over the past year, I've gone deep into the world of large language models, prompt engineering, retrieval-augmented generation, fine-tuning, and AI agents. The pace of change is incredible, and I wanted a place to document what I'm learning as I go.

This blog is that place. I'll be writing about my hands-on experiences with GenAI, the tools I'm experimenting with, things that worked, things that didn't, and the lessons I've picked up along the way.

What I've Been Exploring

My GenAI journey started with using ChatGPT and Claude for day-to-day coding tasks. That quickly evolved into deeper exploration:

Prompt engineering — learning how to get consistent, high-quality outputs from LLMs by structuring prompts effectively.
RAG (Retrieval-Augmented Generation) — building pipelines that ground LLM responses in real data using vector databases and embeddings.
Fine-tuning — adapting pre-trained models for specific tasks and domains.
AI agents — creating autonomous workflows where LLMs can use tools, reason through multi-step problems, and take actions.
Local models — running open-source models like LLaMA and Mistral locally to understand how they work under the hood.

I'm not just reading about these topics. I'm building with them, breaking things, and learning from the results.

What to Expect

I plan to post at least one article a week covering topics like:

Practical tutorials on building GenAI applications
Comparisons of different models and frameworks
Deep dives into concepts like embeddings, tokenization, and attention mechanisms
Real-world use cases and project walkthroughs
Opinions on where GenAI is heading and what matters for developers

Some posts will be short and focused, others will be longer walkthroughs. The goal is to share useful, honest content from a developer's perspective.

Let's Go

I'm excited to start writing and sharing. GenAI is moving fast, and the best way to keep up is to build, experiment, and document. That's exactly what this blog is for.

Vinayak

Claude Code Project Structure: Every File and Folder Explained

Table of Contents

Why Structure Matters

The Complete Directory

File-by-File Breakdown

1. CLAUDE.md — The Session Brain

2. CLAUDE.local.md — Your Personal Overrides

3. .mcp.json — External Tool Connections

4. .claude/settings.json — Permissions & Model Control

5. .claude/rules/ — Contextual Coding Standards

6. .claude/commands/ — Repeatable Slash Workflows

7. .claude/skills/ — Context-Aware Capability Packs

8. .claude/agents/ — Specialized Sub-Agents

9. .claude/hooks/ — Automated Guardrails

Putting It All Together

Starter Template

Key Principles

Resources

LLM Fine-Tuning vs RAG: When to Use Which

Table of Contents

The Core Question

What Is RAG?

What Is Fine-Tuning?

Head-to-Head Comparison

When to Choose RAG

✅ Your knowledge changes frequently

✅ You need source citations

✅ You have large volumes of long-tail knowledge

✅ You're prototyping or iterating fast

✅ Reducing hallucinations is the priority

When to Choose Fine-Tuning

✅ You need to change how the model behaves, not just what it knows

✅ You have a well-defined, stable task

✅ Latency and cost matter at scale

✅ You have high-quality labeled examples (50+)

✅ The task requires reasoning patterns, not facts

When to Use Both

Cost & Complexity Analysis

RAG Cost Profile

Fine-Tuning Cost Profile

Decision Framework

Implementation Quickstart

RAG in 30 minutes (Chroma + OpenAI)

Fine-Tuning Checklist

Common Mistakes

Resources

Prompt Engineering: Techniques That Actually Matter

Table of Contents

Why Prompt Engineering Still Matters

Core Techniques

1. Be Explicit About Format & Length

2. Role + Context Framing

3. Chain-of-Thought (CoT) Prompting

4. Few-Shot Examples

5. Output Constraints & Schema Forcing

6. Negative Prompting — Tell It What NOT to Do

7. ReAct — Reasoning + Acting

8. Self-Consistency & Sampling

System Prompt Architecture

Prompt Patterns for Common Tasks

Summarization

Classification

Code Generation

Data Extraction

What Doesn't Work (And Why)

Evaluation: How to Know If Your Prompt Is Good

Resources

Claude's Hidden Power: Skills, Plugins, and the .md Files That Make It Extraordinary

Vector Databases & Embeddings: The Engine Behind Modern AI Applications

Table of Contents

What Are Embeddings?

What Is a Vector Database?

How Similarity Search Works