Vector Databases & Embeddings: The Engine Behind Modern AI Applications
Posted on Wed 15 April 2026 in GenAI
How the technology powering semantic search, recommendation systems, and RAG is quietly reshaping software development
Table of Contents
- What Are Embeddings?
- What Is a Vector Database?
- Real-World Use Cases
- Semantic Search
- Retrieval-Augmented Generation (RAG)
- Recommendation Systems
- Anomaly Detection & Fraud Prevention
- Multimodal Search
- Customer Support Automation
- Popular Vector Databases at a Glance
- Quick Start: Building a Semantic Search App
- Choosing the Right Tool
- What's Next?
What Are Embeddings?
An embedding is a numerical representation of data — text, images, audio, or video — as a list of floating-point numbers (a vector). These numbers are not arbitrary; they encode meaning. Similar items end up numerically close together in this high-dimensional space.
```
# Example: Two semantically similar sentences map to nearby vectors
"The cat sat on the mat."    → [0.12, -0.45, 0.88, ...]
"A feline rested on a rug."  → [0.11, -0.43, 0.86, ...]

# An unrelated sentence is far away
"Quarterly earnings rose 12%." → [0.89, 0.21, -0.34, ...]
```
Embeddings are generated by embedding models — neural networks trained to understand context and semantics. Popular ones include:
| Model | Provider | Dimensions | Best For |
|---|---|---|---|
| text-embedding-3-large | OpenAI | 3,072 | General text |
| embed-english-v3.0 | Cohere | 1,024 | Search & classification |
| all-MiniLM-L6-v2 | HuggingFace | 384 | Fast, lightweight |
| nomic-embed-text | Nomic AI | 768 | Open-source, local use |
What Is a Vector Database?
A vector database is purpose-built to store, index, and query high-dimensional vectors at scale. Unlike traditional databases that match exact values, vector DBs find approximate nearest neighbors (ANN) — items that are semantically closest to a query.
How Similarity Search Works
Query: "affordable electric cars"
↓
[Embed query → vector]
↓
[Search vector DB for nearest neighbors]
↓
Returns: "best budget EVs 2024", "Tesla Model 3 cost breakdown", ...
The core operation is cosine similarity or dot product — measuring the angle between two vectors to determine how "close" they are in meaning.
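To make "close" concrete, here is a minimal cosine-similarity sketch using the toy three-dimensional vectors from the example above (real embeddings have hundreds or thousands of dimensions, but the math is identical):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

cat = [0.12, -0.45, 0.88]       # "The cat sat on the mat."
feline = [0.11, -0.43, 0.86]    # "A feline rested on a rug."
earnings = [0.89, 0.21, -0.34]  # "Quarterly earnings rose 12%."

print(cosine_similarity(cat, feline))    # close to 1.0
print(cosine_similarity(cat, earnings))  # much lower (negative here)
```

Vector databases compute exactly this (or the raw dot product) against millions of stored vectors, using ANN indexes so they don't have to compare every pair.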
Real-World Use Cases
1. Semantic Search
The Problem: Traditional keyword search fails when users don't use the exact right words.
The Solution: Embed both documents and queries. When a user searches, find the documents whose embeddings are closest to the query's embedding.
Real Example — Notion AI Search:
Notion uses embeddings, so a search for "meeting notes from last week about marketing" finds the right page even if it's titled "Sync — Brand Strategy 03/10" and shares no exact keywords with the query.
```python
from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI(api_key="YOUR_OPENAI_KEY")
pc = Pinecone(api_key="YOUR_PINECONE_KEY")
index = pc.Index("docs-index")

def semantic_search(query: str, top_k: int = 5):
    # Embed the query
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=query,
    )
    query_vector = response.data[0].embedding
    # Search the vector DB
    results = index.query(vector=query_vector, top_k=top_k, include_metadata=True)
    return results.matches
```
Companies using this: Notion, Elastic, Algolia, Confluence, GitHub Copilot
2. Retrieval-Augmented Generation (RAG)
The Problem: LLMs have a knowledge cutoff and can't access your private data. Fine-tuning is expensive and slow.
The Solution: Store your documents as embeddings. At query time, retrieve the most relevant chunks and inject them into the LLM's prompt as context.
User asks: "What is our refund policy for enterprise clients?"
↓
[Embed question] → [Search vector DB] → [Retrieve top 3 relevant policy chunks]
↓
[Inject chunks into LLM prompt]
↓
LLM answers grounded in your actual documents
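The "inject chunks into the prompt" step is plain string assembly; here is a minimal sketch (the instruction wording and the example chunks are illustrative, not from any real policy):

```python
def build_rag_prompt(question: str, chunks: list[str]) -> str:
    """Assemble retrieved chunks into a grounded prompt for the LLM."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

chunks = [
    "Enterprise refunds are pro-rated within 60 days.",
    "Refund requests go through your account manager.",
]
prompt = build_rag_prompt("What is our refund policy for enterprise clients?", chunks)
```

The numbered `[1]`, `[2]` markers make it easy to ask the model to cite which chunk supported its answer.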
Real Example — Cursor (AI Code Editor):
Cursor indexes your entire codebase. When you ask "how does auth work in this project?", it retrieves relevant files and functions using embeddings, then feeds them to the LLM — giving context-aware answers with far less hallucination.
Architecture overview:
[Your Documents]
↓
[Chunking + Embedding]
↓
[Vector DB (Pinecone / Weaviate / Chroma)]
↓ (retrieval at query time)
[LLM (GPT-4, Claude, etc.)] → [Final Answer]
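The "Chunking + Embedding" step above is often just a sliding window over the text before each window is embedded and upserted. A minimal sketch (the 500-character size and 50-character overlap are arbitrary choices for illustration):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows for embedding.

    The overlap keeps a sentence that straddles a boundary retrievable
    from at least one chunk.
    """
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

chunks = chunk_text("long document text " * 100)
```

Production pipelines usually chunk on semantic boundaries (paragraphs, headings, functions) rather than raw characters, but the shape of the loop is the same.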
Companies using this: Cursor, GitHub Copilot, Intercom Fin, Notion AI, Perplexity
3. Recommendation Systems
The Problem: Collaborative filtering ("users like you also liked...") fails for new users and new items (cold-start problem). It also can't understand item content.
The Solution: Embed items (products, movies, articles) based on their descriptions and attributes. Recommend items closest in the embedding space to what a user has interacted with.
Real Example — Spotify:
Spotify's recommendation engine embeds songs using audio features and playlist context. "Discover Weekly" works by finding songs whose vectors are close to your listening history in this embedding space.
```python
# Simplified product recommendation
def get_recommendations(product_id: str, top_k: int = 10):
    # Fetch the product's stored embedding
    product_vector = index.fetch(ids=[product_id]).vectors[product_id].values
    # Find similar products (fetch one extra so we can drop the product itself)
    similar = index.query(
        vector=product_vector,
        top_k=top_k + 1,
        filter={"in_stock": True},
    )
    return [m for m in similar.matches if m.id != product_id][:top_k]
```
Companies using this: Spotify, Netflix, Amazon, Pinterest, Etsy
4. Anomaly Detection & Fraud Prevention
The Problem: Fraud patterns evolve constantly. Rule-based systems become outdated quickly.
The Solution: Embed user behavior sequences (transactions, clicks, login patterns). Flag transactions whose vectors are far from a user's historical behavior cluster.
Real Example — Stripe Radar:
Stripe embeds transaction patterns and detects anomalies by identifying transactions whose vector representations are statistical outliers compared to the merchant's and user's typical behavior.
```python
import numpy as np

def cosine_similarity(a, b) -> float:
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Flag anomalous transactions
def is_suspicious(transaction_embedding, user_history_embeddings, threshold=0.7):
    similarities = [
        cosine_similarity(transaction_embedding, hist_emb)
        for hist_emb in user_history_embeddings
    ]
    avg_similarity = sum(similarities) / len(similarities)
    return avg_similarity < threshold  # Low similarity = suspicious
```
Companies using this: Stripe, PayPal, Mastercard, Visa, Cloudflare
5. Multimodal Search
The Problem: Users want to search with images, not just text. Or find visually similar products.
The Solution: Use multimodal embedding models (like CLIP) that map text and images into the same vector space. A text query can retrieve images, and an image query can retrieve text.
Real Example — Pinterest Visual Search:
When you tap a section of a Pinterest image to search for similar items, they're using multimodal embeddings to find visually similar content across billions of pins.
```python
from transformers import CLIPProcessor, CLIPModel
import torch

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Text-to-image search
def text_to_image_search(text_query: str):
    inputs = processor(text=[text_query], return_tensors="pt")
    with torch.no_grad():
        text_embedding = model.get_text_features(**inputs)
    # Search image embeddings in your vector DB
    return index.query(vector=text_embedding[0].tolist(), top_k=10)
```
Companies using this: Pinterest, Google Lens, Shopify, IKEA, Zalando
6. Customer Support Automation
The Problem: Support tickets are repetitive. Teams waste time re-answering the same questions. Knowledge bases are hard to search.
The Solution: Embed your entire knowledge base and past resolved tickets. Automatically surface the most relevant article or resolution for each new ticket.
Real Example — Intercom Fin:
Intercom's AI agent uses embeddings to match incoming customer questions against a company's entire knowledge base, resolving a large share of routine tickets autonomously by surfacing semantically relevant answers.
Ticket routing pipeline:
[New Support Ticket]
↓
[Embed ticket content]
↓
[Query vector DB of past tickets + KB articles]
↓
[High similarity match] → Auto-resolve with suggested answer
[Medium similarity] → Route to correct team with context
[Low similarity] → Escalate as novel issue
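The routing logic above boils down to a couple of thresholds on the top similarity score; a minimal sketch (the cutoff values 0.85 and 0.60 are made up for illustration and would be tuned per deployment):

```python
def route_ticket(best_match_score: float) -> str:
    """Map the top similarity score from the vector DB to a routing action."""
    if best_match_score >= 0.85:
        return "auto_resolve"        # high-confidence match: suggest the answer
    if best_match_score >= 0.60:
        return "route_with_context"  # plausible match: send to a team with it
    return "escalate_novel"          # nothing similar: treat as a new issue
```

In practice teams calibrate these cutoffs against labeled tickets so that the auto-resolve band keeps a low false-positive rate.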
Companies using this: Intercom, Zendesk, Freshdesk, Linear, Atlassian
Popular Vector Databases at a Glance
| Database | Best For | Hosting | Open Source | Notable Feature |
|---|---|---|---|---|
| Pinecone | Production at scale | Managed cloud | ❌ | Serverless, zero-ops |
| Weaviate | Hybrid search | Cloud + self-hosted | ✅ | Built-in BM25 + vector |
| Qdrant | High performance | Cloud + self-hosted | ✅ | Rust-based, fast filtering |
| Chroma | Local dev & prototyping | Embedded/self-hosted | ✅ | Simplest to get started |
| pgvector | Already using Postgres | Self-hosted | ✅ | No new infra needed |
| Milvus | Large-scale enterprise | Cloud + self-hosted | ✅ | Handles billions of vectors |
Quick Start: Building a Semantic Search App
Here's a minimal working example using Chroma (no signup needed) and OpenAI embeddings:
```shell
pip install chromadb openai
```

```python
import chromadb
from openai import OpenAI

openai_client = OpenAI(api_key="YOUR_OPENAI_KEY")
chroma_client = chromadb.Client()
collection = chroma_client.create_collection("my_docs")

# Step 1: Add documents
documents = [
    "Our return policy allows returns within 30 days of purchase.",
    "We offer free shipping on orders over $50.",
    "Customer support is available 24/7 via chat and email.",
    "Enterprise plans include dedicated account management.",
]

def embed(texts):
    res = openai_client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [r.embedding for r in res.data]

collection.add(
    documents=documents,
    embeddings=embed(documents),
    ids=[f"doc_{i}" for i in range(len(documents))],
)

# Step 2: Query
query = "How do I send something back?"
results = collection.query(
    query_embeddings=embed([query]),
    n_results=2,
)
print(results["documents"])
# → [['Our return policy allows returns within 30 days of purchase.', ...]]
```

Note that `results["documents"]` is a list of lists (one result list per query), and with `n_results=2` the second-closest document is returned as well.
Choosing the Right Tool
Are you prototyping / building locally?
└─ Yes → Chroma or pgvector
Are you already using Postgres?
└─ Yes → pgvector (zero new infra)
Do you need hybrid search (keyword + semantic)?
└─ Yes → Weaviate or Elasticsearch with vectors
Do you need maximum performance with complex filters?
└─ Yes → Qdrant
Do you want fully managed, zero-ops production?
└─ Yes → Pinecone
Handling billions of vectors at enterprise scale?
└─ Yes → Milvus
What's Next?
The vector database space is evolving fast:
- Multimodal embeddings — unified search across text, image, audio, and video
- Sparse + dense hybrid search — combining keyword precision with semantic understanding
- Streaming vector updates — real-time embedding pipelines for live data
- On-device embeddings — privacy-preserving local search on mobile/edge devices
- Graph + vector hybrid stores — combining relationship graphs with semantic similarity
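One common recipe behind sparse + dense hybrid search is Reciprocal Rank Fusion (RRF), which merges the keyword ranking and the vector ranking without needing their scores to be comparable. A minimal sketch (k = 60 is the conventional constant; the document IDs are illustrative):

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked ID lists, rewarding IDs ranked high in any list."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_a", "doc_b", "doc_c"]  # BM25 / keyword ranking
vector_hits = ["doc_b", "doc_d", "doc_a"]   # semantic ranking
fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
# doc_b and doc_a appear in both lists, so they rise to the top
```

Weaviate and Elasticsearch expose variants of this fusion natively; rolling your own is mainly useful when the two rankings come from separate systems.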
Resources
- OpenAI Embeddings Guide
- Pinecone Learning Center
- Weaviate Documentation
- Chroma Getting Started
- Qdrant Documentation
- pgvector GitHub
- BEIR Benchmark — Evaluate embedding models
Found this useful? ⭐ Star the repo and share it with your team.
Have a use case I missed? Open an issue or submit a PR.