Vector Databases & Embeddings: The Engine Behind Modern AI Applications

Posted on Wed 15 April 2026 in GenAI


How the technology powering semantic search, recommendation systems, and RAG is quietly reshaping software development



What Are Embeddings?

An embedding is a numerical representation of data — text, images, audio, or video — as a list of floating-point numbers (a vector). These numbers are not arbitrary; they encode meaning. Similar items end up numerically close together in this high-dimensional space.

# Example: Two semantically similar sentences map to nearby vectors
"The cat sat on the mat."     [0.12, -0.45, 0.88, ...]
"A feline rested on a rug."   [0.11, -0.43, 0.86, ...]

# An unrelated sentence is far away
"Quarterly earnings rose 12%."  [0.89, 0.21, -0.34, ...]

Embeddings are generated by embedding models — neural networks trained to understand context and semantics. Popular ones include:

Model                   Provider     Dimensions  Best For
----------------------  -----------  ----------  -----------------------
text-embedding-3-large  OpenAI       3,072       General text
embed-english-v3.0      Cohere       1,024       Search & classification
all-MiniLM-L6-v2        HuggingFace  384         Fast, lightweight
nomic-embed-text        Nomic AI     768         Open-source, local use

What Is a Vector Database?

A vector database is purpose-built to store, index, and query high-dimensional vectors at scale. Unlike traditional databases that match exact values, vector DBs find approximate nearest neighbors (ANN) — items that are semantically closest to a query.

How Similarity Search Works

Query: "affordable electric cars"
        ↓
[Embed query → vector]
        ↓
[Search vector DB for nearest neighbors]
        ↓
Returns: "best budget EVs 2024", "Tesla Model 3 cost breakdown", ...

The core operation is a similarity measure — typically cosine similarity (the angle between two vectors) or the dot product — that quantifies how "close" two vectors are in meaning.
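To make this concrete, here is a minimal sketch of cosine similarity in NumPy, reusing the toy three-dimensional vectors from the example above (real embeddings have hundreds or thousands of dimensions):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional "embeddings" from the example above
cat = np.array([0.12, -0.45, 0.88])       # "The cat sat on the mat."
feline = np.array([0.11, -0.43, 0.86])    # "A feline rested on a rug."
earnings = np.array([0.89, 0.21, -0.34])  # "Quarterly earnings rose 12%."

print(cosine_similarity(cat, feline))    # close to 1.0 — similar meaning
print(cosine_similarity(cat, earnings))  # much lower — unrelated
```

A vector DB runs essentially this comparison against millions of stored vectors, using ANN indexes to avoid scoring every one.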


Real-World Use Cases

1. Semantic Search

The Problem: Traditional keyword search fails when users don't use the exact right words.

The Solution: Embed both documents and queries. When a user searches, find the documents whose embeddings are closest to the query's embedding.

Real Example — Notion AI Search:
Notion uses embeddings so when you search "meeting notes from last week about marketing," it finds the right page even if it's titled "Sync — Brand Strategy 03/10" with no exact keyword match.

from openai import OpenAI
from pinecone import Pinecone

openai_client = OpenAI(api_key="YOUR_OPENAI_KEY")
pc = Pinecone(api_key="YOUR_PINECONE_KEY")
index = pc.Index("docs-index")

def semantic_search(query: str, top_k: int = 5):
    # Embed the query
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=query
    )
    query_vector = response.data[0].embedding

    # Search the vector DB for the nearest neighbors
    results = index.query(vector=query_vector, top_k=top_k, include_metadata=True)
    return results.matches

Companies using this: Notion, Elastic, Algolia, Confluence, GitHub Copilot


2. Retrieval-Augmented Generation (RAG)

The Problem: LLMs have a knowledge cutoff and can't access your private data. Fine-tuning is expensive and slow.

The Solution: Store your documents as embeddings. At query time, retrieve the most relevant chunks and inject them into the LLM's prompt as context.

User asks: "What is our refund policy for enterprise clients?"
        ↓
[Embed question] → [Search vector DB] → [Retrieve top 3 relevant policy chunks]
        ↓
[Inject chunks into LLM prompt]
        ↓
LLM answers grounded in your actual documents

Real Example — Cursor (AI Code Editor):
Cursor indexes your entire codebase. When you ask "how does auth work in this project?", it retrieves relevant files and functions using embeddings, then feeds them to the LLM — giving context-aware answers without hallucination.

Architecture overview:

[Your Documents]
        ↓
[Chunking + Embedding]
        ↓
[Vector DB (Pinecone / Weaviate / Chroma)]
        ↓ (retrieval at query time)
[LLM (GPT-4, Claude, etc.)] → [Final Answer]
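The "Chunking" step above can be sketched as a fixed-size splitter with overlap, so that a sentence cut at a chunk boundary still appears whole in the neighboring chunk. The sizes here are illustrative; production pipelines often split on sentence or token boundaries instead:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows before embedding."""
    step = chunk_size - overlap
    return [
        text[i:i + chunk_size]
        for i in range(0, max(len(text) - overlap, 1), step)
    ]
```

Each chunk is then embedded and stored separately, so retrieval can surface just the relevant passage instead of a whole document.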

Companies using this: Cursor, GitHub Copilot, Intercom Fin, Notion AI, Perplexity


3. Recommendation Systems

The Problem: Collaborative filtering ("users like you also liked...") fails for new users and new items (cold-start problem). It also can't understand item content.

The Solution: Embed items (products, movies, articles) based on their descriptions and attributes. Recommend items closest in the embedding space to what a user has interacted with.

Real Example — Spotify:
Spotify's recommendation engine embeds songs using audio features and playlist context. "Discover Weekly" works by finding songs whose vectors are close to your listening history in this embedding space.

# Simplified product recommendation
def get_recommendations(product_id: str, top_k: int = 10):
    # Fetch the product's stored embedding
    product_vector = index.fetch(ids=[product_id]).vectors[product_id].values

    # Find similar products
    similar = index.query(
        vector=product_vector,
        top_k=top_k + 1,  # fetch one extra so we can drop the product itself
        filter={"in_stock": True}
    )
    return [m for m in similar.matches if m.id != product_id]

Companies using this: Spotify, Netflix, Amazon, Pinterest, Etsy


4. Anomaly Detection & Fraud Prevention

The Problem: Fraud patterns evolve constantly. Rule-based systems become outdated quickly.

The Solution: Embed user behavior sequences (transactions, clicks, login patterns). Flag transactions whose vectors are far from a user's historical behavior cluster.

Real Example — Stripe Radar:
Stripe embeds transaction patterns and detects anomalies by identifying transactions whose vector representations are statistical outliers compared to the merchant's and user's typical behavior.

import numpy as np

def cosine_similarity(a, b) -> float:
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Flag anomalous transactions
def is_suspicious(transaction_embedding, user_history_embeddings, threshold=0.7):
    similarities = [
        cosine_similarity(transaction_embedding, hist_emb)
        for hist_emb in user_history_embeddings
    ]
    avg_similarity = sum(similarities) / len(similarities)
    return avg_similarity < threshold  # Low similarity = suspicious

Companies using this: Stripe, PayPal, Mastercard, Visa, Cloudflare


5. Multimodal Search

The Problem: Users want to search with images, not just text. Or find visually similar products.

The Solution: Use multimodal embedding models (like CLIP) that map text and images into the same vector space. A text query can retrieve images, and an image query can retrieve text.

Real Example — Pinterest Visual Search:
When you tap a section of a Pinterest image to search for similar items, they're using multimodal embeddings to find visually similar content across billions of pins.

from transformers import CLIPProcessor, CLIPModel
import torch

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Text-to-image search
def text_to_image_search(text_query: str):
    inputs = processor(text=[text_query], return_tensors="pt")
    with torch.no_grad():  # inference only — no gradients needed
        text_embedding = model.get_text_features(**inputs)
    # Search pre-computed image embeddings in your vector DB
    return index.query(vector=text_embedding[0].tolist(), top_k=10)

Companies using this: Pinterest, Google Lens, Shopify, IKEA, Zalando


6. Customer Support Automation

The Problem: Support tickets are repetitive. Teams waste time re-answering the same questions. Knowledge bases are hard to search.

The Solution: Embed your entire knowledge base and past resolved tickets. Automatically surface the most relevant article or resolution for each new ticket.

Real Example — Intercom Fin:
Intercom's AI agent uses embeddings to match incoming customer questions against a company's entire knowledge base. It handles ~70% of tickets autonomously by finding semantically relevant answers.

Ticket routing pipeline:

[New Support Ticket]
        ↓
[Embed ticket content]
        ↓
[Query vector DB of past tickets + KB articles]
        ↓
[High similarity match] → Auto-resolve with suggested answer
[Medium similarity]     → Route to correct team with context
[Low similarity]        → Escalate as novel issue
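The routing step above reduces to a threshold function on the best-match similarity score. A minimal sketch — the cutoff values here are hypothetical; real systems tune them against labeled ticket data:

```python
# Hypothetical similarity thresholds (tune on your own ticket history)
AUTO_RESOLVE_THRESHOLD = 0.85
ROUTE_THRESHOLD = 0.60

def route_ticket(best_match_similarity: float) -> str:
    """Map the best-match similarity score to a routing decision."""
    if best_match_similarity >= AUTO_RESOLVE_THRESHOLD:
        return "auto_resolve"      # confident match: suggest the answer
    if best_match_similarity >= ROUTE_THRESHOLD:
        return "route_to_team"     # partial match: attach context, hand off
    return "escalate_novel"        # no good match: treat as a new issue

print(route_ticket(0.92))  # auto_resolve
print(route_ticket(0.70))  # route_to_team
print(route_ticket(0.30))  # escalate_novel
```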

Companies using this: Intercom, Zendesk, Freshdesk, Linear, Atlassian


Popular Vector Databases at a Glance

Database  Best For                 Hosting               Open Source  Notable Feature
--------  -----------------------  --------------------  -----------  ---------------------------
Pinecone  Production at scale      Managed cloud         No           Serverless, zero-ops
Weaviate  Hybrid search            Cloud + self-hosted   Yes          Built-in BM25 + vector
Qdrant    High performance         Cloud + self-hosted   Yes          Rust-based, fast filtering
Chroma    Local dev & prototyping  Embedded/self-hosted  Yes          Simplest to get started
pgvector  Already using Postgres   Self-hosted           Yes          No new infra needed
Milvus    Large-scale enterprise   Cloud + self-hosted   Yes          Handles billions of vectors

Quick Start: Building a Semantic Search App

Here's a minimal working example using Chroma (no signup needed) and OpenAI embeddings:

pip install chromadb openai

import chromadb
from openai import OpenAI

openai_client = OpenAI(api_key="YOUR_OPENAI_KEY")
chroma_client = chromadb.Client()
collection = chroma_client.create_collection("my_docs")

# Step 1: Add documents
documents = [
    "Our return policy allows returns within 30 days of purchase.",
    "We offer free shipping on orders over $50.",
    "Customer support is available 24/7 via chat and email.",
    "Enterprise plans include dedicated account management.",
]

def embed(texts):
    res = openai_client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [r.embedding for r in res.data]

collection.add(
    documents=documents,
    embeddings=embed(documents),
    ids=[f"doc_{i}" for i in range(len(documents))]
)

# Step 2: Query
query = "How do I send something back?"
results = collection.query(
    query_embeddings=embed([query]),
    n_results=2
)

print(results["documents"][0][0])  # top match
# → 'Our return policy allows returns within 30 days of purchase.'

Choosing the Right Tool

Are you prototyping / building locally?
  └─ Yes → Chroma or pgvector

Are you already using Postgres?
  └─ Yes → pgvector (zero new infra)

Do you need hybrid search (keyword + semantic)?
  └─ Yes → Weaviate or Elasticsearch with vectors

Do you need maximum performance with complex filters?
  └─ Yes → Qdrant

Do you want fully managed, zero-ops production?
  └─ Yes → Pinecone

Handling billions of vectors at enterprise scale?
  └─ Yes → Milvus

What's Next?

The vector database space is evolving fast:

  • Multimodal embeddings — unified search across text, image, audio, and video
  • Sparse + dense hybrid search — combining keyword precision with semantic understanding
  • Streaming vector updates — real-time embedding pipelines for live data
  • On-device embeddings — privacy-preserving local search on mobile/edge devices
  • Graph + vector hybrid stores — combining relationship graphs with semantic similarity


Found this useful? ⭐ Star the repo and share it with your team.

Have a use case I missed? Open an issue or submit a PR.