What Are Vector Databases and Why They Matter
Traditional databases search by exact match (WHERE email = ?) or pattern (LIKE %keyword%). Vector databases search by meaning — they find items that are semantically similar, even if they share no common words. This is the foundation of nearly every AI search feature, chatbot, and recommendation engine in 2026.
A vector database stores high-dimensional vectors (embeddings) and supports Approximate Nearest Neighbor (ANN) search — finding the k vectors closest to a query vector. The "approximate" part is key: exact nearest-neighbor search over millions of vectors is too slow for interactive queries, so these databases use clever data structures (HNSW, IVF) that trade a small accuracy loss for orders-of-magnitude speedups.
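To make that tradeoff concrete, here is what exact nearest-neighbor search looks like as a brute-force scan. This is a minimal sketch (the `cosineSimilarity` and `nearestNeighbors` helpers are illustrative, not from any library); ANN indexes like HNSW exist precisely to approximate this result without touching every vector:

```typescript
// Exact nearest-neighbor search: compare the query against every stored vector.
// This is O(n) per query — ANN indexes (HNSW, IVF) avoid the full scan.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

function nearestNeighbors(
  query: number[],
  vectors: { id: number; embedding: number[] }[],
  k: number
): { id: number; score: number }[] {
  return vectors
    .map((v) => ({ id: v.id, score: cosineSimilarity(query, v.embedding) }))
    .sort((a, b) => b.score - a.score) // highest similarity first
    .slice(0, k);
}
```

At a few thousand vectors this scan is fine; at millions, per-query latency forces the move to an ANN index.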
The market: pgvector (free PostgreSQL extension, ideal if you already use Postgres), Pinecone (managed, scales to billions of vectors), Chroma (open-source, popular for prototyping), Weaviate (open-source, built-in vectorization), and Qdrant (open-source, Rust-based, fast).
pgvector: Vector Search in PostgreSQL
pgvector is a PostgreSQL extension that adds a vector column type and distance operators. It is the most practical choice for most projects because your vectors live alongside your relational data — no separate infrastructure to manage.
pgvector supports three distance operators: cosine distance (<=>), L2/Euclidean distance (<->), and negative inner product (<#> — negated so that ascending ORDER BY returns the closest matches). For text embeddings, cosine distance is almost always the right choice.
The HNSW index type (added in pgvector 0.5.0) provides fast approximate nearest-neighbor search suitable for production workloads.
-- Install pgvector
CREATE EXTENSION IF NOT EXISTS vector;

-- Table with embedding column
CREATE TABLE blog_chunks (
  id SERIAL PRIMARY KEY,
  post_id INTEGER REFERENCES posts(id),
  chunk_index INTEGER,
  content TEXT NOT NULL,
  embedding vector(1536),  -- OpenAI text-embedding-3-small
  metadata JSONB DEFAULT '{}'
);

-- HNSW index for fast cosine similarity search
CREATE INDEX ON blog_chunks USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- Insert with embedding
INSERT INTO blog_chunks (post_id, chunk_index, content, embedding, metadata)
VALUES (
  1,
  0,
  'React Server Components render on the server and send zero JavaScript...',
  '[0.023, -0.041, 0.089, ...]'::vector,
  '{"category": "frontend", "tags": ["react", "rsc"]}'
);

-- Semantic search: find the 5 most similar chunks
SELECT
  bc.content,
  p.title AS post_title,
  1 - (bc.embedding <=> $1::vector) AS similarity
FROM blog_chunks bc
JOIN posts p ON p.id = bc.post_id
WHERE 1 - (bc.embedding <=> $1::vector) > 0.7
ORDER BY bc.embedding <=> $1::vector
LIMIT 5;
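When running these queries from application code, the query vector is bound as a parameter in pgvector's literal format: a bracketed, comma-separated list of numbers. A small helper makes that explicit (`toPgVector` is an illustrative name, not part of pgvector or node-postgres):

```typescript
// Serialize a JS number array into pgvector's input format, e.g. "[0.1,0.2,0.3]".
// JSON.stringify happens to produce the same bracketed format for flat number
// arrays, but an explicit helper documents the intent and validates the input.
function toPgVector(embedding: number[]): string {
  if (embedding.length === 0 || embedding.some((x) => !Number.isFinite(x))) {
    throw new Error('embedding must be a non-empty array of finite numbers');
  }
  return `[${embedding.join(',')}]`;
}
```

The resulting string can then be bound as `$1` and cast with `$1::vector`, as in the search query above.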
Chunking Strategies for Documents
Before embedding a document, you must split it into chunks — smaller pieces that can each be independently embedded and retrieved. The chunking strategy directly impacts search quality: too large and the embedding is diluted, too small and context is lost.
The recommended approach: split by semantic boundaries (paragraphs, sections, headings) with a fixed maximum size (500-1000 tokens) and overlap (50-100 tokens). The overlap ensures that information at chunk boundaries is not lost.
For code, split by function/class boundaries. For documentation, split by section headings. For chat logs, split by conversation turns.
// Semantic chunking with overlap
interface ChunkOptions {
  maxChunkSize: number; // max tokens per chunk
  overlapSize: number;  // tokens of overlap between chunks
  separator: string;    // split on paragraphs first
}

function chunkDocument(
  text: string,
  options: ChunkOptions = { maxChunkSize: 800, overlapSize: 100, separator: '\n\n' }
): string[] {
  const { maxChunkSize, overlapSize, separator } = options;

  // Step 1: Split by semantic boundaries (paragraphs)
  const paragraphs = text.split(separator);

  const chunks: string[] = [];
  let currentChunk = '';

  for (const paragraph of paragraphs) {
    const combined = currentChunk ? currentChunk + separator + paragraph : paragraph;

    if (estimateTokens(combined) > maxChunkSize && currentChunk) {
      chunks.push(currentChunk.trim());

      // Start the next chunk with overlap from the end of the current chunk.
      // English averages ~1.3 tokens per word, so overlapSize tokens ≈ overlapSize / 1.3 words.
      const words = currentChunk.split(' ');
      const overlapWords = words.slice(-Math.floor(overlapSize / 1.3));
      currentChunk = overlapWords.join(' ') + separator + paragraph;
    } else {
      currentChunk = combined;
    }
  }

  if (currentChunk.trim()) {
    chunks.push(currentChunk.trim());
  }

  return chunks;
}

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4); // Rough estimate: 1 token ≈ 4 chars
}

// Usage
const blogContent = await readFile('react-server-components.md', 'utf8');
const chunks = chunkDocument(blogContent);

// Embed each chunk separately
for (const chunk of chunks) {
  const embedding = await getEmbedding(chunk);
  await db.query(
    'INSERT INTO blog_chunks (content, embedding) VALUES ($1, $2::vector)',
    [chunk, JSON.stringify(embedding)]
  );
}
Production Pipeline: Ingest, Embed, Index
A production vector search pipeline has three stages: Ingest (load documents, clean, chunk), Embed (convert chunks to vectors using an embedding model), and Index (store vectors in the database with metadata for filtering).
The pipeline should be idempotent — running it twice on the same document should not create duplicates. Use content hashing or document versioning to detect changes and only re-embed modified content.
For large datasets, embed in batches (OpenAI accepts up to 2048 texts per API call) to minimize API calls and latency.
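Splitting a large chunk list into API-sized batches is a one-liner worth pulling out. A minimal sketch (`batched` is an illustrative helper; 2048 matches OpenAI's documented per-request input limit):

```typescript
// Split an array into consecutive batches of at most `size` items.
function batched<T>(items: T[], size: number): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Usage sketch: one embeddings call per batch, preserving chunk order.
// for (const batch of batched(allChunks, 2048)) {
//   const response = await openai.embeddings.create({
//     model: 'text-embedding-3-small',
//     input: batch,
//   });
//   // response.data[i].embedding corresponds to batch[i]
// }
```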
import { createHash } from 'crypto';
import OpenAI from 'openai';

const openai = new OpenAI();

interface DocumentSource {
  id: string;
  title: string;
  content: string;
  category: string;
  tags: string[];
  updatedAt: Date;
}

async function ingestDocuments(documents: DocumentSource[]) {
  for (const doc of documents) {
    // 1. Check if already indexed (idempotency)
    const contentHash = createHash('sha256').update(doc.content).digest('hex');
    const existing = await db.query(
      "SELECT id FROM blog_chunks WHERE metadata->>'sourceId' = $1 AND metadata->>'hash' = $2",
      [doc.id, contentHash]
    );
    if (existing.rows.length > 0) {
      console.log(`Skipping ${doc.title} — unchanged`);
      continue;
    }

    // 2. Delete old chunks for this document
    await db.query("DELETE FROM blog_chunks WHERE metadata->>'sourceId' = $1", [doc.id]);

    // 3. Chunk the document
    const chunks = chunkDocument(doc.content);

    // 4. Embed all chunks in one batch (up to 2048 inputs per API call)
    const response = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: chunks,
    });

    // 5. Store chunks with metadata
    for (let i = 0; i < chunks.length; i++) {
      await db.query(
        `INSERT INTO blog_chunks (content, embedding, metadata)
         VALUES ($1, $2::vector, $3)`,
        [
          chunks[i],
          JSON.stringify(response.data[i].embedding),
          JSON.stringify({
            sourceId: doc.id,
            title: doc.title,
            category: doc.category,
            tags: doc.tags,
            chunkIndex: i,
            hash: contentHash,
          }),
        ]
      );
    }

    console.log(`Indexed ${doc.title}: ${chunks.length} chunks`);
  }
}
Hybrid Search: Combining Vector + Full-Text
Pure vector search misses exact keyword matches (product codes, error messages, proper names). Pure full-text search misses semantic similarity. Hybrid search combines both: run a vector search AND a full-text search, then merge the results, either with Reciprocal Rank Fusion (RRF) or with a weighted combination of the two scores.
pgvector + PostgreSQL full-text search is a powerful combination: you can run both types of search in a single SQL query, using UNION ALL and a scoring function to rank results.
-- Hybrid search: vector + full-text in a single query
WITH vector_results AS (
  SELECT id, content,
         1 - (embedding <=> $1::vector) AS vector_score,
         NULL::float AS text_score
  FROM blog_chunks
  ORDER BY embedding <=> $1::vector
  LIMIT 20
),
text_results AS (
  SELECT id, content,
         NULL::float AS vector_score,
         ts_rank(to_tsvector('english', content),
                 plainto_tsquery('english', $2)) AS text_score
  FROM blog_chunks
  WHERE to_tsvector('english', content) @@ plainto_tsquery('english', $2)
  ORDER BY text_score DESC
  LIMIT 20
)
-- Combine: a chunk found by both searches gets credit from both scores
SELECT id, content,
       COALESCE(MAX(vector_score), 0) * 0.7
     + COALESCE(MAX(text_score), 0) * 0.3 AS hybrid_score
FROM (
  SELECT * FROM vector_results
  UNION ALL
  SELECT * FROM text_results
) all_results
GROUP BY id, content
ORDER BY hybrid_score DESC
LIMIT 10;
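The query above uses a weighted combination of scores. Reciprocal Rank Fusion is the alternative merge step, usually done in application code: each list contributes 1 / (k + rank) to an item's fused score, so it needs only rank positions, never comparable scores. A sketch (k = 60 is the conventional constant; the function name is illustrative):

```typescript
// Reciprocal Rank Fusion: merge ranked result lists using only rank positions.
// An item appearing near the top of several lists accumulates the highest score.
function reciprocalRankFusion(
  resultLists: number[][], // each inner array is a ranked list of item ids
  k = 60
): { id: number; score: number }[] {
  const scores = new Map<number, number>();
  for (const list of resultLists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

Feeding it the id lists from the vector query and the full-text query ranks chunks that appear high in both lists first, without having to normalize cosine similarity against ts_rank.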
Key Takeaways
Vector databases are essential infrastructure for AI applications. pgvector gives you vector search inside PostgreSQL with zero new infrastructure. Pinecone scales to billions of vectors with managed hosting.
The production pipeline: document → chunk (500-1000 tokens with overlap) → embed (batch API calls) → store (with metadata) → search (hybrid vector + full-text).
For interviews: explain embeddings (semantic meaning as vectors), HNSW indexing (approximate nearest neighbor), chunking strategies (why size and overlap matter), and hybrid search (combining vector + keyword search).