Embeddings for Internal Search: The Pattern That Replaces Elasticsearch for Most SMB Sites

Most SMB sites either skip internal search (so users have to navigate manually) or run a clunky keyword search that misses everything except exact matches. Vector embeddings, Supabase, and a small Next.js search component change that for $0 of new infrastructure. A user can search for ‘how much does a kitchen remodel cost in Plano’ and find the post titled ‘What Custom Kitchens Run in DFW’ because the two texts land close together in embedding space, so the search treats them as the same intent. Here’s the setup.

#Why this matters more than people think

On sites that have internal search, users who search convert at 2-4x the rate of users who only browse. Searchers are higher-intent: they’re looking for something specific. If your search is bad, those users leave for Google and may not come back. If your search is good, they find what they wanted and convert.

#What ‘semantic search’ adds over keyword search

Keyword search matches the exact tokens the user typed. ‘kitchen remodel cost’ matches pages containing those words. ‘how much for a new kitchen’ matches different pages, even though the intent is identical. Semantic search via embeddings understands both queries point to the same intent and returns the same results.
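
To make ‘same intent’ concrete: each text becomes a vector, and two texts count as similar when the cosine of the angle between their vectors is close to 1. Below is a minimal sketch of that comparison. The three-dimensional vectors are invented for illustration; real embeddings come from the model and have hundreds or thousands of dimensions.

```typescript
// Cosine similarity between two equal-length vectors.
// Close to 1 means same direction (similar meaning), close to 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors, made up for illustration only (not real model output).
const kitchenRemodelCost = [0.9, 0.1, 0.2];
const howMuchForNewKitchen = [0.8, 0.2, 0.3];
const roofRepairWarranty = [0.1, 0.9, 0.1];

const sameIntent = cosineSimilarity(kitchenRemodelCost, howMuchForNewKitchen);
const differentIntent = cosineSimilarity(kitchenRemodelCost, roofRepairWarranty);
console.log({ sameIntent, differentIntent });
```

With these toy numbers the two kitchen queries score far higher against each other than either does against the roofing query, which is the property the search relies on.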

#The architecture

On content publish: split each page into 300-600 word chunks. Embed each chunk. Store in Postgres with the source URL.
On user search: embed the query. Vector-similarity-search against the chunks. Group results by source URL, take the best chunk per URL.
Render: show the top 8 URLs, with the best-matching chunk as a snippet preview.

#Schema

create extension if not exists vector;

create table search_chunks (
  id uuid primary key default gen_random_uuid(),
  site_id text references sites(id) not null,
  url text not null,
  title text not null,
  chunk_text text not null,
  embedding vector(1024),
  created_at timestamptz default now()
);

create index on search_chunks using hnsw (embedding vector_cosine_ops);
create index on search_chunks(site_id);

#Indexing the content

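Build an indexing script that runs at deploy time. For FH-style sites where content lives in TypeScript (the static-generation pattern), walk the post and page data structures, chunk each one, embed each chunk, and write the rows to search_chunks. The script below does a full rebuild (clear the site’s existing rows, then re-insert), which is simpler than a true upsert at this scale.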

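// scripts/index-search.ts
import { POSTS } from "@/lib/blog/posts";
import { SERVICES } from "@/app/components/home/data";
import { embed } from "./embed";
import { supabaseAdmin } from "./supabase";
// postToPlainText and chunkByHeadings are project helpers not shown in this post;
// "./chunking" is a placeholder path for wherever they live.
import { postToPlainText, chunkByHeadings } from "./chunking";

async function indexAll(siteId: string) {
  // Full rebuild: clear this site's rows, then re-insert everything.
  const { error: deleteError } = await supabaseAdmin
    .from("search_chunks")
    .delete()
    .eq("site_id", siteId);
  if (deleteError) throw deleteError;

  for (const post of POSTS) {
    const text = postToPlainText(post);
    const chunks = chunkByHeadings(text, 500);
    const rows = [];
    for (const chunk of chunks) {
      // Embed as a document so it pairs with inputType: "query" at search time.
      const embedding = await embed(chunk, { inputType: "document" });
      rows.push({
        site_id: siteId,
        url: `/blog/${post.slug}`,
        title: post.title,
        chunk_text: chunk,
        embedding,
      });
    }
    if (rows.length === 0) continue;
    // One insert per post instead of one round trip per chunk.
    const { error } = await supabaseAdmin.from("search_chunks").insert(rows);
    if (error) throw error;
  }
  // Index SERVICES (page content) with the same chunk, embed, insert loop.
}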

#Search server action

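// app/search/actions.ts
"use server";
import "server-only";
import { embed } from "@/lib/ai/embed";
import { supabase } from "@/lib/supabase/server";

export async function search(query: string, siteId: string) {
  const trimmed = query.trim();
  if (trimmed.length < 2) return [];
  const queryEmbedding = await embed(trimmed, { inputType: "query" });
  // match_search_chunks is a Postgres function (not shown here) that returns the
  // nearest chunks for one site; each row carries a url and a similarity score.
  const { data, error } = await supabase.rpc("match_search_chunks", {
    site_id_filter: siteId,
    query_embedding: queryEmbedding,
    match_count: 20,
  });
  if (error) {
    console.error("search failed", error);
    return [];
  }
  if (!data) return [];
  // Group by URL, keeping the best-scoring chunk per URL
  const byUrl = new Map<string, typeof data[number]>();
  for (const row of data) {
    const best = byUrl.get(row.url);
    if (!best || row.similarity > best.similarity) byUrl.set(row.url, row);
  }
  // Sort explicitly so the top 8 do not depend on the order the RPC returned rows in
  return [...byUrl.values()]
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, 8);
}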

#Hybrid: keyword + vector

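Pure vector search can miss exact title matches (a user typing the exact post title doesn’t need semantic search, they need a keyword match). Combine the two: run a keyword search (Postgres full-text search over a tsvector column; note that its built-in ts_rank ranking is not true BM25, which requires an extension) alongside the vector search, then merge the two ranked lists with reciprocal rank fusion. The code is ~30 lines, and the quality improvement is meaningful.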
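
The fusion step can be sketched as follows. This is a minimal illustration, not the production code: the `Ranked` type, the function name, and the sample URLs are hypothetical, and `k = 60` is a commonly used default rather than a value tuned for any site. It assumes each input list is already ordered best-first and contains each URL at most once.

```typescript
type Ranked = { url: string };

// Reciprocal rank fusion: every list contributes 1 / (k + rank) for each URL it
// contains, with ranks starting at 1. URLs that rank well in several lists win.
function reciprocalRankFusion(
  lists: Ranked[][],
  k = 60, // common default; an assumption, not a tuned value
): { url: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((item, index) => {
      const rank = index + 1;
      scores.set(item.url, (scores.get(item.url) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .map(([url, score]) => ({ url, score }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical result lists from the two searches, best match first.
const keywordHits: Ranked[] = [{ url: "/blog/a" }, { url: "/blog/b" }];
const vectorHits: Ranked[] = [{ url: "/blog/b" }, { url: "/blog/c" }, { url: "/blog/a" }];

const fused = reciprocalRankFusion([keywordHits, vectorHits]);
console.log(fused.map((r) => r.url)); // ["/blog/b", "/blog/a", "/blog/c"]
```

Because the fusion uses only ranks, the keyword score and the cosine similarity never need to be put on the same scale.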

#Latency budget

Embed the query: 50-150ms. Postgres vector search: 10-40ms. Total: roughly 60-190ms, so budget ~100-200ms. That is acceptable for an as-you-type search box. To keep it feeling fast and avoid wasted requests, debounce on the client: don’t fire on every keystroke, fire 300ms after the last keystroke.
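
A minimal sketch of that client-side debounce is shown here. The `debounce` helper and `runSearch` are illustrative names; `runSearch` stands in for whatever function calls the search server action.

```typescript
// Returns a wrapper that runs fn only after delayMs have passed with no further calls.
function debounce<Args extends unknown[]>(
  fn: (...args: Args) => void,
  delayMs: number,
): (...args: Args) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: Args) => {
    if (timer !== undefined) clearTimeout(timer); // cancel the pending call
    timer = setTimeout(() => fn(...args), delayMs);
  };
}

// Hypothetical stand-in for the function that calls the search action.
const issued: string[] = [];
const runSearch = (query: string): void => {
  issued.push(query);
};

const debouncedSearch = debounce(runSearch, 300);

// Three quick keystrokes: only the last one reaches runSearch, 300ms later.
debouncedSearch("k");
debouncedSearch("ki");
debouncedSearch("kitchen remodel");
```

In a React component the debounced function should be created once (for example with useMemo or a ref) so the pending timer survives re-renders.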

#Cost

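Embedding generation: on the order of $0.10 per million tokens with models like Voyage’s voyage-3-large or OpenAI’s text-embedding-3-small (exact rates differ by model and change over time, so check current pricing). Indexing the entire FH blog (60+ posts, ~150 chunks at 500 tokens each, so ~75,000 tokens) costs ~$0.01. Per-query cost: a query is only a few dozen tokens, so well under $0.0001. For SMB volumes (thousands of searches per month), monthly cost is negligible.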
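
The arithmetic behind those figures, as a small sketch. The $0.10 rate and the 150 × 500 token count come from the paragraph above; the 20 tokens per query and 5,000 searches per month are assumptions chosen only to illustrate the scale.

```typescript
// Cost in dollars for embedding `tokens` tokens at a given price per million tokens.
function embeddingCost(tokens: number, pricePerMillionTokens: number): number {
  return (tokens / 1e6) * pricePerMillionTokens;
}

const PRICE_PER_MILLION = 0.1; // approximate rate quoted above

// Indexing: ~150 chunks at ~500 tokens each = 75,000 tokens.
const indexTokens = 150 * 500;
const indexingCost = embeddingCost(indexTokens, PRICE_PER_MILLION);

// Querying: both numbers below are assumptions for illustration.
const TOKENS_PER_QUERY = 20;
const SEARCHES_PER_MONTH = 5000;
const monthlyQueryCost = embeddingCost(
  TOKENS_PER_QUERY * SEARCHES_PER_MONTH,
  PRICE_PER_MILLION,
);

console.log(indexingCost.toFixed(4), monthlyQueryCost.toFixed(4)); // "0.0075" "0.0100"
```

Under these assumptions a full re-index and a month of queries each cost about a cent, so the embedding bill stays in rounding-error territory even if the real rate is several times higher.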

#Chunking strategy for blog content

Use the post’s H2 structure. Each H2 section becomes a chunk (split further if a section runs well past the 300-600 word target). Prepend the post title to each chunk so the embedding has document-level context. Result: chunks are semantically coherent (single topic per chunk) and search results land users on the relevant section of the post.
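
A minimal sketch of this strategy, assuming the post body is available as Markdown with H2s written as lines starting with `## `. The function name `chunkByH2` and the sample content are hypothetical, and this is not the `chunkByHeadings` helper used in the indexing script; it also leaves out the size cap for long sections.

```typescript
// Split a Markdown body into one chunk per H2 section and prefix each chunk with
// the post title. Text before the first H2 becomes its own chunk.
function chunkByH2(title: string, markdown: string): string[] {
  const sections: string[][] = [[]];
  for (const line of markdown.split("\n")) {
    if (line.startsWith("## ")) sections.push([]); // a new H2 starts a new section
    sections[sections.length - 1].push(line);
  }
  return sections
    .map((lines) => lines.join("\n").trim())
    .filter((body) => body.length > 0) // drop empty sections
    .map((body) => `${title}\n\n${body}`);
}

// Hypothetical post body for illustration.
const samplePost = [
  "Intro paragraph.",
  "## Cabinets",
  "Cabinet costs vary.",
  "## Countertops",
  "Stone costs more.",
].join("\n");

const h2Chunks = chunkByH2("What Custom Kitchens Run in DFW", samplePost);
console.log(h2Chunks.length); // 3
```

Keeping the heading line inside each chunk gives the embedding a second topical cue alongside the title.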

#Search UI

Two patterns. (1) Modal with search field + result list (Cmd+K style). (2) Dedicated /search page with persistent URL. We default to modal for FH client blogs because it doesn’t require navigation and feels fast.

#Common mistakes

Embedding documents and queries with different models. Use the same model for both, or use an asymmetric model with the right inputType.
Storing only one embedding per document. Long documents have multiple topics; chunking captures all of them.
Forgetting to re-index after content changes. Build a CI step that re-indexes on every deploy.
Skipping the title in the chunk. Without it, sections lose their document context and rank worse.

#How this lands across FH client work

Two FH client sites have semantic search live. Both are content-heavy (a 200-post blog and a 600-page documentation site). The click-through rate from search is 32% higher than from navigation alone. Total infrastructure cost: zero, because it runs on the existing Supabase instance. If your site has 50+ content pages and either no search or bad search, book a consultation: the implementation is a 3-day engagement that ships a real search experience.