You are past the micro stage. You have a catalog, a knowledge base, a few hundred pages, or all three. Your site search returns nothing useful, your support inbox answers the same questions your docs already cover, and someone on the team keeps saying "we should add AI search." You do not need a data team to do this. You need hybrid search on the Postgres database you probably already run, and one developer who can own it. Here is the whole build at your altitude.
The reason this is now a one-developer job and not a machine-learning project: Supabase ships pgvector inside Postgres, so your embeddings live in the same database as your content. No separate vector store to sync, secure, and bill. The internal-search post makes the full case that this replaces Elasticsearch for most companies your size, and it does.
Why keyword search alone stopped being enough
Classic search matches words. A customer searches "can I get my money back" and your page says "refund policy," so the search returns nothing, because none of the words match. Semantic search matches meaning: it understands that "get my money back" and "refund" are the same intent. That is the upgrade everyone wants.
But semantic search has its own blind spot. It is great at meaning and bad at exact strings. Search a product code, a person's name, or a precise SKU, and a pure-vector search will happily return things that are semantically nearby and factually wrong. This is why the answer is not "replace keyword search with semantic search." It is hybrid: run both and combine them.
The three-layer picture
- Keyword search: Postgres full-text search using `tsvector` and `websearch_to_tsquery`. Exact, fast, and already in your database. This catches names, codes, and precise phrases.
- Semantic search: embeddings in a `vector` column, ranked by cosine distance. This catches meaning, synonyms, and questions phrased in ways you never wrote down.
- Hybrid search: fuse the two rankings so a result that both rankings like rises to the top. This is the version your users will actually feel is good.
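To make "ranked by cosine distance" concrete, here is a minimal sketch of the quantity pgvector's `<=>` operator computes: one minus the cosine similarity of two vectors. The three-dimensional vectors and their names are invented for illustration; real gte-small vectors have 384 dimensions.

```typescript
// Cosine distance: 1 - cosine similarity. Smaller means closer in meaning.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors, made up for this example.
const refundQuery = [0.9, 0.1, 0.0];
const refundPolicyPage = [0.8, 0.2, 0.1];
const shippingPage = [0.1, 0.2, 0.9];

const dRefund = cosineDistance(refundQuery, refundPolicyPage);
const dShipping = cosineDistance(refundQuery, shippingPage);
console.log(dRefund < dShipping); // true: the refund page is the nearer neighbor
```

In the real build you never compute this in application code. Postgres does it inside the query, and the HNSW index keeps it fast.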
Step one: store content with both indexes
One table holds your content, a full-text index for keyword search, and a vector column for semantic search. Supabase documents the vector column setup in detail; here is the shape at your scale.
-- Enable pgvector (a no-op if it is already installed).
create extension if not exists vector;

create table documents (
  id bigint generated always as identity primary key,
  title text,
  url text,
  content text,
  -- Keyword side: regenerated automatically whenever content changes.
  fts tsvector generated always as (to_tsvector('english', content)) stored,
  -- Semantic side: 384 dimensions to match gte-small.
  embedding vector(384)
);

-- GIN index for full-text search, HNSW index for cosine-distance search.
create index on documents using gin (fts);
create index on documents using hnsw (embedding vector_cosine_ops);

The `fts` column is generated automatically from your content, so keyword search stays current with zero extra work. The `embedding` column you populate with an embedding job, which is the only moving part you own.
Step two: the embeddings pipeline one dev can run
Document embeddings are generated when content changes, not per search; the only thing embedded at query time is the short search query itself. Run the job in a Supabase Edge Function on a schedule or a database trigger. Use the open-source gte-small model (384 dimensions) to start: it is free to run and good enough for most internal and site search. Move to a hosted model like OpenAI's text-embedding-3-small only when you have measured a quality gap that matters. The Edge Functions post covers where this job belongs.
// supabase/functions/embed-documents/index.ts
// Runs on a schedule. Embeds anything missing an embedding.
import { createClient } from "jsr:@supabase/supabase-js@2";

// SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY are available to Edge Functions by default.
const db = createClient(
  Deno.env.get("SUPABASE_URL")!,
  Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!,
);
const session = new Supabase.ai.Session("gte-small");

Deno.serve(async () => {
  // Work in small batches so each scheduled run stays short.
  const { data: rows, error } = await db.from("documents")
    .select("id, content").is("embedding", null).limit(50);
  if (error) return new Response(error.message, { status: 500 });

  for (const row of rows ?? []) {
    // mean_pool + normalize produce one unit-length 384-dimension vector per row.
    const embedding = await session.run(row.content, { mean_pool: true, normalize: true });
    await db.from("documents").update({ embedding }).eq("id", row.id);
  }
  return new Response("ok");
});

Step three: the hybrid ranking function
Hybrid search runs the keyword ranking and the semantic ranking, then fuses them. The standard fusion is Reciprocal Rank Fusion: each result gets a score based on its position in each ranking, and the scores add up, so a result that both rankings rank highly wins. Supabase publishes a ready hybrid function; this is the shape of it.
create or replace function hybrid_search(
  query_text text,
  query_embedding vector(384),
  match_count int default 10,
  rrf_k int default 50
)
returns setof documents
language sql
as $$
-- Keyword ranking: rank 1 is the best full-text match.
with kw as (
  select id, row_number() over (
    order by ts_rank_cd(fts, websearch_to_tsquery('english', query_text)) desc) as rank
  from documents
  where fts @@ websearch_to_tsquery('english', query_text)
  order by rank
  limit match_count * 2
),
-- Semantic ranking: rank 1 is the smallest cosine distance to the query.
vec as (
  select id, row_number() over (
    order by embedding <=> query_embedding) as rank
  from documents
  where embedding is not null
  order by embedding <=> query_embedding
  limit match_count * 2
)
-- Reciprocal Rank Fusion: sum 1 / (rrf_k + rank) across both rankings.
select d.*
from kw full outer join vec on kw.id = vec.id
join documents d on d.id = coalesce(kw.id, vec.id)
order by
  coalesce(1.0 / (rrf_k + kw.rank), 0) +
  coalesce(1.0 / (rrf_k + vec.rank), 0) desc
limit match_count;
$$;

You call this from your app with the user's raw query text and its embedding, generated with the same model that embedded your documents. One round trip to the database, both rankings, fused. No separate search cluster, no sync job between your database and a search index, no second system to secure.
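If the fusion arithmetic is easier to follow outside SQL, the sketch below runs the same Reciprocal Rank Fusion in memory on two small ranked lists. The function name and the document ids are made up for illustration, and ranks are 1-based to match `row_number()` in `hybrid_search`.

```typescript
interface FusedResult {
  id: number;
  score: number;
}

// Each list is ordered best-first. A document's score is the sum of
// 1 / (rrfK + rank) over every list it appears in.
function reciprocalRankFusion(
  keywordIds: number[],
  semanticIds: number[],
  rrfK: number = 50,
): FusedResult[] {
  const scores: Record<number, number> = {};
  for (const ranking of [keywordIds, semanticIds]) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // 1-based, like row_number()
      scores[id] = (scores[id] || 0) + 1 / (rrfK + rank);
    });
  }
  return Object.keys(scores)
    .map((key) => ({ id: Number(key), score: scores[Number(key)] }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical ids: 7 is only a keyword hit, 9 is only a semantic hit,
// 3 and 5 appear in both lists.
const fused = reciprocalRankFusion([7, 3, 5], [3, 9, 5]);
console.log(fused.map((r) => r.id)); // [3, 5, 7, 9]
```

Document 7 is the top keyword hit, yet 3 and 5 outrank it because both rankings agree on them. That agreement bonus is the whole point of fusing.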
Measuring whether your search is any good
You cannot improve what you do not measure, and search quality is easy to eyeball wrong. A demo query that works proves nothing. Build a small evaluation set instead: real questions paired with the passage that should win, run on every change, scored as one number you can watch move.
- Collect 30 to 50 real questions from support tickets and your search logs. Use the phrasing people actually type, not the questions you wish they asked.
- For each question, note the passage on your site that correctly answers it. That is your answer key.
- Run the set through `hybrid_search` and record whether the right passage lands in the top three results.
- Track the hit rate as a single number. When you change chunking, the model, or the fusion, rerun the set and compare. A change that drops the number is a regression no matter how good the demo looked.
Thirty questions is enough to catch the regressions that matter and cheap enough that one developer maintains it in an afternoon. It is the single highest-leverage thing you can build after the search itself, because it turns tuning from guesswork into a number.
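The scoring side of that evaluation set can be very small. The sketch below assumes you have already run each question through `hybrid_search` and collected the ranked document ids; the `EvalResult` shape, the function name, and the sample data are hypothetical.

```typescript
interface EvalResult {
  question: string;
  expectedId: number;   // the passage your answer key says should win
  rankedIds: number[];  // ids returned by search, best first
}

// Fraction of questions whose expected passage appears in the top k results.
function hitRateAtK(results: EvalResult[], k: number = 3): number {
  if (results.length === 0) return 0;
  let hits = 0;
  for (const r of results) {
    if (r.rankedIds.slice(0, k).indexOf(r.expectedId) !== -1) hits++;
  }
  return hits / results.length;
}

// Invented sample run: two hits in the top three, one miss.
const sampleRun: EvalResult[] = [
  { question: "can I get my money back", expectedId: 12, rankedIds: [12, 4, 9] },
  { question: "how long does shipping take", expectedId: 30, rankedIds: [8, 2, 30, 5] },
  { question: "change my billing email", expectedId: 41, rankedIds: [7, 3, 19, 41] },
];

const hitRate = hitRateAtK(sampleRun, 3);
console.log(hitRate.toFixed(2)); // "0.67"
```

Store the number alongside the commit or config that produced it, so a drop can be traced to the change that caused it.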
Keeping embeddings fresh as content changes
Content changes. A page gets edited, a product gets added, a policy gets rewritten. An embedding is a snapshot of the text at the moment you generated it, so stale embeddings quietly degrade search until the day someone notices the results are wrong. The fix is to re-embed on change, not on a heroic quarterly rebuild.
Track a content hash or an `updated_at` per row. Your embedding job embeds anything whose content changed since its embedding was written, and nothing else. On a database trigger or a short schedule, that keeps the index current for pennies, because you only ever re-embed what actually moved. Never let the embedding and the content drift apart. A search that confidently returns last quarter's price is worse than a search that returns nothing.
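One way to express the "changed since its embedding was written" check is a timestamp comparison. The sketch below assumes a hypothetical `embedded_at` column next to `updated_at` (surfaced here as `embeddedAt` and `updatedAt`); the row data is invented.

```typescript
interface DocumentRow {
  id: number;
  updatedAt: Date;          // when the content last changed
  embeddedAt: Date | null;  // hypothetical column: when the embedding was written
}

// A row needs embedding if it has never been embedded,
// or if its content changed after the embedding was written.
function needsEmbedding(row: DocumentRow): boolean {
  return row.embeddedAt === null ||
    row.updatedAt.getTime() > row.embeddedAt.getTime();
}

const sampleRows: DocumentRow[] = [
  { id: 1, updatedAt: new Date("2024-01-10T00:00:00Z"), embeddedAt: new Date("2024-01-11T00:00:00Z") },
  { id: 2, updatedAt: new Date("2024-03-05T00:00:00Z"), embeddedAt: new Date("2024-01-11T00:00:00Z") },
  { id: 3, updatedAt: new Date("2024-03-06T00:00:00Z"), embeddedAt: null },
];

const staleIds = sampleRows.filter(needsEmbedding).map((r) => r.id);
console.log(staleIds); // [2, 3]
```

The embedding job shown earlier only picks up rows whose embedding is null, so on its own it would miss row 2. You would either add a comparison like this to the job's query or clear the embedding whenever content is edited.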
The support-deflection math
Here is how to justify the build to whoever signs off on it. Count the repetitive questions your team answers by hand that your content already covers. If support fields 400 of those a month and each takes eight minutes, that is over 50 hours a month spent retyping answers that already live on your own site.
Wire good retrieval into a help widget or an answer box and you deflect a meaningful share of them. Even a conservative 30 percent is 15-plus hours back every month, for a one-time build and a 25-dollar database. That is the number that gets the project approved: not "AI search is cool," but "this pays for itself in saved support hours inside the first quarter." Unlike hiring, the deflection scales with traffic without scaling cost. If a grounded assistant is the endpoint, size it against the honest tradeoffs in AI chat for customer service.
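If you want to rerun that arithmetic with your own numbers, it fits in one function. The function and parameter names are made up for this sketch; the inputs are the figures from the example above.

```typescript
// Hours of support time recovered per month for a given deflection rate.
function hoursSavedPerMonth(
  ticketsPerMonth: number,
  minutesPerTicket: number,
  deflectionRate: number, // 0.3 means 30 percent
): number {
  const handlingHours = (ticketsPerMonth * minutesPerTicket) / 60;
  return handlingHours * deflectionRate;
}

const totalHours = hoursSavedPerMonth(400, 8, 1);   // all repetitive tickets
const savedHours = hoursSavedPerMonth(400, 8, 0.3); // conservative 30 percent

console.log(totalHours.toFixed(1)); // "53.3"
console.log(savedHours.toFixed(1)); // "16.0"
```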
When to move off gte-small
Start with gte-small because it is free and good enough to prove the value. You move to a hosted model like OpenAI's text-embedding-3-small when your evaluation set shows a quality gap that matters, not before. The signal is concrete: questions where a human can see the right answer exists, but retrieval ranks it low, consistently, after you have already fixed the chunking. That is a model-quality ceiling, and a stronger embedding model raises it.
The move is not a rewrite. Swap the model in the embedding job, set the vector column dimensions to match the new model (which also means rebuilding the HNSW index), and re-embed your content once as a background backfill, not downtime. Query embeddings must come from the new model too, because vectors from two models are not comparable. Everything else downstream, the hybrid function and the app code around it, keeps its shape. Because document embeddings are generated on content change and the only per-search embedding is one short query, even a paid model costs little for a corpus that does not churn daily. Size the decision to your evaluation numbers, not to a vendor's pitch.
The failure modes to watch for
- Embedding a whole page as one vector. The match gets vague and the returned passage is too big to use. Chunk by section.
- Mixing embedding models. Vectors from two models are not comparable. If you change models, re-embed everything and never query across the two.
- Skipping keyword search. Pure semantic search fumbles exact strings, product codes, and names. Hybrid exists for exactly this reason.
- No evaluation set. Without it, every tuning change is a guess and regressions ship silently.
- Letting embeddings go stale. Re-embed on change, or your search slowly starts lying with confidence.
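"Chunk by section" can start as simply as splitting on headings. The sketch below assumes your content is stored as markdown-style text with `#` headings, which may not match your CMS; the function name and the sample page are invented for illustration.

```typescript
interface Chunk {
  heading: string;
  body: string;
}

// Splits text into one chunk per heading line ("# ...", "## ...", and so on).
// Text before the first heading is kept under an empty heading.
function chunkBySection(text: string): Chunk[] {
  const chunks: Chunk[] = [];
  let heading = "";
  let bodyLines: string[] = [];

  const flush = (): void => {
    const body = bodyLines.join("\n").trim();
    if (heading !== "" || body !== "") {
      chunks.push({ heading, body });
    }
    bodyLines = [];
  };

  for (const line of text.split("\n")) {
    const match = /^#{1,6}\s+(.*)$/.exec(line);
    if (match) {
      flush(); // close the previous section before starting a new one
      heading = match[1].trim();
    } else {
      bodyLines.push(line);
    }
  }
  flush();
  return chunks;
}

const samplePage = [
  "# Refund policy",
  "Example text about refunds.",
  "",
  "# Shipping",
  "Example text about shipping.",
].join("\n");

const sections = chunkBySection(samplePage);
console.log(sections.map((c) => c.heading)); // ["Refund policy", "Shipping"]
```

Each chunk would then become its own row in `documents`, so a search hit points at a section a visitor can actually read rather than a whole page.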
Wiring it into your product
Retrieval is a database call. The product is what you do with the results, and three placements cover most of the value. A site-search box that returns ranked passages with a link to the source. An answer box on high-intent pages that shows the single best passage inline, so the visitor gets the answer without a click. And a help widget that runs the same query behind the scenes and, at the top tier, composes a grounded reply with citations.
Keep retrieval and presentation separate. One function returns passages; the UI decides whether to show a list, a single answer, or a generated reply. That separation is what lets one developer own the whole thing: the correctness-critical part is one SQL function with an evaluation set behind it, and everything else is display code you can change without touching search quality.
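A rough sketch of that separation: retrieval hands over ranked passages, and a display function decides what to show without re-ranking anything. The `Passage` shape mirrors the `title`, `url`, and `content` columns of `documents`; the placement names, the function, and the sample data are hypothetical.

```typescript
interface Passage {
  title: string;
  url: string;
  content: string;
}

type Placement = "search-box" | "answer-box";

// Presentation only: passages arrive already ranked by retrieval.
function render(passages: Passage[], placement: Placement): string {
  if (passages.length === 0) return "No results.";
  if (placement === "answer-box") {
    // Show the single best passage inline, with its source.
    const best = passages[0];
    return `${best.content}\nSource: ${best.url}`;
  }
  // Search box: a ranked list of titles with links.
  return passages
    .map((p, i) => `${i + 1}. ${p.title} (${p.url})`)
    .join("\n");
}

// Invented passages standing in for rows returned by hybrid_search.
const ranked: Passage[] = [
  { title: "Refund policy", url: "https://example.com/refunds", content: "Example refund text." },
  { title: "Shipping", url: "https://example.com/shipping", content: "Example shipping text." },
];

console.log(render(ranked, "search-box"));
console.log(render(ranked, "answer-box"));
```

Changing the answer box layout touches only `render`; the SQL function and its evaluation set stay untouched.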
Start with the search box. It is the lowest-risk placement, it improves the moment retrieval is good, and it gives you real query logs that feed the evaluation set. Add the answer box on high-intent pages next, and reserve the generated assistant for last, once the box has proven the retrieval holds up on real traffic and your content is clean enough to ground an answer. Shipping in that order means every step is backed by evidence from the one before it.
The cost math at your scale
This is the part that surprises teams. A Supabase Pro project is 25 dollars a month and covers the storage and compute that a few hundred thousand chunks and normal query volume will use. gte-small is an open-source model that runs inside the Edge Function, so the embedding job has no per-token bill. Compare that to a managed search service at a few hundred dollars a month plus the engineering time to keep it in sync with your primary database.
If you move to hosted embeddings later, text-embedding-3-small is a fraction of a cent per thousand tokens. Almost all of that spend happens when content changes; the only per-search cost is embedding one short query. For a company your size, the all-in cost of production hybrid search is closer to a streaming subscription than an enterprise contract.
RAG for support, done carefully
Once retrieval is good, a grounded support assistant is a small step: retrieve the top passages with `hybrid_search`, hand them to an LLM, and ask it to answer using only those passages with links back to the source. The retrieval you already built is 90 percent of the work. But do not ship it until the retrieval quality is proven, because a confident wrong answer is worse than a search box. We wrote the honest version of when this pays off in the RAG-for-SMB post and the customer-service tradeoffs in AI chat for customer service.
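The "answer using only those passages" step is mostly prompt assembly. The sketch below builds such a prompt from retrieved passages; the wording of the instructions, the function name, and the sample passage are assumptions to adapt, and sending the prompt to an LLM is left out on purpose.

```typescript
interface Passage {
  title: string;
  url: string;
  content: string;
}

// Builds a prompt that restricts the model to the retrieved passages
// and asks for citations back to their sources.
function buildGroundedPrompt(question: string, passages: Passage[]): string {
  const sources = passages
    .map((p, i) => `[${i + 1}] ${p.title} (${p.url})\n${p.content}`)
    .join("\n\n");
  return [
    "Answer the question using only the numbered passages below.",
    "Cite the passages you used by number and include their links.",
    "If the passages do not contain the answer, say you do not know.",
    "",
    "Passages:",
    sources,
    "",
    `Question: ${question}`,
  ].join("\n");
}

// Invented passage standing in for a row returned by hybrid_search.
const prompt = buildGroundedPrompt("can I get my money back", [
  { title: "Refund policy", url: "https://example.com/refunds", content: "Example refund text." },
]);
console.log(prompt);
```

The explicit "say you do not know" line is one guard against the confident wrong answer; it does not replace proving retrieval quality first.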
The AEO dividend
The same content structure that makes your hybrid search sharp is the structure that makes answer engines cite you. Chunked passages under clear headings, each answering one question, are what your retrieval ranks highest and what ChatGPT and Google's AI Overviews reach for. So this build pays twice: better on-site search for your visitors, and better legibility to the engines that decide who gets cited. The strategy layer is in the AEO playbook.
If you want this built and handed off clean, or reviewed before you commit a sprint to it, book a consultation. The lighter version for a smaller shop is AI search for micro businesses; the version with governance, index tuning, and scale is AI search for mid-size companies.