Every client conversation now has a new question buried in it: "why aren't we showing up in AI answers?" ChatGPT, Perplexity, Google's AI Overviews, and Claude are answering the queries that used to send a click to your client's site. The retrieval that decides who gets cited runs on embeddings and semantic matching. That is the exact machinery Supabase's AI and Vectors stack gives you for building search. For an agency, that overlap is a service line. This is how we package it.
The pitch to a client is not "we'll add a chatbot." It is "we'll make your content the thing an answer engine reaches for, and we'll prove it with the same retrieval math the engines use." That framing sells because it maps to a business outcome the client already worries about: being cited, being found, and keeping the pipeline that AI search is quietly draining. If you want the strategy layer behind that framing, we wrote it up in the AEO playbook and the reality check in AI Overviews and zero-click search.
The five primitives, read as service components
Supabase documents five AI features. Stop reading them as a database menu. Read them as the parts of a service you deliver.
- Semantic search: match content by meaning, not exact words. This is what lets a client's page answer a question phrased in a way no one on their team ever wrote down.
- Keyword search: Postgres full-text search over the same content. Cheap, exact, and still the right tool for names, SKUs, and precise phrases.
- Hybrid search: fuse the two so you stop losing the queries that pure-semantic or pure-keyword each miss. This is the quality tier clients feel.
- Vector columns: store embeddings next to the content in Postgres. No separate vector database to run, bill, or explain.
- Embeddings and RAG: turn text into vectors and retrieve the right passages to feed an answer. This is the engine under a support assistant or an on-site answer box.
A client does not buy "pgvector." They buy "our knowledge base answers questions in our own words, on our own site, and the answer engines can read it." Your job is to assemble the primitives into that outcome and maintain it. The primitives are commoditized. The assembly and the judgment are what you charge for.
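Of the five, hybrid search is the one whose mechanics a one-line description hides. A minimal sketch of the fusion step, assuming reciprocal rank fusion (RRF) as the method; the primitives above do not commit to one, and `fuseRankings` and the constant `k = 60` are illustrative choices, not values from any client build:

```typescript
// Reciprocal rank fusion: merge a semantic ranking and a keyword ranking
// into one list. Assumption: RRF is one common fusion choice; k = 60 is a
// conventional default, not a tuned value.
type Ranked = string[]; // chunk ids, best match first

function fuseRankings(semantic: Ranked, keyword: Ranked, k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of [semantic, keyword]) {
    list.forEach((id, index) => {
      // Rank is 1-based; each list adds 1 / (k + rank) to the id's score.
      const contribution = 1 / (k + index + 1);
      scores.set(id, (scores.get(id) ?? 0) + contribution);
    });
  }
  // Highest fused score first.
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

// A chunk that appears in both lists outranks one that tops only a single list.
const fusedExample = fuseRankings(["a", "b", "c"], ["c", "a", "d"]);
```

Because RRF works on ranks rather than raw scores, you never have to reconcile a cosine similarity with a full-text rank value, which is why it is a low-maintenance default for a shared pipeline.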
The delivery model: one project, many tenants
The reason this works as a service line and not a series of expensive custom builds is multi-tenancy. You do not spin up a Supabase project per client. You run one project and isolate each client by a `site_id` scope enforced with Row Level Security. We run the full pattern across the Frontend Horizon client book, and it is written up in the RLS multi-tenant post. One project. One migration story. One place to ship an improvement to every client at once.
Store each client's content chunks in a shared table, scoped by tenant. The embedding column lives right beside the text.
```sql
create extension if not exists vector;

create table doc_chunks (
  id uuid primary key default gen_random_uuid(),
  site_id text references sites(id) not null,
  url text not null,
  heading text,
  content text not null,
  embedding vector(384), -- gte-small: 384 dims, open source, free
  created_at timestamptz default now()
);

alter table doc_chunks enable row level security;

-- A request sees only rows whose site_id matches the site_id claim in its JWT.
create policy "read own site" on doc_chunks
  for select using (site_id = (auth.jwt() ->> 'site_id'));

-- HNSW index for cosine-distance ordering (the <=> operator).
create index on doc_chunks
  using hnsw (embedding vector_cosine_ops);
```

The retrieval function ranks chunks by cosine distance and filters to the caller's tenant. RLS enforces the tenant filter on every query made with a client's JWT, so you can put every client in one table without one client ever seeing another's content. The service-role key bypasses RLS, so it stays on the server and never ships to a browser.
```sql
create or replace function match_chunks(
  query_embedding vector(384),
  site text,
  match_count int default 8
)
returns table (url text, heading text, content text, similarity float)
language sql stable
as $$
  select url, heading, content,
         1 - (embedding <=> query_embedding) as similarity
  from doc_chunks
  where site_id = site
  order by embedding <=> query_embedding
  limit match_count;
$$;
```

Package it as three tiers
Sell outcomes at three price points. The primitives are the same underneath. What changes is scope and the maintenance you commit to.
- Legibility (entry tier). No vector database at all. You structure the client's content the way a retrieval system wants to read it: clean headings, self-contained answers, schema markup, an FAQ block per key page. This is pure AEO groundwork and it is the tier most clients should start on. See schema markup for the structured-data half.
- On-site answers (middle tier). Semantic plus hybrid search over the client's own content, wired into their site search and an answer box. This is where embeddings and hybrid ranking earn their keep. The client's visitors get answers in the client's words instead of bouncing to an AI engine.
- Assisted support (top tier). RAG: retrieve the right passages and let an LLM compose a grounded answer with citations back to the client's pages. This is the highest-touch tier and the one with the most ways to go wrong, so price the maintenance honestly.
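For the top tier, most of what can go wrong sits in the step between retrieval and generation. A minimal sketch of that step, assuming the retrieved rows have the shape `match_chunks` returns; `buildGroundedPrompt` and the `minSimilarity` cutoff of 0.75 are hypothetical, and the right cutoff depends on the model and the client's content:

```typescript
// Row shape mirrors the match_chunks return columns.
type RetrievedChunk = {
  url: string;
  heading: string | null;
  content: string;
  similarity: number;
};

// Build a prompt that confines the LLM to numbered, citable sources.
// Returns null when nothing clears the cutoff, so the caller can decline
// to answer instead of letting the model improvise.
function buildGroundedPrompt(
  question: string,
  chunks: RetrievedChunk[],
  minSimilarity = 0.75, // hypothetical threshold, tune per client
): string | null {
  const usable = chunks.filter((c) => c.similarity >= minSimilarity);
  if (usable.length === 0) return null;

  const sources = usable
    .map((c, i) => `[${i + 1}] ${c.heading ?? "(untitled)"} (${c.url})\n${c.content}`)
    .join("\n\n");

  return [
    "Answer the question using only the numbered sources below.",
    "Cite sources as [n]. If the sources do not contain the answer, say so.",
    "",
    sources,
    "",
    `Question: ${question}`,
  ].join("\n");
}
```

Returning `null` on weak retrieval is the design choice that matters here: an assistant that says "I don't have that" costs less than one that answers confidently from a passage that barely matched.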
The AEO connection you get to charge for
Here is the part that turns a search build into a marketing service. An answer engine ranks your client's content by chunking it, embedding each chunk, and retrieving the chunks that best match a user's question. That is the same pipeline you just built. So when you make a client's content retrievable for their own on-site search, you are also making it retrievable for the engines. The work compounds.
Concretely: content that scores well against your own `match_chunks` function is content structured the way answer engines want. Short, self-contained passages under clear headings. One question answered per block. No answer buried three scrolls into a wall of text. You can run a client's page through your own embeddings pipeline and show them, with a similarity score, which sections an engine would surface and which are invisible. That report is a deliverable. Clients pay for evidence they can see.
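The similarity score in that report is plain cosine similarity, the same quantity `match_chunks` computes as `1 - (embedding <=> query_embedding)`. A small sketch with toy two-dimensional vectors standing in for real 384-dimension embeddings; `cosineSimilarity` and `scoreSections` are illustrative names:

```typescript
// Cosine similarity: 1 means same direction, 0 means unrelated.
// Note: a zero vector would divide by zero and yield NaN.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

type EmbeddedSection = { heading: string; embedding: number[] };
type ScoredSection = { heading: string; similarity: number };

// Rank a page's sections against one embedded question, best first.
function scoreSections(question: number[], sections: EmbeddedSection[]): ScoredSection[] {
  return sections
    .map((s) => ({ heading: s.heading, similarity: cosineSimilarity(question, s.embedding) }))
    .sort((x, y) => y.similarity - x.similarity);
}

// Toy data: the focused section points the same way as the question,
// the mixed section only partly, the off-topic section not at all.
const toyRanking = scoreSections([1, 0], [
  { heading: "Our Approach", embedding: [0, 1] },
  { heading: "How much does this cost?", embedding: [1, 0] },
  { heading: "Approach and pricing", embedding: [1, 1] },
]);
```

The mixed section lands in the middle, which is the pattern to look for in a client's real scores: a passage that covers the right answer plus other material scores below one that covers only the answer.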
The embeddings pipeline, run once per client
You generate embeddings when content changes, not on every request. Run the job in a Supabase Edge Function triggered on content publish, or on a schedule. Use an open-source model like gte-small (384 dimensions) to keep the free tier free, or move to a hosted model when a client's volume justifies it. The internal-search post covers the model tradeoffs in depth, and the Edge Functions post covers where this job should live.
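The indexing job needs each page already split into sections. A minimal sketch of heading-based chunking, assuming the content arrives as markdown-style text with `#` headings; the `Section` shape is inferred from the `s.heading` and `s.text` fields the indexing code reads, and `splitIntoSections` is a hypothetical helper:

```typescript
type Section = { heading: string; text: string };

// Split a markdown-style page into one section per heading.
// Text before the first heading becomes a section with an empty heading;
// headings with no body text are dropped.
function splitIntoSections(page: string): Section[] {
  const sections: Section[] = [];
  let heading = "";
  let body: string[] = [];

  const flush = () => {
    const text = body.join("\n").trim();
    if (text.length > 0) sections.push({ heading, text });
    body = [];
  };

  for (const line of page.split("\n")) {
    const match = /^#{1,6}\s+(.*)$/.exec(line);
    if (match) {
      flush(); // close the previous section before starting a new one
      heading = match[1].trim();
    } else {
      body.push(line);
    }
  }
  flush();
  return sections;
}
```

Chunking on headings rather than fixed character counts keeps each embedded passage aligned with one question and one answer, which is the same structure the legibility tier asks the client's writers to produce.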
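```ts
// Chunk on publish, embed, store. One function serves every tenant.
import { createClient } from "@supabase/supabase-js";
import { pipeline } from "@huggingface/transformers";

type Section = { heading: string; text: string };

// Service-role client: bypasses RLS, server-side only. Env var names assume the Edge Function runtime.
const admin = createClient(Deno.env.get("SUPABASE_URL")!, Deno.env.get("SUPABASE_SERVICE_ROLE_KEY")!);
const embed = await pipeline("feature-extraction", "Supabase/gte-small");

export async function indexPage(siteId: string, url: string, sections: Section[]) {
  // Clear the page's old chunks: with only a generated id as key, upsert would append duplicates.
  await admin.from("doc_chunks").delete().eq("site_id", siteId).eq("url", url);
  for (const s of sections) {
    const output = await embed(s.text, { pooling: "mean", normalize: true });
    const { error } = await admin.from("doc_chunks").insert({
      site_id: siteId,
      url,
      heading: s.heading,
      content: s.text,
      embedding: Array.from(output.data),
    });
    if (error) throw error;
  }
}
```

The margin math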
This is why it is a service line and not a loss leader. One Supabase Pro project at 25 dollars a month carries the whole client book for the storage and request volume a marketing-content index generates. gte-small is an open-source model, so generating embeddings carries no per-token API bill. Your cost to add a client is close to the cost of the indexing job and your time to structure their content.
So the recurring revenue is almost pure margin once the pipeline exists. You are not reselling someone else's API at a markup. You are charging for assembly, judgment, and the ongoing work of keeping each client's content clean and retrievable as it changes. Price the legibility tier as a fixed setup plus a monthly retainer. Price the RAG tier higher because the maintenance is real.
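To sanity-check that claim for your own book, the arithmetic is short. A sketch in which only the 25-dollar plan price comes from above; the client count, retainer, and labor figures in the example are hypothetical placeholders to replace with your own:

```typescript
type MarginInputs = {
  platformCostPerMonth: number; // shared project cost (25 for the Pro plan cited above)
  clientCount: number; // hypothetical
  retainerPerClient: number; // monthly retainer, hypothetical
  laborCostPerClient: number; // monthly strategist time, hypothetical
};

// Spread the shared platform cost across the client book and compute
// gross margin after per-client labor.
function monthlyMargin(i: MarginInputs): { platformCostPerClient: number; grossMargin: number } {
  const revenue = i.clientCount * i.retainerPerClient;
  const cost = i.platformCostPerMonth + i.clientCount * i.laborCostPerClient;
  return {
    platformCostPerClient: i.platformCostPerMonth / i.clientCount,
    grossMargin: (revenue - cost) / revenue,
  };
}

// Hypothetical book: 10 clients, 1000 per month each, 400 of labor each.
const exampleMargin = monthlyMargin({
  platformCostPerMonth: 25,
  clientCount: 10,
  retainerPerClient: 1000,
  laborCostPerClient: 400,
});
```

With inputs like these the platform share per client is a rounding error, so the margin is set almost entirely by how many strategist hours each retainer absorbs.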
What clients ask before they buy
Three questions come up in every sales conversation, and how you answer them decides whether you sound like a partner or a vendor selling magic beans. Have the honest answers ready.
"Will this get us into ChatGPT and AI Overviews?" The honest answer: you make their content the kind an engine reaches for, and you measure retrievability directly, but no one controls the engines' models. You are selling the input you own, proven with numbers, not a guaranteed placement. Clients respect that answer because everyone selling the guaranteed version is lying, and the client usually knows it.
"Do we need a chatbot?" Usually not first, and often not at all. A chatbot is a generation layer on top of retrieval. If the retrieval is not good and the content is not clean, the chatbot amplifies the problem in front of customers. Sell the retrieval and the legibility first. The chatbot is an upsell you earn after the foundation is proven, not the opening move.
"Is our content even good enough?" This is the question you can answer with evidence instead of opinion. Run their pages through your embeddings pipeline and show them, section by section, what an engine would surface and what is invisible. Most clients have never seen their content scored this way. The report reframes the whole engagement from "trust us" to "here is the gap, here is the fix."
The retrievability report is your flagship deliverable
The single artifact that closes deals and renews retainers is a report that scores a client's existing content the way an answer engine would. It is cheap to produce once your pipeline exists, and no competitor selling generic SEO is handing the client anything like it. This is the thing you demo in the pitch.
Build it from the retrieval you already run. For a set of target questions the client cares about, embed each question, run it against the client's indexed content, and record which passage wins and how strongly. The output is a ranked, per-question view of what the client would get cited for and where they are silent.
- Per-question coverage: for each question a customer might ask, the best-matching passage on the client's site and its similarity score. Low scores are content gaps you get paid to fill.
- Orphaned answers: strong content that no clear question maps to, usually because the heading is vague or the answer is buried mid-paragraph. These are quick wins.
- Cannibalization: multiple weak passages competing for the same question, where one strong consolidated answer would win instead. This is a restructuring recommendation.
- A prioritized fix list: the ten content changes that move the most questions from invisible to retrievable, ordered by effort against impact.
That report is the difference between billing for "SEO work" and billing for a measured, defensible outcome. It gives the client something to approve, gives you a scope that is not open-ended, and gives the next quarter's retainer an obvious agenda. Rerun it monthly and the delta is your progress report.
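The per-question coverage and cannibalization checks can be sketched as one pass over embedded questions and chunks. This assumes unit-normalized embeddings (the indexing code passes `normalize: true`), so a dot product equals cosine similarity; `coverageReport` and the `strong` and `weak` thresholds of 0.8 and 0.6 are hypothetical and need calibrating against real client content. Orphaned answers and the prioritized fix list are left out of the sketch:

```typescript
type Chunk = { url: string; heading: string; embedding: number[] };
type Question = { text: string; embedding: number[] };
type CoverageRow = {
  question: string;
  bestUrl: string | null;
  bestHeading: string | null;
  similarity: number;
  status: "covered" | "gap" | "cannibalized";
};

// Dot product; equals cosine similarity for unit-length vectors.
function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

function coverageReport(
  questions: Question[],
  chunks: Chunk[],
  strong = 0.8, // hypothetical: at or above this, the question is covered
  weak = 0.6, // hypothetical: below this, a passage is not a contender
): CoverageRow[] {
  return questions.map((q) => {
    const scored = chunks
      .map((c) => ({ chunk: c, similarity: dot(q.embedding, c.embedding) }))
      .sort((x, y) => y.similarity - x.similarity);
    const best = scored.length > 0 ? scored[0] : null;
    const contenders = scored.filter((s) => s.similarity >= weak).length;

    // No strong winner but several contenders: consolidate them.
    // No strong winner and fewer than two contenders: a content gap.
    let status: CoverageRow["status"] = "gap";
    if (best !== null && best.similarity >= strong) status = "covered";
    else if (contenders >= 2) status = "cannibalized";

    return {
      question: q.text,
      bestUrl: best ? best.chunk.url : null,
      bestHeading: best ? best.chunk.heading : null,
      similarity: best ? best.similarity : 0,
      status,
    };
  });
}
```

Storing each month's rows and diffing the `status` column gives the delta described above without any extra tooling.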
Staffing the service line
The reason agencies fail to productize this is they treat every client as a from-scratch engineering project. Split the work by who is cheapest to do it well, and the per-client cost drops to something a retainer covers.
- One engineer owns the shared spine: the project, the tenant-scoped tables, RLS, the indexing function, and the hybrid search function. This is built once and touched rarely. It is not per-client work.
- A content strategist owns the per-client work: structuring pages, writing FAQ blocks, running the retrievability report, and turning its fix list into a monthly content plan. This is where the recurring hours actually go, and it does not require an engineer.
- An account lead owns the framing: translating retrievability scores into business language, setting expectations honestly, and tying the top tier to the client's content-freshness commitment.
The moment the content strategist can run the report and the fix cycle without pulling the engineer in, you have a service line. Until then you have a consulting practice that does not scale past the one person who understands the code.
A worked example: the buried answer
A professional-services client had a strong pricing explanation, but it was three paragraphs deep on a page titled "Our Approach," wrapped in context about their philosophy. Their own semantic search scored it poorly against the question "how much does this cost," because the passage that matched was diluted by everything around it. An answer engine would skip it for the same reason.
The fix was not new content. It was structure: a heading that was the actual question, a direct answer in the first sentence, and the philosophy moved below as supporting context. The similarity score against the pricing question went from the bottom of the results to the top, and within weeks the page started appearing in AI answers for cost questions in their category. Same information, retrievable shape. That is the entire job, repeated across a client's content, and it is what you are charging a retainer to keep doing as their content grows. It is also the exact discipline behind content that actually ranks for professional services firms.
Where to draw the line
- Do not promise rankings in AI answers. You control legibility and retrievability. You do not control the engines' models. Sell the input you own, measured, and be honest about the output you don't.
- Do not ship a RAG assistant on content the client won't maintain. A support bot grounded in a stale knowledge base is worse than no bot. Tie the top tier to a content-freshness commitment or don't sell it.
- Do not run a separate vector database per client to look sophisticated. Postgres with pgvector inside the Supabase project you already run is simpler to operate and cheaper to bill. Complexity you can't maintain is a liability with a monthly cost.
The 90-day rollout
Ship the shared spine in the first month: one project, the tenant-scoped chunk table, RLS, the indexing function, and the match function. Onboard two pilot clients on the legibility tier in month two, structure their content, and produce the retrievability report. Add on-site semantic search for the pilot that has the cleanest content in month three, measure the deflection and the on-site answer rate, and use that number to sell the next cohort. The same staged cadence we use for search engagements is in how we run a 90-day engagement.
This is the platform posture Frontend Horizon runs under the name Elevi: one system, many client surfaces, the assembly owned centrally. If you are trying to turn AI search into a service your team can actually deliver at scale instead of a one-off build that eats a quarter, book a consultation. And if you are sizing this for a specific client, the audience-matched versions are next: micro businesses, small and growing teams, and mid-size companies.