Tuning search relevance
Relevance is not a setting you flip — it’s a loop. This guide is the technique we use to tune search with samesake, written up from a real session where “black dress with white” kept returning red dresses. Every step here came from a concrete fix.
query → explain → diagnose → fix (data or config) → re-measure → repeat
1. Diagnose before you tune: searchExplain
Never guess why a result ranked where it did. matcher.searchExplain(...) returns, per hit, the
rank from each retrieval leg (FTS, cosine, spaces) and the merged RRF score, plus the parsed NLQ
and the compiled filter SQL.

    const r = await matcher.searchExplain(project, collection, { q: "black dress with white" });
    r.docs.forEach((d) => console.log(d.id, { fts: d.fts_rank, cos: d.cosine_rank, spc: d.spaces_rank }));
    console.log(r.parsed, r.constraintTrace.appliedFilters, r.filters.sql);

In our case the explain showed fts = null for every hit (no title contained “black”/“white”)
and the order was driven entirely by the embedding — which, to the model, was mostly “dress.”
That single output told us colour wasn’t a signal at all. Measure the cause, then fix it.
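To make that diagnosis repeatable, it can help to count how many hits each leg actually ranked. The snippet below is a minimal sketch, not part of samesake: the ExplainDoc type is a hypothetical shape that only borrows the fts_rank, cosine_rank, and spaces_rank field names from the explain output above, and it assumes an unranked leg shows up as null.

```typescript
// Hypothetical shape: only the three rank field names come from the explain output.
// Assumption: a leg that did not rank a hit reports null.
type ExplainDoc = {
  id: string;
  fts_rank: number | null;
  cosine_rank: number | null;
  spaces_rank: number | null;
};

type LegCoverage = { fts: number; cosine: number; spaces: number; total: number };

// Count how many hits each retrieval leg ranked (non-null rank).
function legCoverage(docs: ExplainDoc[]): LegCoverage {
  const ranked = (pick: (d: ExplainDoc) => number | null): number =>
    docs.filter((d) => pick(d) !== null).length;
  return {
    fts: ranked((d) => d.fts_rank),
    cosine: ranked((d) => d.cosine_rank),
    spaces: ranked((d) => d.spaces_rank),
    total: docs.length,
  };
}

// Made-up sample that mirrors the session: FTS matched nothing.
const sampleDocs: ExplainDoc[] = [
  { id: "p1", fts_rank: null, cosine_rank: 1, spaces_rank: 2 },
  { id: "p2", fts_rank: null, cosine_rank: 2, spaces_rank: 1 },
  { id: "p3", fts_rank: null, cosine_rank: 3, spaces_rank: null },
];

const coverage = legCoverage(sampleDocs);
console.log(coverage);
```

A leg with zero coverage is not contributing to the ranking at all, which is the situation we found for FTS.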
2. Data quality dominates everything
We audited the colour field: 17 of 27 products had no colour at all. The values had been guessed from the title with a heuristic, and most titles (“Princess Line Dress with Belt”) name no colour. No amount of weight-tuning fixes absent data. Garbage in, “red dress for a black query” out.
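An audit like this is a few lines of code. Here is a rough sketch of a field-coverage check; the Product type and the sample rows are invented for illustration and are not the real catalog.

```typescript
// Hypothetical product shape, for illustration only.
type Product = { id: string; title: string; color?: string | null };

type Coverage = { filled: number; missing: number; ratio: number };

// Count rows where the picked field holds a usable value.
// Empty and whitespace-only strings count as missing.
function fieldCoverage<T>(rows: T[], pick: (row: T) => unknown): Coverage {
  const filled = rows.filter((row) => {
    const value = pick(row);
    if (typeof value === "string") return value.trim().length > 0;
    return value !== null && value !== undefined;
  }).length;
  return {
    filled,
    missing: rows.length - filled,
    ratio: rows.length === 0 ? 0 : filled / rows.length,
  };
}

// Made-up sample rows.
const products: Product[] = [
  { id: "p1", title: "Princess Line Dress with Belt" },
  { id: "p2", title: "RED PUFF SLEEVE MAXI DRESS", color: "red" },
  { id: "p3", title: "Wrap Dress", color: "" },
  { id: "p4", title: "Shift Dress", color: null },
];

const colorCoverage = fieldCoverage(products, (p) => p.color);
console.log(colorCoverage);
```

Running a check like this per field before touching any weights tells you which attributes can carry a filter or a signal at all.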
3. Enrich attributes at the source (don’t guess from text)
The fix was to read the attributes off the product images with samesake’s enrich pipeline,
a multimodal stage that calls your generate function with the image and a schema and writes
structured fields into enriched:

    import { z } from "zod";

    enrich: pipeline(
      stage("vision", {
        model: "your-vision-model",
        // Pass the product image if there is one; otherwise send no images.
        images: (ctx) => (ctx.data.image_url ? [String(ctx.data.image_url)] : []),
        prompt: () => "Describe this product's colours and pattern as JSON.",
        // Structured fields the stage should produce.
        schema: () => z.object({ color_text: z.string(), pattern: z.string().optional() }),
      })
    )

The schema callback can return a zod schema or a plain JSON Schema object; samesake
converts zod to JSON Schema and hands it to your generate. (The same goes for a
constrained NLQ schema.) Provider-dialect mapping — e.g. Gemini’s responseSchema
vs responseJsonSchema — stays in your generate function.
Run with matcher.enrich(project, collection). Colours went from mostly-empty to accurate
(“RED PUFF SLEEVE MAXI DRESS” → solid red). See the full pipeline.
4. Compose what you embed
An embedding only knows what’s in the text it was built from. Pull the attributes that matter
into the doc embedding’s source:

    embeddings: { doc: { source: "$title $brand $category $enriched.color_text $enriched.pattern", model: "...", dim: 1536 } }

Now “black dress” cosine-matches products whose embedded text actually says black, and the black dress jumped from buried to #1.
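To see what such a source template might expand to for a given product, here is an illustrative resolver. It is not samesake’s implementation: resolvePath and composeSource are hypothetical helpers, and the handling of missing fields (dropped, with whitespace collapsed) is an assumption.

```typescript
type Doc = Record<string, unknown>;

// Walk a dotted path such as "enriched.color_text"; missing values become "".
function resolvePath(doc: Doc, path: string): string {
  let current: unknown = doc;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return "";
    current = (current as Record<string, unknown>)[key];
  }
  return current === null || current === undefined ? "" : String(current);
}

// Replace each $field or $nested.field placeholder, then collapse leftover whitespace.
// Assumption: absent fields are simply dropped from the embedded text.
function composeSource(template: string, doc: Doc): string {
  return template
    .replace(/\$([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)/g, (_match: string, path: string) =>
      resolvePath(doc, path),
    )
    .replace(/\s+/g, " ")
    .trim();
}

// Made-up product with an enriched colour but no pattern.
const product: Doc = {
  title: "Puff Sleeve Maxi Dress",
  brand: "Acme",
  category: "dresses",
  enriched: { color_text: "solid red" },
};

const embeddedText = composeSource(
  "$title $brand $category $enriched.color_text $enriched.pattern",
  product,
);
console.log(embeddedText);
```

Printing the composed text for a handful of products is a quick way to confirm the colour words really reach the embedding.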
5. Hard filters vs soft signals — and keep NLQ in its lane
This is the subtlest lever:
- Hard filters for strict, well-populated constraints: price ≤ 5000, available = true. These should gate the result set; that’s the “hard filters stay hard” promise.
- Soft fields (f.text({ soft: true })) for sparse or fuzzy attributes: a missing colour tag shouldn’t empty your results. samesake relaxes soft filters when too few rows match.
- Constrain NLQ so it can’t turn a fuzzy word into a hard filter on a sparse field. We gave NLQ a schema of just { semantic_query, max_price }, so “black dress” never compiles to color = 'black' (which had dead-ended at the 2 literally-tagged rows). Colour is left to the embedding + visual signals instead.
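The difference between the two kinds of filter can be sketched in a few lines. This is a toy model, not samesake’s relaxation logic: the minRows threshold and the all-or-nothing relaxation are assumptions made for illustration.

```typescript
type Row = { id: string; price: number; available: boolean; color?: string };
type Predicate = (row: Row) => boolean;

// Hard predicates always gate the result set.
// Soft predicates are dropped (all at once, in this sketch) when they leave
// fewer than minRows rows.
function applyFilters(
  rows: Row[],
  hard: Predicate[],
  soft: Predicate[],
  minRows: number,
): { rows: Row[]; relaxed: boolean } {
  const gated = rows.filter((row) => hard.every((p) => p(row)));
  const strict = gated.filter((row) => soft.every((p) => p(row)));
  if (strict.length >= minRows) return { rows: strict, relaxed: false };
  return { rows: gated, relaxed: true };
}

// Made-up rows: colour is sparsely tagged.
const rows: Row[] = [
  { id: "a", price: 3000, available: true, color: "black" },
  { id: "b", price: 4500, available: true },
  { id: "c", price: 6000, available: true, color: "black" },
  { id: "d", price: 2000, available: false, color: "red" },
  { id: "e", price: 1500, available: true, color: "red" },
];

const hard: Predicate[] = [(r) => r.price <= 5000, (r) => r.available];
const soft: Predicate[] = [(r) => r.color === "black"];

const relaxedResult = applyFilters(rows, hard, soft, 2);
const strictResult = applyFilters(rows, hard, soft, 1);
console.log(relaxedResult.rows.map((r) => r.id), relaxedResult.relaxed);
console.log(strictResult.rows.map((r) => r.id), strictResult.relaxed);
```

In both calls the price and availability gates hold; only the colour constraint gives way when it would starve the result set.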
6. Multimodal & cross-modal signals
With a multimodal embedding model, an image space gives you three retrieval modes over one index: text→text (doc cosine), text→image (a text query embedded into the image space), and image→image (find-similar / search-by-image). Colour and pattern intent that text barely encodes comes through far more strongly in image→image, so “find similar” is where the visual space earns its keep, while text→image mainly adds category/shape sense. Know which mode a query needs.
7. Tune query-time weights last
Channel weights (Channels.fts({ weight }), Channels.cosine, Channels.spaces) and
defaultSpaceWeights rescale the RRF mix without reindexing. Reach for these only after the
data and signals are right — re-weighting noise just reshuffles noise. Per-query weights let you
push visual vs intent vs price differently for, say, a “similar look” surface vs a keyword box.
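To build intuition for what a channel weight does to the merged score, here is a sketch of weighted reciprocal rank fusion. It assumes the common formulation, score = Σ weight / (k + rank) with k = 60; whether samesake uses exactly this formula or constant is an assumption, and the ranks are made up.

```typescript
type Ranks = { fts: number | null; cosine: number | null; spaces: number | null };
type Weights = { fts: number; cosine: number; spaces: number };

// Weighted RRF: each channel that ranked the doc adds weight / (k + rank).
// Assumption: k = 60 and null means the channel did not rank the doc.
function weightedRrf(ranks: Ranks, weights: Weights, k = 60): number {
  const channels: (keyof Ranks)[] = ["fts", "cosine", "spaces"];
  let score = 0;
  for (const channel of channels) {
    const rank = ranks[channel];
    if (rank !== null) score += weights[channel] / (k + rank);
  }
  return score;
}

// Doc A wins on cosine, doc B wins on spaces; neither has an FTS hit.
const docA: Ranks = { fts: null, cosine: 1, spaces: 5 };
const docB: Ranks = { fts: null, cosine: 4, spaces: 1 };

const equal: Weights = { fts: 1, cosine: 1, spaces: 1 };
const cosineBoost: Weights = { fts: 1, cosine: 2, spaces: 1 };

const equalA = weightedRrf(docA, equal);
const equalB = weightedRrf(docB, equal);
const boostA = weightedRrf(docA, cosineBoost);
const boostB = weightedRrf(docB, cosineBoost);

console.log({ equalA, equalB, boostA, boostB });
```

With equal weights doc B edges ahead; doubling the cosine weight puts doc A first. Neither ordering is better unless the underlying ranks are trustworthy, which is why this step comes last.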
8. Measure, and respect corpus size
Tune against a fixed query set, not vibes; see Eval from search snapshots for relevance@k + constraint compliance. And be honest about scale: at ~30 products colour is a weak discriminator no matter what, because the embedding is dominated by category. Real relevance wins need both clean attributes and a catalog big enough to disambiguate.
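As a rough idea of what measuring against a fixed query set looks like, the sketch below computes mean precision@k. It is a stand-in, not the Eval feature itself: treating relevance@k as precision@k is an assumption, and the queries, ids, and stubbed search results are invented.

```typescript
type EvalCase = { query: string; relevantIds: string[] };

// Fraction of the top-k results that are labelled relevant.
function precisionAtK(resultIds: string[], relevantIds: string[], k: number): number {
  if (k <= 0) return 0;
  const relevant = new Set(relevantIds);
  const hits = resultIds.slice(0, k).filter((id) => relevant.has(id)).length;
  return hits / k;
}

// Average precision@k over a fixed query set; run is whatever executes a search.
function meanPrecisionAtK(
  cases: EvalCase[],
  run: (query: string) => string[],
  k: number,
): number {
  if (cases.length === 0) return 0;
  const total = cases.reduce(
    (sum, c) => sum + precisionAtK(run(c.query), c.relevantIds, k),
    0,
  );
  return total / cases.length;
}

// Made-up labelled queries and a stubbed search standing in for a real call.
const evalCases: EvalCase[] = [
  { query: "black dress", relevantIds: ["p1", "p4"] },
  { query: "red maxi dress", relevantIds: ["p2"] },
];

const stubResults = new Map<string, string[]>([
  ["black dress", ["p1", "p3", "p4"]],
  ["red maxi dress", ["p3", "p2", "p5"]],
]);

const runStub = (query: string): string[] => stubResults.get(query) ?? [];

const meanP2 = meanPrecisionAtK(evalCases, runStub, 2);
console.log(meanP2);
```

Keep the query set and labels frozen between runs so a change in the number reflects a change in the system, not in the test.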
Running enrichment in the background
enrich, index, and ingest route their work through a JobRunner (ctx.jobs.run). The
default runs inline; pass jobs: createPgBossRunner({ connectionString }) from
@samesake/jobs-pgboss to back them with a
pg-boss queue for durability and concurrency control:

    import { createPgBossRunner } from "@samesake/jobs-pgboss";

    const matcher = createMatcher({
      /* ... */
      jobs: await createPgBossRunner({ connectionString: process.env.DATABASE_URL! }),
    });

The loop, in one line
Explain → fix the data → compose the embedding → set hard/soft correctly → constrain NLQ → tune weights → measure. In that order.