Tuning search relevance

Relevance is not a setting you flip — it’s a loop. This guide is the technique we use to tune search with samesake, written up from a real session where “black dress with white” kept returning red dresses. Every step here came from a concrete fix.

query → explain → diagnose → fix (data or config) → re-measure → repeat

1. Diagnose before you tune — `searchExplain`

Never guess why a result ranked where it did. matcher.searchExplain(...) returns, per hit, the rank from each retrieval leg (FTS, cosine, spaces) and the merged RRF score, plus the parsed NLQ and the compiled filter SQL.

const r = await matcher.searchExplain(project, collection, { q: "black dress with white" });
r.docs.forEach((d) => console.log(d.id, { fts: d.fts_rank, cos: d.cosine_rank, spc: d.spaces_rank }));
console.log(r.parsed, r.constraintTrace.appliedFilters, r.filters.sql);

In our case the explain showed fts = null for every hit (no title contained “black”/“white”) and the order was driven entirely by the embedding — which, to the model, was mostly “dress.” That single output told us colour wasn’t a signal at all. Measure the cause, then fix it.
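
To make that check repeatable, a small helper can report which retrieval legs contributed nothing across a set of explained hits. This is a minimal sketch: it assumes only the rank fields shown above (fts_rank, cosine_rank, spaces_rank) and that a leg which missed a document reports null. The sample ids are invented for illustration.

```typescript
// Shape of one explained hit, limited to the rank fields used in this guide.
// Assumption: a null rank means that leg did not retrieve the document.
interface ExplainedHit {
  id: string;
  fts_rank: number | null;
  cosine_rank: number | null;
  spaces_rank: number | null;
}

type Leg = "fts" | "cosine" | "spaces";

// Count how many hits each retrieval leg contributed to.
function legCoverage(docs: ExplainedHit[]): Record<Leg, number> {
  const coverage: Record<Leg, number> = { fts: 0, cosine: 0, spaces: 0 };
  for (const d of docs) {
    if (d.fts_rank != null) coverage.fts += 1;
    if (d.cosine_rank != null) coverage.cosine += 1;
    if (d.spaces_rank != null) coverage.spaces += 1;
  }
  return coverage;
}

// Legs that contributed to no hit at all: the signal is absent, not just weak.
function silentLegs(docs: ExplainedHit[]): Leg[] {
  const coverage = legCoverage(docs);
  return (Object.keys(coverage) as Leg[]).filter((leg) => coverage[leg] === 0);
}

// Illustrative data mirroring our session: no hit had an FTS rank.
const sampleHits: ExplainedHit[] = [
  { id: "red-maxi", fts_rank: null, cosine_rank: 1, spaces_rank: 2 },
  { id: "black-midi", fts_rank: null, cosine_rank: 2, spaces_rank: 1 },
];
console.log(silentLegs(sampleHits)); // ["fts"]
```

In practice you would pass r.docs from searchExplain instead of the sample array. A leg that shows up as silent for a query that clearly names an attribute is a data problem, not a weighting problem.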

2. Data quality dominates everything

We audited the colour field: 17 of 27 products had no colour at all. The values had been guessed from the title with a heuristic, and most titles (“Princess Line Dress with Belt”) don’t name a colour. No amount of weight-tuning fixes absent data. Garbage in, “red dress for a black query” out.
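
An audit like this takes only a few lines. The sketch below counts how many rows populate a given field. The field name colour and the sample rows are placeholders, so substitute whatever your collection actually stores.

```typescript
// A product row as stored; attribute values may be missing or blank.
type Product = Record<string, unknown>;

// True when the field holds a non-empty value after trimming.
function hasValue(doc: Product, field: string): boolean {
  const v = doc[field];
  if (v === null || v === undefined) return false;
  return String(v).trim().length > 0;
}

// Report how many documents populate a field.
function fieldCoverage(docs: Product[], field: string) {
  const populated = docs.filter((d) => hasValue(d, field)).length;
  return {
    field,
    populated,
    missing: docs.length - populated,
    ratio: docs.length === 0 ? 0 : populated / docs.length,
  };
}

// Placeholder rows: one blank colour, one tagged, one with no field at all.
const products: Product[] = [
  { title: "Princess Line Dress with Belt", colour: "" },
  { title: "Sample dress A", colour: "black" },
  { title: "Sample dress B" },
];
console.log(fieldCoverage(products, "colour"));
```

Run it over every attribute you expect queries to mention. A field with low coverage is a candidate for enrichment, not for a higher weight.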

3. Enrich attributes at the source (don’t guess from text)

The fix was to read the attributes off the product images with samesake’s enrich pipeline. Its multimodal stage calls your generate function with the image and a schema, then writes structured fields into enriched:

import { z } from "zod";

enrich: pipeline(
  stage("vision", {
    model: "your-vision-model",
    images: (ctx) => (ctx.data.image_url ? [String(ctx.data.image_url)] : []),
    prompt: () => "Describe this product's colours and pattern as JSON.",
    schema: () => z.object({ color_text: z.string(), pattern: z.string().optional() }),
  })
)

The schema callback takes a zod schema or a plain JSON Schema object — samesake converts zod to JSON Schema and hands it to your generate. (The same goes for a constrained NLQ schema.) Provider-dialect mapping — e.g. Gemini’s responseSchema vs responseJsonSchema — stays in your generate function.
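
For readers who prefer the plain-object form, here is a hand-written JSON Schema describing the same shape as the zod schema above. It is a sketch of what you could return from the schema callback, not the exact output of samesake’s zod conversion, which may include additional keys.

```typescript
// Hand-written JSON Schema for the same shape as
// z.object({ color_text: z.string(), pattern: z.string().optional() }).
// Assumption: basic JSON Schema keywords only; a converter's output may differ.
const visionSchema = {
  type: "object",
  properties: {
    color_text: { type: "string" },
    pattern: { type: "string" },
  },
  // Only color_text is required; pattern is optional in the zod version.
  required: ["color_text"],
};

console.log(Object.keys(visionSchema.properties)); // ["color_text", "pattern"]
```

Either form describes the same two fields, so the choice comes down to whether you already depend on zod elsewhere in your project.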

Run with matcher.enrich(project, collection). Colours went from mostly-empty to accurate (“RED PUFF SLEEVE MAXI DRESS” → solid red). See the full pipeline.

4. Compose what you embed

An embedding only knows what’s in the text it was built from. Pull the attributes that matter into the doc embedding’s source:

embeddings: { doc: { source: "$title $brand $category $enriched.color_text $enriched.pattern", model: "...", dim: 1536 } }

Now “black dress” cosine-matches products whose embedded text actually says black — the black dress jumped from buried to #1.
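
To see what text the model actually receives, it helps to expand the source template by hand. The helper below is a hypothetical illustration of that expansion, not samesake’s implementation. It assumes that $name placeholders may use dotted paths and that missing values are simply dropped. The colour value in the sample document is invented.

```typescript
type Doc = { [key: string]: unknown };

// Resolve a dotted path such as "enriched.color_text" against a document.
function resolvePath(doc: Doc, path: string): unknown {
  let current: unknown = doc;
  for (const key of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Doc)[key];
  }
  return current;
}

// Expand "$field" and "$nested.field" placeholders; missing values become "".
function renderSource(template: string, doc: Doc): string {
  const placeholder = /\$([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)/g;
  const filled = template.replace(placeholder, (_match, path: string) => {
    const value = resolvePath(doc, path);
    return value === null || value === undefined ? "" : String(value);
  });
  // Collapse the gaps left behind by missing attributes.
  return filled.replace(/\s+/g, " ").trim();
}

const template = "$title $brand $category $enriched.color_text $enriched.pattern";
const doc: Doc = {
  title: "Princess Line Dress with Belt",
  category: "dress",
  enriched: { color_text: "black with white trim" },
};
console.log(renderSource(template, doc));
// "Princess Line Dress with Belt dress black with white trim"
```

If the rendered string for a product doesn’t contain the word a shopper would type, the embedding can’t match on it.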

5. Hard filters vs soft signals — and keep NLQ in its lane

This is the subtlest lever:

Hard filters for strict, well-populated constraints: price ≤ 5000, available = true. These should gate the result set — that’s the “hard filters stay hard” promise.
Soft fields (f.text({ soft: true })) for sparse or fuzzy attributes: a missing colour tag shouldn’t empty your results. samesake relaxes soft filters when too few rows match.
Constrain NLQ so it can’t turn a fuzzy word into a hard filter on a sparse field. We gave NLQ a schema of just { semantic_query, max_price } — so “black dress” never compiles to color = 'black' (which had dead-ended at the 2 literally-tagged rows). Colour is left to the embedding + visual signals instead.
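
The hard/soft distinction can be sketched in plain TypeScript. This is an illustration of the idea only: the threshold, the FilterPlan shape, and the all-or-nothing relaxation are assumptions, and samesake’s actual relaxation rule is not described here.

```typescript
type Row = { [key: string]: unknown };
type Predicate = (row: Row) => boolean;

interface FilterPlan {
  hard: Predicate[]; // always enforced
  soft: Predicate[]; // enforced only while enough rows survive
}

// Hypothetical threshold: below this many matches, soft filters are dropped.
const MIN_RESULTS = 3;

function applyFilters(rows: Row[], plan: FilterPlan, minResults = MIN_RESULTS) {
  // Hard filters gate the result set unconditionally.
  const gated = rows.filter((r) => plan.hard.every((p) => p(r)));
  // Soft filters narrow the set further when they leave enough rows.
  const narrowed = gated.filter((r) => plan.soft.every((p) => p(r)));
  const relaxed = plan.soft.length > 0 && narrowed.length < minResults;
  return { rows: relaxed ? gated : narrowed, relaxed };
}

// Invented catalog rows: only some carry a colour tag.
const catalog: Row[] = [
  { id: 1, price: 4200, available: true, colour: "black" },
  { id: 2, price: 3900, available: true },
  { id: 3, price: 4800, available: true },
  { id: 4, price: 7500, available: true, colour: "black" },
  { id: 5, price: 2500, available: false, colour: "black" },
];

const result = applyFilters(catalog, {
  hard: [(r) => Number(r.price) <= 5000, (r) => r.available === true],
  soft: [(r) => r.colour === "black"],
});
console.log(result.relaxed, result.rows.map((r) => r.id)); // true [1, 2, 3]
```

The price and availability limits are never relaxed. The colour filter is dropped because only one gated row carries the tag, which is the behaviour you want for a sparsely populated field.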

6. Match the retrieval mode to the query

With a multimodal embedding model, an image space gives you three retrieval modes over one index: text→text (doc cosine), text→image (a text query embedded into the image space), and image→image (find-similar / search-by-image). Colour and pattern intent that text barely encodes comes through far more strongly in image→image, so “find similar” is where the visual space earns its keep, while text→image mainly adds category and shape sense. Know which mode a query needs.

7. Tune query-time weights last

Channel weights (Channels.fts({ weight }), Channels.cosine, Channels.spaces) and defaultSpaceWeights rescale the RRF mix without reindexing. Reach for these only after the data and signals are right — re-weighting noise just reshuffles noise. Per-query weights let you push visual vs intent vs price differently for, say, a “similar look” surface vs a keyword box.
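
To build intuition for what a weight does, here is a generic weighted reciprocal rank fusion in plain TypeScript. It is a sketch of the textbook formula, score = Σ weight / (k + rank), with the conventional k = 60. Whether samesake uses this exact constant or weighting scheme is an assumption. The hit data is invented.

```typescript
type Channel = "fts" | "cosine" | "spaces";
type Ranks = Partial<Record<Channel, number>>; // 1-based rank per channel
type Weights = Record<Channel, number>;

// Conventional RRF smoothing constant; assumed here, not taken from samesake.
const RRF_K = 60;

// Weighted RRF: sum of weight / (k + rank) over the channels that returned
// the document. Channels with no rank contribute nothing.
function rrfScore(ranks: Ranks, weights: Weights, k = RRF_K): number {
  let score = 0;
  for (const channel of Object.keys(weights) as Channel[]) {
    const rank = ranks[channel];
    if (rank !== undefined) score += weights[channel] / (k + rank);
  }
  return score;
}

// Order hits by descending fused score.
function rankByRrf(hits: { id: string; ranks: Ranks }[], weights: Weights): string[] {
  return hits
    .map((h) => ({ id: h.id, score: rrfScore(h.ranks, weights) }))
    .sort((a, b) => b.score - a.score)
    .map((h) => h.id);
}

const hits = [
  { id: "a", ranks: { cosine: 1, spaces: 6 } },
  { id: "b", ranks: { cosine: 5, spaces: 2 } },
];
console.log(rankByRrf(hits, { fts: 1, cosine: 1, spaces: 1 })); // ["a", "b"]
console.log(rankByRrf(hits, { fts: 1, cosine: 1, spaces: 3 })); // ["b", "a"]
```

Tripling the spaces weight flips the order without touching the index. It also shows the limit of the lever: weights can only reorder documents according to ranks the channels already produced.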

8. Measure, and respect corpus size

Tune against a fixed query set, not vibes — see Eval from search snapshots for relevance@k + constraint compliance. And be honest about scale: at ~30 products colour is a weak discriminator no matter what, because the embedding is dominated by category. Real relevance wins need both clean attributes and a catalog big enough to disambiguate.
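
Both metrics are simple enough to compute by hand while you set up a proper eval. The sketch below uses a precision-style definition of relevance@k and a simple pass rate for constraint compliance. These definitions and the ResultDoc shape are assumptions for illustration and may differ from what the snapshot-based eval reports.

```typescript
interface ResultDoc {
  id: string;
  price?: number;
}

// Fraction of the top-k slots filled by documents labelled relevant.
function relevanceAtK(results: ResultDoc[], relevantIds: string[], k: number): number {
  if (k <= 0) return 0;
  const relevant = new Set(relevantIds);
  const hits = results.slice(0, k).filter((d) => relevant.has(d.id)).length;
  return hits / k;
}

// Fraction of returned results that satisfy the query's hard constraint.
// With no constraint, or no results, there is nothing to violate.
function constraintCompliance(
  results: ResultDoc[],
  constraint?: (doc: ResultDoc) => boolean,
): number {
  if (!constraint || results.length === 0) return 1;
  return results.filter(constraint).length / results.length;
}

// Invented results for one query in a fixed set, with hand-labelled relevance.
const results: ResultDoc[] = [
  { id: "a", price: 4000 },
  { id: "b", price: 6000 },
  { id: "c", price: 3000 },
];
console.log(relevanceAtK(results, ["a", "c"], 2)); // 0.5
console.log(constraintCompliance(results, (d) => (d.price ?? Infinity) <= 5000));
```

Record both numbers per query before and after each change. A fix that raises relevance while compliance drops below 1 has broken a hard filter.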

Running enrichment in the background

enrich, index, and ingest route their work through a JobRunner (ctx.jobs.run). The default runs inline; pass jobs: createPgBossRunner({ connectionString }) from @samesake/jobs-pgboss to back them with a pg-boss queue for durability and concurrency control:

import { createPgBossRunner } from "@samesake/jobs-pgboss";
const matcher = createMatcher({ /* ... */ jobs: await createPgBossRunner({ connectionString: process.env.DATABASE_URL! }) });

The loop, in one line

Explain → fix the data → compose the embedding → set hard/soft correctly → constrain NLQ → tune weights → measure. In that order.