The two words that silently decide whether your embeddings work

14 June 2026By AntonioGitHub ↗LinkedIn ↗Part of NORDHEM

CodingAISearch

The local embedding model I use, e5, wants a query: or passage: prefix on every piece of text. Forget it and it fails silently, quietly degrading search with no error. Here is why the prefix exists and how I made it impossible to leave off.

I added semantic search to my furniture store last week, and it worked on the very first try. That made me suspicious, because nothing works on the first try, and sure enough I had nearly shipped a version that was quietly worse than it should have been. The thing I almost got wrong wasn’t a config flag or a tuning number. It was two words: query: and passage:.

Here’s the short version. The embedding model I’m using, a little open model called multilingual-e5-small, was trained to expect a small label in front of every piece of text telling it what that text is: a search, or a document. Leave the label off and the model still runs. It just gets worse at its job, with no error and no warning. The interesting part is why a model would care about a label, and how I made it impossible for me or anyone else to forget it.

What’s an embedding, again?

An embedding is a way of turning a piece of text into a list of numbers, called a vector, so that texts that mean similar things end up with similar numbers. That’s the whole idea. The word ‘sofa’ and the word ‘couch’ come out as two lists of numbers that sit close together; ‘sofa’ and ‘frying pan’ come out far apart.

That closeness is the entire trick behind semantic search. To find products like the thing someone typed, I turn their search into a vector, then look for the product vectors nearest to it. You can picture it as a giant map where every product is a pin, related things sit near each other, and a search just drops a new pin and grabs whatever’s closest. Keyword search can’t do that, because it matches letters: it never connects ‘sofa’ to a product that only ever says ‘couch’.

So what are these two words?

The model’s family is called e5, and it was trained on pairs of text: a short search on one side, a longer document on the other. During that training, every search was written with query: in front of it and every document with passage: in front of it. So the model didn’t just see the words, it learned what those two prefixes mean.

In practice that means to embed the search ‘task chair’ I feed the model query: task chair, and to embed a product I feed it passage: ergonomic mesh task chair with lumbar support. Same model, same call. The prefix is the only thing telling it which hat to wear.

Why would the model care which one it is?

Because a search and a product description aren’t the same kind of writing. A search is short and often a fragment, like ‘comfy seat for the living room’. A product description is longer and written in full sentences. The model’s job is to bridge that gap, to make a stubby query and a wordy passage that mean the same thing land near each other on the map. The prefixes are how it knows which side of the bridge a given piece of text is standing on.

The part that took me a second to accept is that this makes the model asymmetric. The exact same words, ‘office chair’, come out as slightly different vectors depending on whether I label them query: or passage:. They’re not meant to be identical, because in the model’s world a search for office chairs and a product that is an office chair play different roles.

What happens if you get it wrong?

Nothing visible, and that’s the scary part. If I feed the model raw text with no prefix, or the wrong prefix, it doesn’t complain. It returns a perfectly normal-looking vector. Searches still come back, results still fill the page. They’re just a little worse, in a way I’d never notice by clicking around. It’s the kind of bug where everything looks fine and the quality just leaks out slowly. The only reason I trust my version at all is that I built a benchmark first, so I can put a number on ‘a little worse’. Without that, I’d have shipped the broken version and never known.

How I made it impossible to forget

I knew that if getting the prefix right depended on me remembering to type query: every single time, I would eventually forget, and so would anyone else who touched the code. So I didn’t leave it up to memory. The one module that talks to the model has no function that takes raw text. It has embedQuery, which adds the query prefix for you, and embedPassage, which adds the passage prefix for you. That’s the whole public surface.

typescript

const MODEL_ID = "Xenova/multilingual-e5-small";

// The private worker: load the model once, run the actual embedding.
async function embed(prefixed: string[]): Promise<number[][]> {
  const extractor = await getPipeline();
  const output = await extractor(prefixed, { pooling: "mean", normalize: true });
  return output.tolist() as number[][];
}

// e5 requires these exact prefixes; bake them in so a caller can't forget.
export async function embedQuery(text: string): Promise<number[]> {
  const vectors = await embed([`query: ${text}`]);
  // ...return the single vector
}

export async function embedPassage(text: string): Promise<number[]> {
  const vectors = await embed([`passage: ${text}`]);
  // ...return the single vector
}

export async function embedPassages(texts: string[]): Promise<number[][]> {
  return embed(texts.map((t) => `passage: ${t}`));
}

The actual model call lives in the private embed function in the middle there. The only ways to reach it are embedQuery and embedPassage, and each one writes the correct prefix before the text ever reaches the model. My indexing code calls embedPassage for every product; my search code calls embedQuery for whatever someone types. Nobody passes raw text, because there’s no door that accepts it. The prefix isn’t a comment I’m hoping people read. It’s baked into the only path through.

Things that surprised me

A few things I didn’t expect going in:

It fails completely silently. Skip the prefix and there’s no error, no warning, not even a slow path. The model hands back a slightly wrong vector and the search just gets quietly worse. Silent-degradation bugs are the ones I’m most afraid of.
The same text really does embed differently as a query and as a passage. I assumed ‘office chair’ was ‘office chair’ to the model. It isn’t. The prefix changes the numbers, on purpose.
The rule lives in the model’s docs, not the library. The embedding library I use won’t add the prefix for me and won’t warn me it’s missing. If you copy a generic feature-extraction snippet off the internet, you’ll get working-but-wrong embeddings, because those snippets almost never mention the e5 prefixes.
One tiny test guards the whole thing. My check just asserts that ‘sofa’ lands nearer ‘a comfortable couch’ than ‘a stainless steel knife set’. With the prefixes right, that gap is comfortable. Mangle them and the gap shrinks. One assertion catches a silent bug.
Every model family has its own ritual. e5 wants query: and passage:. Others want a different instruction, or a whole sentence like ‘Represent this sentence for retrieval:’, or nothing at all. The lesson isn’t ‘always add query:’, it’s ‘go find out what your specific model expects’.

When is this worth caring about?

Here’s what I’d tell someone about to add semantic search themselves. If you’re calling a hosted embedding API, the provider usually handles this for you, and you can ignore the whole thing. But the moment you run an open model yourself, which is what I’m doing because it’s free and the data never leaves my machine, its input contract becomes your problem. Before you write a line of code, read the model’s card and find the prefix or instruction it expects.

Then put that prefix somewhere it can’t be skipped. Not a comment, not a line in the README, not a note in your own head. A wrapper function that adds it for you, so the wrong thing is impossible to call rather than merely discouraged. That’s the real lesson, and it’s older than embeddings: when there’s a detail you absolutely must not forget, the fix isn’t to try harder to remember. It’s to build the thing so that forgetting it isn’t an option.