What is Elasticsearch, and why use it?
The plain-English tour of Elasticsearch I wish I'd had: what it actually is, the capabilities that make it worth running, and exactly how I use each one in NORDHEM, my Nordic home-goods webshop.
I'm building a webshop called NORDHEM, and the question I had to answer early on was: what is Elasticsearch, and why would I run a whole separate piece of infrastructure just to power a search box? The short answer is that a database stores your data and a search engine finds it, and those turn out to be very different jobs. The interesting part is everything in between, so this is the plain-English tour I wish I'd had, one capability at a time, with the real code I use for each.
What is it, really?
Think of the index at the back of a textbook. You don't read all 600 pages to find where it talks about photosynthesis. You flip to the back, find the word, and it tells you pages 42 and 311. Elasticsearch builds exactly that, an inverted index, except instead of mapping words to page numbers it maps words to the documents that contain them. Search for “velvet” and it already knows which products mention velvet, because it did that bookkeeping ahead of time when the data went in. That is the whole trick, and I'll drop the analogy now because the rest is detail.
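To make the textbook analogy concrete, here is a toy inverted index in TypeScript. It is only a sketch of the idea: the product names are made-up sample data, the "analysis" is a bare lowercase-and-split, and none of this is how Elasticsearch stores things internally.

```typescript
// Toy inverted index: maps each lowercase word to the ids of the documents
// that contain it. Real Elasticsearch analysis is far richer than this split.
type ToyDoc = { id: number; text: string };

function buildInvertedIndex(docs: ToyDoc[]): Map<string, Set<number>> {
  const index = new Map<string, Set<number>>();
  for (const doc of docs) {
    const words = doc.text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
    for (const word of words) {
      let postings = index.get(word);
      if (!postings) {
        postings = new Set<number>();
        index.set(word, postings);
      }
      postings.add(doc.id);
    }
  }
  return index;
}

// Lookup is a single map access, not a scan over every document.
function lookup(index: Map<string, Set<number>>, word: string): number[] {
  const postings = index.get(word.toLowerCase());
  return postings ? Array.from(postings).sort((a, b) => a - b) : [];
}

// Made-up sample documents, for illustration only.
const sampleDocs: ToyDoc[] = [
  { id: 1, text: "Velvet three-seat sofa" },
  { id: 2, text: "Solid wood bed" },
  { id: 3, text: "Velvet cushion cover" },
];
const toyIndex = buildInvertedIndex(sampleDocs);
console.log(lookup(toyIndex, "velvet")); // [ 1, 3 ]
```

All the work happens in buildInvertedIndex, when the data goes in. By the time someone searches, the answer is already sitting in the map.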
You talk to it over a JSON REST API. You send a JSON document to index it, and you send a JSON query to search it, which means from my Node search service it's just HTTP and shapes I control in TypeScript. It's near-real-time: a document you index becomes searchable about a second later, not instantly, which is a tradeoff for speed I have never once felt as a problem. It's also built to scale horizontally across many machines, which is the headline feature everyone repeats. I should be honest here: NORDHEM runs a single node on my PC under Docker. I get none of the distributed-cluster benefits and I don't need them at 43,000 products. I run Elasticsearch for what it does on one node, and that is already a lot.
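As a rough sketch of what "just HTTP and shapes I control" can look like, the helpers below describe the two basic calls without sending them. The index name products, the document id, and the document shape are assumptions for illustration. The endpoint paths are the standard document and search APIs.

```typescript
// Describes an HTTP call to Elasticsearch without sending it.
type EsRequest = { method: "PUT" | "POST"; path: string; body: unknown };

// Index (create or overwrite) one document under a known id.
function indexRequest(index: string, id: string, doc: object): EsRequest {
  return { method: "PUT", path: `/${index}/_doc/${encodeURIComponent(id)}`, body: doc };
}

// Search an index with a query DSL object.
function searchRequest(index: string, query: object): EsRequest {
  return { method: "POST", path: `/${index}/_search`, body: { query } };
}

// Hypothetical index name, id, and document, for illustration only.
const putSofa = indexRequest("products", "sku-1", { name: "Velvet three-seat sofa" });
const findVelvet = searchRequest("products", { match: { name: "velvet" } });
console.log(putSofa.method, putSofa.path); // PUT /products/_doc/sku-1
console.log(findVelvet.method, findVelvet.path); // POST /products/_search
```

Sending one of these is an ordinary fetch to the node's base URL plus the path, with the body passed through JSON.stringify and a Content-Type of application/json.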
How does it understand words? (full-text analysis)
Before a word goes into that index, Elasticsearch runs it through an analyzer, a small pipeline that turns messy human text into clean tokens. Take one of my product names: “Antonio's Solid Wood Beds.” The analyzer chops it into tokens, throws away the apostrophe-s possessive, lowercases what's left, drops noise words, and stems each word back to its root. “Antonio's Solid Wood Beds” comes out the other end as antonio solid wood bed. “Beds” became “bed.” That last step, stemming, is why a shopper who types “bed” still finds a product called “Beds”: both sides reduce to the same root.
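The pipeline idea is easy to imitate in a few lines. This is a toy stand-in, not what Elasticsearch runs: the stop list is a handful of words I picked, and the "stemmer" only strips a simple plural s, where the real English stemmer is a much more careful algorithm. All function names here are made up for the sketch.

```typescript
// Toy analysis chain. The stop list and the plural-stripping "stemmer" are
// deliberately naive stand-ins, not the algorithms Elasticsearch uses.
const TOY_STOP_WORDS = new Set(["a", "an", "and", "of", "the", "to", "in"]);

function toyTokenize(text: string): string[] {
  // Keep an apostrophe suffix attached so "Antonio's" stays one token.
  return text.match(/[A-Za-z0-9]+(?:['’][A-Za-z]+)?/g) ?? [];
}

function stripPossessive(token: string): string {
  return token.replace(/['’]s$/i, "");
}

function toyStem(token: string): string {
  // Only handles simple plurals such as "beds" -> "bed".
  if (token.length > 3 && /[^s]s$/.test(token)) {
    return token.slice(0, -1);
  }
  return token;
}

function toyAnalyze(text: string): string[] {
  return toyTokenize(text)
    .map((t) => stripPossessive(t))
    .map((t) => t.toLowerCase())
    .filter((t) => !TOY_STOP_WORDS.has(t))
    .map((t) => toyStem(t));
}

console.log(toyAnalyze("Antonio's Solid Wood Beds").join(" ")); // antonio solid wood bed
console.log(toyAnalyze("bed").join(" ")); // bed
```

Because the product name and the shopper's query both pass through toyAnalyze, “bed” and “Beds” end up as the same token.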
Here is the analyzer I actually ship. I rebuilt the standard English analyzer by hand as a custom one so I could extend it in later steps instead of starting over:
filter: {
  english_possessive_stemmer: { type: "stemmer", language: "possessive_english" },
  english_stop: { type: "stop", stopwords: "_english_" },
  english_stemmer: { type: "stemmer", language: "english" },
  english_synonyms: { type: "synonym_graph", synonyms: synonymRules },
  // ...
},
analyzer: {
  english_text: {
    type: "custom",
    tokenizer: "standard",
    filter: ["english_possessive_stemmer", "lowercase", "english_stop", "english_stemmer"],
  },
  english_search: {
    type: "custom",
    tokenizer: "standard",
    filter: [
      "english_possessive_stemmer",
      "lowercase",
      "english_stop",
      "english_stemmer",
      "english_synonyms",
    ],
  },
}
Read the filter list left to right and you can see the pipeline: strip the possessive, lowercase, drop stop words like “the” and “of,” then stem. The two analyzers share that chain. I'll come back to why there are two of them, and to that last english_synonyms filter, in a second.
How does “couch” find a sofa? (synonyms)
My catalog calls a sofa a sofa. Plenty of shoppers call it a couch, and some call it a settee. None of those words stem to the same root, so analysis alone won't connect them. For that I give Elasticsearch a list of words that should mean the same thing. It lives in a plain text file:
sofa, couch, settee
rug, carpet
wardrobe, armoire
nightstand, bedside table
cushion, pillow
lamp, light fixture
tv stand, entertainment center, media console
shelf, shelving, bookcase
ottoman, footstool, pouf
duvet, comforter
Each line is a group of equivalents. Search any word on a line and Elasticsearch expands the query to match the others, so “couch” finds the sofa and “pouf” finds the ottoman. The decision I'm happiest with is that these apply at query time, not when the data is indexed. That's why my analyzer chain has two analyzers: one for indexing documents (no synonyms) and one for searching (synonyms bolted on the end). Editing this file changes how searches behave without rebuilding the 43,000-document index, which is the whole reason a future step can let me hot-edit these rules from an admin screen.
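Two pieces connect that text file to the analyzers: something has to turn the file into the array of rules the synonym filter takes, and the field mapping has to say which analyzer runs at index time and which at search time. The sketch below shows one plausible way to do both. parseSynonymRules and nameFieldMapping are names I made up for illustration; analyzer and search_analyzer are standard mapping parameters.

```typescript
// Turns the text of a synonyms file into a string[] of rules.
// Blank lines and "#" comment lines are skipped; every other line is one rule.
function parseSynonymRules(fileText: string): string[] {
  return fileText
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && line.charAt(0) !== "#");
}

// Assumed mapping for the product name field: documents are indexed with
// english_text, queries are analyzed with english_search (the one with synonyms).
const nameFieldMapping = {
  type: "text",
  analyzer: "english_text",
  search_analyzer: "english_search",
};

const parsedRules = parseSynonymRules("sofa, couch, settee\n\n# floor coverings\nrug, carpet\n");
console.log(parsedRules); // [ 'sofa, couch, settee', 'rug, carpet' ]
```

Since only search_analyzer references the synonym filter, changing the rules never touches what was stored at index time.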
What happens when someone types “vellvet”? (typo tolerance)
People misspell things in search boxes constantly, and a search that returns nothing for “vellvet” is a search that loses a sale. Elasticsearch has fuzziness built in: it measures how many single-character edits separate two words and treats them as a match if they're close enough. “Vellvet” is one extra letter away from “velvet,” so in NORDHEM it finds all 22 velvet products instead of zero. The shopper never knows anything was wrong.
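"How many single-character edits" has a precise meaning, and it fits in a short function. The sketch below is plain Levenshtein distance plus the length-based edit budget that fuzziness "AUTO" uses. Treat it as an approximation of the engine's behaviour: by default Elasticsearch also counts swapping two adjacent letters as one edit, which this version does not. The function names are made up for the sketch.

```typescript
// Plain Levenshtein distance: the minimum number of single-character
// insertions, deletions, or substitutions that turn `a` into `b`.
function editDistance(a: string, b: string): number {
  // prev[j] = distance between the first i-1 chars of a and the first j chars of b.
  let prev: number[] = [];
  for (let j = 0; j <= b.length; j++) prev.push(j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// Edit budget for fuzziness "AUTO": 0 edits for terms of 1-2 characters,
// 1 edit for 3-5 characters, 2 edits for anything longer.
function autoEditBudget(termLength: number): number {
  if (termLength <= 2) return 0;
  if (termLength <= 5) return 1;
  return 2;
}

// True when the typed term is within its budget of the indexed term.
function fuzzyMatches(typed: string, indexed: string): boolean {
  return editDistance(typed, indexed) <= autoEditBudget(typed.length);
}

console.log(editDistance("vellvet", "velvet")); // 1
console.log(fuzzyMatches("vellvet", "velvet")); // true
console.log(fuzzyMatches("ru", "rug")); // false
```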
Which result comes first? (relevance scoring)
Finding the matching products is half the job. Ordering them is the other half, and it's the half people judge you on. Elasticsearch scores every match with an algorithm called BM25, which rewards documents where your search word appears more often (with diminishing returns) and gives extra weight to words that are rare across the catalog, so a match on an unusual word counts for more than a match on a common one. On top of that base score I tell it that some fields matter more than others. A query is one JSON object:
query: {
  multi_match: {
    query,
    type: "best_fields",
    fields: ["name^3", "product_class^2", "description"],
    // AUTO scales allowed edits with term length: 0 edits up to 2
    // chars, 1 edit for 3-5, 2 edits above 5.
    fuzziness: "AUTO",
  },
},
The little ^3 and ^2 are field boosts. A word found in the product name is worth three times one found in the description, and the category sits in between, because if you type “sofa” you almost certainly want a product named sofa, not one that mentions sofas in a paragraph. best_fields means the single best-matching field decides the score, which is right for short product names. And notice fuzziness: "AUTO" riding along in the same object: that's the typo tolerance from the last section, set to scale the edit budget by word length so “vellvet” can be fixed, a three-letter word like “rug” is held to a single edit, and a one- or two-letter word has to match exactly.
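To see how the rarity reward, the boosts, and best_fields fit together, here is the arithmetic as a sketch. It uses the textbook BM25 formula with the usual defaults (k1 = 1.2, b = 0.75). The numbers Elasticsearch reports can differ by a constant factor, so read this as the shape of the calculation rather than a reproduction of its scores. The 20,000 figure for a common word is invented for the comparison.

```typescript
// Textbook BM25 for one term in one field.
function bm25TermScore(
  termFreq: number, // times the term occurs in this field
  fieldLength: number, // tokens in this field
  avgFieldLength: number, // average tokens in this field across the index
  docCount: number, // documents in the index
  docsWithTerm: number, // documents that contain the term
  k1 = 1.2,
  b = 0.75,
): number {
  // Rare terms get a large idf, common terms a small one.
  const idf = Math.log(1 + (docCount - docsWithTerm + 0.5) / (docsWithTerm + 0.5));
  // Longer-than-average fields are penalised slightly.
  const lengthNorm = 1 - b + b * (fieldLength / avgFieldLength);
  // Repeated occurrences help, with diminishing returns.
  const tf = (termFreq * (k1 + 1)) / (termFreq + k1 * lengthNorm);
  return idf * tf;
}

// best_fields with the default tie_breaker of 0: multiply each field's
// score by its boost, then keep only the highest result.
function bestFieldsScore(
  fieldScores: Record<string, number>,
  boosts: Record<string, number>,
): number {
  let best = 0;
  for (const field of Object.keys(fieldScores)) {
    const boost = field in boosts ? boosts[field] : 1;
    best = Math.max(best, fieldScores[field] * boost);
  }
  return best;
}

const fieldBoosts = { name: 3, product_class: 2, description: 1 };
// A word in 22 of 43,000 documents versus one in a made-up 20,000 of them.
const rareWord = bm25TermScore(1, 5, 5, 43000, 22);
const commonWord = bm25TermScore(1, 5, 5, 43000, 20000);
console.log(rareWord > commonWord); // true
// A weaker match in the name still beats a stronger one in the description.
console.log(bestFieldsScore({ name: 1, description: 2 }, fieldBoosts)); // 3
```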
I should be honest that those boost numbers are educated guesses for now. A later step measures them against real human relevance judgments instead of my intuition. The point is that the engine gives me knobs, and the knobs are just numbers in a JSON object.
Can it help before I finish typing? (autocomplete and did-you-mean)
Two more things ship in NORDHEM today, and they're the ones a shopper notices first. As you type, an autocomplete dropdown suggests completed product names, matching on word prefixes so “fabric so” already points at “three-seat fabric sofa.” And if a search comes back looking thin, a did-you-mean suggestion offers a better-spelled rewrite, scored so it only speaks up when its guess is genuinely better than what you typed. A well-spelled query gets no suggestion at all, which is the polite default. Both of these are Elasticsearch features I configured, not separate services I had to build.
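This post doesn't spell out which Elasticsearch features sit behind those two behaviours, so the following is one plausible configuration rather than a description of NORDHEM's. It assumes a hypothetical name.suggest subfield mapped as search_as_you_type for the dropdown, and a hypothetical name.spell subfield (analyzed without stemming) for the phrase suggester.

```typescript
// Autocomplete: assumes "name.suggest" is mapped as search_as_you_type, which
// provides the ._2gram and ._3gram subfields queried below.
function autocompleteBody(typed: string, size = 5) {
  return {
    size,
    _source: ["name"],
    query: {
      multi_match: {
        query: typed,
        type: "bool_prefix", // the last word is matched as a prefix
        fields: ["name.suggest", "name.suggest._2gram", "name.suggest._3gram"],
      },
    },
  };
}

// Did-you-mean: a phrase suggester on the hypothetical "name.spell" field.
// confidence: 1 returns a suggestion only when it scores higher than the
// text as typed, so a well-spelled query gets nothing back.
function didYouMeanBody(typed: string) {
  return {
    size: 0,
    suggest: {
      text: typed,
      did_you_mean: {
        phrase: { field: "name.spell", size: 1, confidence: 1 },
      },
    },
  };
}

console.log(JSON.stringify(autocompleteBody("fabric so"), null, 2));
console.log(JSON.stringify(didYouMeanBody("vellvet sofa"), null, 2));
```

Both bodies go to the same _search endpoint as an ordinary query, which is why neither needs a separate service.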
What about filtering by category and price? (coming next)
Here's where I switch to future tense, because I haven't built this yet. The next step is faceted filtering: the sidebar of checkboxes that lets you narrow results by category, price range, colour, or material, each with a live count next to it. Elasticsearch does this with aggregations, which compute those buckets and counts in the same request that fetches the results. I know which feature does it and roughly how it'll go, but I'd be lying if I called it done. It's the thing I'm building right after this post.
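For a sense of what "in the same request" means, here is roughly the shape such a request takes. It is a sketch of something not yet built: the product_class.keyword subfield, the price field, and the price bands are all assumptions, and the query part is simplified from the one shown earlier.

```typescript
// One request body that both finds products and counts them per facet.
// "product_class.keyword", "price", and the price bands are assumed names
// and values, not part of the existing NORDHEM index.
function facetedSearchBody(query: string) {
  return {
    query: {
      multi_match: { query, fields: ["name^3", "product_class^2", "description"] },
    },
    aggs: {
      // One bucket per category, each with a document count.
      by_category: { terms: { field: "product_class.keyword", size: 20 } },
      // Fixed price bands: under 100, 100 to 500, 500 and up.
      by_price: {
        range: {
          field: "price",
          ranges: [{ to: 100 }, { from: 100, to: 500 }, { from: 500 }],
        },
      },
    },
  };
}

console.log(JSON.stringify(facetedSearchBody("sofa"), null, 2));
```

The response carries the hits as usual plus an aggregations object, where each bucket has a key and a doc_count. Those are the labels and the live counts for the sidebar.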
Can it understand what I meant, not just what I typed? (planned)
Everything so far is keyword matching, however clever. The frontier I'm saving for later is semantic search: turning both the query and every product into vectors of numbers that capture meaning, so a search for “cosy reading corner” can surface an armchair and a floor lamp even when those exact words appear nowhere in the product text. Elasticsearch supports this with vector fields and a nearest-neighbour search, and the plan is to blend it with the keyword results so each covers the other's blind spots. That's a later step, genuinely future tense, and I'm looking forward to measuring whether it actually helps or just sounds clever.
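The core operation is small enough to sketch: score every product vector against the query vector and keep the closest few. The three-number vectors below are hand-made so the example runs. Real embeddings come from a model and have hundreds of dimensions, and Elasticsearch uses an approximate nearest-neighbour index rather than comparing against every product like this brute-force version does.

```typescript
// Cosine similarity: 1 means same direction, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

type EmbeddedProduct = { name: string; vector: number[] };

// Brute-force nearest neighbours: score everything, keep the top k names.
function nearestNeighbours(query: number[], products: EmbeddedProduct[], k: number): string[] {
  return products
    .map((p) => ({ name: p.name, score: cosineSimilarity(query, p.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((p) => p.name);
}

// Hand-made 3-dimensional vectors, purely for illustration.
const embeddedProducts: EmbeddedProduct[] = [
  { name: "armchair", vector: [0.9, 0.8, 0.1] },
  { name: "floor lamp", vector: [0.7, 0.9, 0.2] },
  { name: "garden hose", vector: [0.1, 0.0, 0.9] },
];
const cosyReadingCorner = [0.8, 0.9, 0.1];
console.log(nearestNeighbours(cosyReadingCorner, embeddedProducts, 2));
```

With these invented vectors the armchair and the floor lamp land in the top two and the garden hose does not, even though no product name shares a word with the query.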
Why not just use my database?
This is the question I get most, so here's the short version. Postgres owns the truth in NORDHEM: orders, users, the canonical product records. Elasticsearch owns a search-shaped copy of the products, reshaped and pre-chewed for fast finding. If the search index ever got corrupted or I changed the analyzer, I'd throw it away and rebuild it from Postgres, and nothing of value would be lost. The database is the source; the search engine is a derived view I can regenerate at will. Postgres can do basic text search, and for a small site with exact-name lookups it's plenty. It just doesn't give me stemming, query-time synonyms, length-aware fuzziness, tunable relevance, and aggregations in one package, which is the bundle I wanted.
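"Rebuild it from Postgres" mostly comes down to reading rows and feeding them to the bulk API, which takes newline-delimited JSON: one action line, then one document line, repeated. A minimal sketch of that conversion follows. The row shape and the index name are assumptions, and the database read and the HTTP call are left out.

```typescript
// Assumed shape of a product row read from the database.
type ProductRow = { id: string; name: string; product_class: string; description: string };

// Builds a _bulk request body: an action line followed by the document
// source for every row, newline-delimited, ending with a newline.
function toBulkBody(index: string, rows: ProductRow[]): string {
  if (rows.length === 0) return "";
  const lines: string[] = [];
  for (const row of rows) {
    const { id, ...source } = row;
    lines.push(JSON.stringify({ index: { _index: index, _id: id } }));
    lines.push(JSON.stringify(source));
  }
  return lines.join("\n") + "\n";
}

// Made-up row, for illustration only.
const bulkBody = toBulkBody("products", [
  { id: "1", name: "Velvet sofa", product_class: "Sofas", description: "A three-seat sofa." },
]);
console.log(bulkBody);
```

The body is sent as a POST to /_bulk with a Content-Type of application/x-ndjson. Because each row's id becomes the document _id, running the rebuild twice overwrites documents instead of duplicating them.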
Things that surprised me
A few things didn't click until I'd built it and watched it behave:
- The query and the data go through the same analysis chain (the search side only adds synonyms on the end). I assumed analysis only happened when indexing. It also runs on what you type, which is the only reason “bed” matches “Beds”: both get stemmed to the same root before they're compared. If the two sides used different rules, matching would quietly fall apart.
- Putting a filter in the wrong order is a silent recall failure, not an error. My synonym filter sits after the stemmer on purpose. Put it before, and “sofas” skips the “sofa, couch” rule, because the token reaches the synonym filter still spelled “sofas” and the rule only knows “sofa.” Nothing throws. You just lose results and never know.
- Query-time synonyms mean I can edit the rule list without reindexing. Because synonyms apply at search time, changing that text file doesn't touch the stored documents. Editing 43,000 indexed docs to add one synonym would have been miserable; this is just a settings update.
- Fuzziness is length-aware on purpose. “AUTO” allows more edits for longer words and zero edits for one- and two-letter words, so “vellvet” gets corrected to “velvet” while “rug” is held to a single edit: it can still reach “rag” or “rig,” but nothing two letters away. A fixed budget of two edits would have made short words chaos.
So, is it worth running?
Here's my honest take on when Elasticsearch earns its place and when it doesn't yet. If you're running a small site where people look things up by exact name, or a blog with a handful of pages, you don't need it. Postgres full-text search will do the job and you'll have one less thing to operate. The moment search becomes a feature your customers actually judge you on, the moment a query for “vellvet” returning nothing costs you a sale, or a sofa that won't show up for “couch” makes your shop feel broken, that's when it earns its place. For a webshop, search is the front door, and Elasticsearch gives me stemming, synonyms, typo tolerance, tunable ranking, and autocomplete as one coherent toolkit instead of five half-built ones.
What I'd tell someone about to try it: start with one node, send it JSON, and learn the analyzer chain before anything else, because almost every confusing search result traces back to how a word got tokenized. The distributed-cluster reputation can wait. The everyday power is in that quiet little pipeline that turns “Antonio's Solid Wood Beds” into antonio solid wood bed before anyone searches a thing.


