by Alex Yumashev ·
Updated Mar 23 2026
I've been dunking on AI pretty consistently on this blog, and I'm sick of the whole AI-influencer "built 15 apps in a weekend" crowd. But Claude Code just did something wild for me. Had it help us rip out our entire search engine - we're talking millions and millions of records, thousands of tenants - and migrate it from SQL Server full-text search to a small embedded, in-process Lucene port. Search went from 7-8 seconds down to milliseconds.
For context - we relied on SQL Server full-text search for years and it was... mostly fine. The way a gas station sandwich is "mostly fine" when you're starving. It worked, nobody died, we had bigger problems to deal with. But when your customers are searching across millions of records and staring at a loading spinner for 7-8 seconds - "mostly fine" stops cutting it.
I'd been circling this project for years and kept chickening out. Not because the search rewrite itself is hard - swapping a search provider is a weekend, maybe two. The thing that kept scaring me off was all the infrastructure around it: dozens of one-off CLI tools for index rebuilding, compaction, deduplication, gradual rollout, health checks, verification scripts, progress reporting. Plus all the prep work - tech stack evaluation, risk analysis, benchmarking candidate libraries, planning a migration path that doesn't nuke search for thousands of paying customers at once. The kind of work that makes you reconsider your career choices.
And this is exactly where AI came in. Not for the search code itself - I wrote that (well, most of it) in an hour. But for churning out dozens of self-contained CLI tools and scripts. Each one is a small, well-defined, greenfield task: clear inputs, clear outputs, no tangled legacy context to lose track of. If you've read my jQuery rant - this was the exact opposite. That was brownfield hell, context rot after three files. This was "here's a spec, write me a tool," over and over, and Claude Code absolutely crushed it.
And of course there was the endless interrogation of AI about subtle details: which folder is writable when you host ASP.NET Core in Docker? On Windows/IIS? On Linux? How do you keep RAM usage low for index writers? How do you avoid locking issues when blue-green deployments overlap?
The final reindexing job that Claude helped me write ran for 39 hours straight across all tenant data, reporting nice progress graphs and auto-fixing errors as it went. It finished and everything checked out. Weeks of the most tedious infrastructure grind imaginable, compressed into days.
For the uninitiated - Lucene is what powers Elasticsearch and Solr under the hood. Except we're not running an external Elasticsearch cluster. We're running a small custom fork of the Lucene.NET port directly inside our app process. No extra services, no extra config, nothing.
Stack Overflow did an almost identical migration back in 2011 (SQL Server full-text to Lucene.NET) and their reasons read like our internal planning doc:
1. Distribute the workload. Full-text search is heavy. With an embedded index, it happens right in the app process - no round-trip to the DB server, no waiting in line behind other queries.
2. Get the database off search duty. Our database is busy enough without demanding full-text queries piled on top. Pulling search out gives us headroom for actual SQL work - no more compromising between "what's good for full-text" and "what's good for everything else."
3. Better control over results. SQL Server full-text is a black box. Lucene gives you custom analyzers, field boosting, scoring tweaks - when a customer says "search isn't finding X," we can actually do something about it now.
4. No external service dependency. It's just code in our codebase, running in our process. No Elasticsearch cluster to provision, no separate infrastructure to monitor at 3 AM. A local folder for index files and that's it.
5. No new dependencies for self-hosted customers. A big chunk of Jitbit customers run our helpdesk on their own servers. "Hey, now you also need to set up and maintain an Elasticsearch cluster" - yeah, no. With an embedded library there's nothing new to install. Deploy the app, search works.
For the developers in the room - here's what's actually running under the hood, because some of this was non-obvious.
Index-per-tenant. Each customer gets their own Lucene index. No shared index with filtering - full isolation. One tenant's index compaction or reindex doesn't touch anyone else's. On SaaS we're talking thousands of indexes. On self-hosted, a single folder.
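Roughly the shape of it - a simplified sketch, not our exact code (the folder layout and the helper name are illustrative):

```csharp
using Lucene.Net.Store;

// One Lucene directory per tenant, e.g. <indexRoot>/tenant-123.
// Full isolation: compacting or rebuilding one tenant's index never
// touches another tenant's files.
public static FSDirectory OpenTenantIndex(string indexRoot, int tenantId)
{
    // System.IO is fully qualified to avoid clashing with Lucene.Net.Store.Directory
    var path = System.IO.Path.Combine(indexRoot, $"tenant-{tenantId}");
    System.IO.Directory.CreateDirectory(path);
    return FSDirectory.Open(path);
}
```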
LRU cache for IndexWriters. Lucene's IndexWriter is expensive to create - it acquires a write lock on the directory, loads segments, etc. Opening one per request would be murder. So we keep a pool of open writers in a custom LRU cache, capped at 50. When a writer gets evicted, the cache commits all changes and calls Dispose() on it automatically. In practice, 50 covers all active tenants with room to spare, and idle ones get quietly evicted.
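The idea, boiled down to a sketch (single lock, simplified eviction - the real cache is more careful, and the type/factory names here are made up for illustration):

```csharp
using System;
using System.Collections.Generic;
using Lucene.Net.Index;

public sealed class WriterCache
{
    private const int MaxWriters = 50;
    private readonly object _sync = new object();
    private readonly LinkedList<int> _lru = new LinkedList<int>();    // most-recently-used tenant at the front
    private readonly Dictionary<int, IndexWriter> _writers = new Dictionary<int, IndexWriter>();
    private readonly Func<int, IndexWriter> _factory;                 // opens a writer for a tenant (the expensive part)

    public WriterCache(Func<int, IndexWriter> factory) => _factory = factory;

    public IndexWriter Get(int tenantId)
    {
        lock (_sync)
        {
            if (_writers.TryGetValue(tenantId, out var writer))
            {
                _lru.Remove(tenantId);      // O(n), fine for n <= 50
                _lru.AddFirst(tenantId);    // bump to most-recently-used
                return writer;
            }

            if (_writers.Count >= MaxWriters)
            {
                // evict the least-recently-used writer: commit pending changes,
                // then dispose (which also releases the directory's write lock)
                var victim = _lru.Last.Value;
                _lru.RemoveLast();
                var evicted = _writers[victim];
                _writers.Remove(victim);
                evicted.Commit();
                evicted.Dispose();
            }

            writer = _factory(tenantId);
            _writers[tenantId] = writer;
            _lru.AddFirst(tenantId);
            return writer;
        }
    }
}
```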
RAM buffer tuning. Each writer's RAM buffer is set to 2MB. That sounds stingy, but it's intentional. With potentially 50 live writers, you're looking at up to 100MB of indexing buffers just sitting there. Lucene defaults are much higher. We throttled it down and compensated with debounced commits instead.
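The setting itself is one line. A sketch assuming the stock Lucene.NET 4.8 API (we run a custom fork, so exact property names may differ; `indexPath` stands for the tenant's index folder):

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

var analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);
var config = new IndexWriterConfig(LuceneVersion.LUCENE_48, analyzer)
{
    // 2MB per writer: up to 50 cached writers = ~100MB of buffers worst case.
    // Stock Lucene defaults to 16MB per writer.
    RAMBufferSizeMB = 2
};
var writer = new IndexWriter(FSDirectory.Open(indexPath), config);
```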
Debounced commits. We don't flush to disk on every document write - that would be slow and punishing on SSDs. Instead, every write schedules a debounced commit with a 5-second cooldown. If more writes come in, the timer resets. When things go quiet, it commits once. Batch writes during a reindex get explicit commits per chunk anyway.
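A minimal sketch of the debounce, assuming one of these per cached writer:

```csharp
using System;
using System.Threading;
using Lucene.Net.Index;

public sealed class DebouncedCommitter : IDisposable
{
    private static readonly TimeSpan Cooldown = TimeSpan.FromSeconds(5);
    private readonly Timer _timer;

    public DebouncedCommitter(IndexWriter writer)
    {
        // fires once, and only after the timer is armed by Touch()
        _timer = new Timer(_ => writer.Commit(), null, Timeout.Infinite, Timeout.Infinite);
    }

    // call after every AddDocument/UpdateDocument/DeleteDocuments;
    // each call pushes the commit another 5 seconds into the future
    public void Touch() => _timer.Change(Cooldown, Timeout.InfiniteTimeSpan);

    public void Dispose() => _timer.Dispose();
}
```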
Blue-green deploy safety. Lucene uses file-level write locks. During a blue-green deployment, old and new instances briefly overlap. We set WriteLockTimeout to 5000ms - Lucene polls internally every second, so this gives the new instance five attempts to acquire the lock before giving up. Usually the old pod is gone by then.
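Continuing the config sketch from above - this assumes the 4.8-era write-lock timeout setting on IndexWriterConfig (it was dropped in later Lucene versions, and the property name may differ in other ports or forks):

```csharp
// milliseconds; Lucene re-polls the lock roughly once a second,
// so 5000ms gives the new instance about five attempts before giving up
config.WriteLockTimeout = 5000;
```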
HTML sanitization before indexing. Ticket bodies come in as HTML. Indexing raw HTML means your search index is full of <div> tokens and CSS class names. We strip it with my own StripHTMLFast() (honestly I deserve a Nobel prize for this thing I wrote years ago, it uses Span<char> heavily and reads the HTML directly from the buffer stream without allocating any strings) before handing it to Lucene - both during reindex and on every incremental update. Sounds obvious, but easy to miss.
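Not going to paste StripHTMLFast() here, but the gist, reduced to a naive allocating illustration (this is NOT the real implementation - that one works over Span<char> and handles far more edge cases):

```csharp
using System.Text;

// Naive illustration only: drop everything between '<' and '>' so the
// index sees visible text, not markup. Ignores comments, scripts,
// '>' inside attribute values, etc.
public static string StripHtmlNaive(string html)
{
    var sb = new StringBuilder(html.Length);
    bool insideTag = false;
    foreach (var ch in html)
    {
        if (ch == '<') { insideTag = true; continue; }
        if (ch == '>') { insideTag = false; continue; }
        if (!insideTag) sb.Append(ch);
    }
    return sb.ToString();
}
```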
Query escaping that doesn't break power users. We escape Lucene's special characters before parsing, but deliberately preserve " (phrase search), * (wildcard), and ? (single-char wildcard). So "exact phrase" and tick* both work. If the query still fails to parse after escaping, we fall back to wrapping the whole thing in quotes as a phrase search. Customers get power-user features without needing to know Lucene syntax.
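A sketch of the approach, assuming Lucene.NET's classic QueryParser (the escape list and fallback are simplified):

```csharp
using System.Text;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;

static Query ParseUserQuery(QueryParser parser, string raw)
{
    try
    {
        return parser.Parse(EscapeKeepingPowerChars(raw));
    }
    catch (ParseException)
    {
        // last resort: treat the whole input as one quoted phrase
        var phrase = raw.Replace("\\", " ").Replace("\"", " ");
        return parser.Parse("\"" + phrase + "\"");
    }
}

static string EscapeKeepingPowerChars(string input)
{
    // Lucene's special characters, minus " * ? which we deliberately keep
    // so phrase and wildcard searches still work for power users
    const string specials = "+-!(){}[]^~:\\/&|";
    var sb = new StringBuilder(input.Length * 2);
    foreach (var ch in input)
    {
        if (specials.IndexOf(ch) >= 0) sb.Append('\\');
        sb.Append(ch);
    }
    return sb.ToString();
}
```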
Resumable reindex. The reindex job writes a progress file after every 1000-ticket chunk, storing the last processed IssueID. If the server restarts mid-migration, it picks up from there. No starting over. On a 39-hour reindex across millions of records, that matters.
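The loop itself is boring, which is the point. A sketch with made-up helpers (LoadNextChunk, IndexChunk) standing in for the real data access and indexing code, reusing the writer and indexRoot from the earlier sketches:

```csharp
const int ChunkSize = 1000;
var progressFile = System.IO.Path.Combine(indexRoot, "reindex.progress");

// resume from the last checkpointed IssueID, or start from 0 on a fresh run
long lastIssueId = System.IO.File.Exists(progressFile)
    ? long.Parse(System.IO.File.ReadAllText(progressFile))
    : 0;

while (true)
{
    var chunk = LoadNextChunk(lastIssueId, ChunkSize); // tickets with IssueID > lastIssueId
    if (chunk.Count == 0) break;

    IndexChunk(writer, chunk);   // add/update Lucene documents for this chunk
    writer.Commit();             // explicit commit per chunk during bulk reindex

    lastIssueId = chunk[chunk.Count - 1].IssueID;
    System.IO.File.WriteAllText(progressFile, lastIssueId.ToString()); // checkpoint
}
```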
Gradual rollout gate. Old instances (created before our migration cutover ID) don't auto-init Lucene - they fall back to SQL FTS until we explicitly trigger their migration. New instances above the threshold auto-trigger a background reindex on first search. This let us roll out to new signups immediately while migrating the legacy base in a controlled batch.
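The gate is basically this (the names - CutoverInstanceId, IsLuceneMigrated, TriggerBackgroundReindex - are illustrative, not our actual identifiers):

```csharp
// Decide per instance whether search should hit Lucene or SQL full-text
bool ShouldUseLucene(int instanceId)
{
    if (IsLuceneMigrated(instanceId))
        return true;                          // already reindexed: serve from Lucene

    if (instanceId >= CutoverInstanceId)
        TriggerBackgroundReindex(instanceId); // new instance: start migrating in the background

    return false;                             // until then, keep using SQL full-text search
}
```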
OOM handling. If Lucene throws an OutOfMemoryException during a write, we catch it, evict the writer from the cache (freeing its RAM buffer), and rethrow. Better to lose in-flight writes than to leave a broken writer sitting in the pool corrupting future writes.
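In code it's a tiny guard around every write - a sketch assuming the WriterCache from above grows an Evict() method that disposes the writer, and reusing the debounced committer:

```csharp
try
{
    // upsert the ticket document keyed by its id field
    writer.UpdateDocument(new Term("id", ticketId.ToString()), doc);
    committer.Touch(); // schedule the debounced commit
}
catch (OutOfMemoryException)
{
    // a writer that hit OOM may be in an inconsistent state:
    // drop it (and free its RAM buffer) instead of reusing it
    writerCache.Evict(tenantId);
    throw;
}
```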
I'm not gonna turn into one of those AI evangelists. But for the kind of tedious, well-defined infrastructure work that had been blocking this migration for years - it saved me weeks, maybe months. Instead it took ONE F*CKING DAY, plus the 39 hours of staring at the auto-healing reindexing job. The right tool for the right job and all that.
Search is fast. Database is happy. Self-hosted customers don't need to install anything new. Ship it. Over and out - I'm off to enjoy the dopamine hit.