
The SaaS paradox, Shifting to Linux, and a love/hate letter to Microsoft

by Alex Yumashev · Updated Dec 9 2023

As your SaaS application balloons in size and complexity, you're smacked right in the face with what I call "The SaaS Paradox" - where 90% of your work, blood, sweat, and tears become completely invisible to your customers. Let's break down this thankless iceberg of effort:

DevOps - A Tangled Web of Complexity. As your architecture morphs into an intricate beast, you're wrangling an ever-growing zoo of containers, microservices, and other moving parts. It's a technical juggling act that your customers blissfully ignore.

Infrastructure - Scaling Mountains and Tuning OSes. You're constantly scaling servers and backups, and configuring and reconfiguring hardware and operating systems to keep up with demand. It's a Herculean task that goes unnoticed too.

Security. Your app is now a shining beacon for hackers, attracting attempts to steal data or exploit your platform for nefarious activities like spamming. In the past 48 hours alone, we've thwarted a DDoS attack hammering us with 8,000 requests per IP per second and a spam attack from a paid account abusing our system to send spam emails (he was creating fake support tickets on behalf of unsuspecting "customers" and had tweaked the ticket confirmation message template into a salesy email). And guess what? Your customers don't see that either.

Our latest invisible project - Moving to Linux

Last week, we undertook a massive project. We finally ditched the remaining Windows Servers for Linux after a 14-year marriage. This was a colossal shift, yet once again it went unnoticed by our customers (or worse, caused minor issues for some of them).

Why Windows in the first place?

When starting our app 14 years ago we were shackled to Windows because Jitbit Helpdesk ran on .NET Framework and used MS SQL Server databases for data storage. The databases migrated to Linux ages ago, but the application code lingered on Windows. We did migrate to .NET Core (now simply called ".NET"), which is cross-platform and boasts perks like lightning speed and an async-first approach, but decided to keep the app on Windows Servers - "if it ain't broke, don't fix it".

I love Microsoft for what they're doing for .NET and C#. "ASP.NET Core" is one of the fastest web frameworks, playing (and winning) in the same league as Go, Rust and C++, serving millions of requests per second. Node.js, Django, Ruby, whatever - they are not even in the same ballpark. The development speed is also mind-blowing. However, I hate Microsoft for how poorly .NET couples with their own flagship product, Windows Server.

A ridiculous bug MS has been unwilling to fix for 4 years

What finally pushed us over the edge was this ridiculous bug from Microsoft (here's the original bug closed without fixing and my second desperate attempt to reopen it two years ago). In a nutshell, this bug caused 503 errors during IIS server "recycles", affecting about 10% of users. In layman's terms, any minor tweak in IIS, like adding an SSL certificate, adding a new IP address/hostname to listen on, or tuning buffers, not to mention deploying a new app version, would cause a recycle (which is normal) and throw 503 errors at half of the connected users for 20 seconds in the process (which is not). This might be tolerable for a small, internal self-hosted app, but for a critical, 24/7 SaaS platform with millions of users across 24 timezones? Absolutely unacceptable.

It's as if Microsoft's .NET and IIS teams don't talk to each other, ironically, leaving Windows Server as a second-class citizen in the .NET ecosystem.

The funniest thing is that even Stack Overflow, which runs on .NET hosted on IIS, suffers from this very issue. As an everyday SO user I caught a couple of "503 Service unavailable" errors myself. Not to mention the horror stories people share in the GitHub issue: one poor guy is managing hundreds of thousands of servers spawned by 120+ load balancers in several datacenters, and after moving from .NET Framework to .NET Core his setup became unmanageable.

.NET 8 made things worse, we're fed up

The recent .NET 8 release made things worse: now half of the users get a 503 when the app is doing a non-overlapped recycle. We freaked out, snapped and pulled the plug.

As it turns out, .NET runs like a dream on Linux. Our app is now behind a custom-compiled Nginx proxy server, handling everything with finesse. The ability to tweak Nginx configuration and reload the webserver with zero downtime is nothing short of sorcery compared to Windows. Nginx is also very capable of blocking and rate-limiting DDoS attacks without proxying the load onto the upstream application at all.
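To make the rate-limiting idea concrete, here is a minimal Python sketch of per-IP limiting in the "leaky bucket" style, which is the method Nginx's limit_req module is documented to use. This is an illustration only, not our Nginx configuration and not Nginx's code: the class name PerIpRateLimiter and the rate and burst values are made up for the example.

```python
import time


class PerIpRateLimiter:
    """Leaky-bucket limiter keyed by client IP (illustrative sketch only)."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate        # requests drained from the bucket per second
        self.burst = burst      # how many requests the bucket can hold
        self.clock = clock      # injectable time source, handy for testing
        self.level = {}         # current bucket level per IP
        self.last_seen = {}     # time of the previous request per IP

    def allow(self, ip):
        now = self.clock()
        elapsed = now - self.last_seen.get(ip, now)
        self.last_seen[ip] = now
        # Drain the bucket for the time that passed since the last request.
        level = max(0.0, self.level.get(ip, 0.0) - elapsed * self.rate)
        if level + 1 > self.burst:
            self.level[ip] = level
            return False  # over the limit: reject before the app sees it
        self.level[ip] = level + 1
        return True
```

The point of doing this at the proxy is that a rejected request costs one dictionary lookup's worth of work and never reaches the upstream application. Each IP gets its own bucket, so one abusive client does not affect the others.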

We also employ "blue-green deployments", where new app instances are spawned and warmed up in the background, then seamlessly switched in. This allows for multiple deployments a day, even during peak hours, without our customers noticing a thing. It was literally not possible on Windows - you needed an extra physical server as a load balancer. The "blue" and "green" instances of the app can be managed by Docker or by a systemd service, pick your flavor.
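The switch itself can be sketched in a few lines of Python. This is a toy model under stated assumptions, not our deployment scripts: Instance and BlueGreenRouter are hypothetical names, and warm_up stands in for whatever health check a real setup would poll before sending traffic to the new copy.

```python
class Instance:
    """Stand-in for one running copy of the app (hypothetical)."""

    def __init__(self, version):
        self.version = version
        self.warmed_up = False

    def warm_up(self):
        # A real deployment would poll a health-check URL until it responds.
        self.warmed_up = True

    def handle(self, request):
        return f"{self.version} handled {request}"


class BlueGreenRouter:
    """Sends all traffic to the live instance; new releases start out idle."""

    def __init__(self, initial):
        initial.warm_up()
        self.live = initial
        self.previous = None

    def deploy(self, candidate):
        candidate.warm_up()
        if not candidate.warmed_up:
            raise RuntimeError("candidate failed warm-up; live instance untouched")
        # The switch is a single reference swap, so no request is routed
        # to an instance that has not finished warming up.
        self.previous, self.live = self.live, candidate

    def rollback(self):
        if self.previous is None:
            raise RuntimeError("nothing to roll back to")
        self.live, self.previous = self.previous, self.live

    def handle(self, request):
        return self.live.handle(request)
```

Keeping the previous instance around after the swap is what makes a rollback as cheap as the deployment itself.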

Not to mention that we don't have to pay for Windows licenses any more. Our AWS/Microsoft bill is now 30% lower and we get more powerful machines in return.

And the bottom line is - after fixing some weird platform-specific bugs (like Linux using different timezone names than Windows, and differences in date-parsing logic) we're back to our nimble roots, where a bug can go from report to resolution in mere minutes after a support ticket arrives. This isn't just an improvement; it's a revolution.
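For the curious, the timezone-name problem looks roughly like this. Windows identifies zones with names like "Pacific Standard Time", while Linux uses IANA names like "America/Los_Angeles", so stored identifiers need translating. The Python sketch below is only an illustration of the idea, not our actual fix: the function name to_iana is made up, and the table is a small hand-picked subset of the kind of mapping published in the Unicode CLDR windowsZones data.

```python
# Small hand-picked subset of a Windows-to-IANA mapping (illustrative only);
# a real application would use a complete, maintained table.
WINDOWS_TO_IANA = {
    "GMT Standard Time": "Europe/London",
    "W. Europe Standard Time": "Europe/Berlin",
    "Eastern Standard Time": "America/New_York",
    "Pacific Standard Time": "America/Los_Angeles",
}


def to_iana(tz_id):
    """Return an IANA zone name for either a Windows or an IANA identifier."""
    if tz_id in WINDOWS_TO_IANA:
        return WINDOWS_TO_IANA[tz_id]
    if "/" in tz_id:
        return tz_id  # already looks like an IANA name such as "Europe/Paris"
    # Fail loudly rather than silently falling back to some default zone.
    raise KeyError(f"unknown timezone id: {tz_id!r}")
```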