Our helpdesk app, both the SaaS and the self-hosted version - is powered by SQL Server. The SaaS version additionally uses S3 (to store file-attachments), Redis (to persist in-memory cache between deploys and restarts) and other fancy cloud stuff, but there's still one big ass SQL Server database at the heart of it.
And this Christmas we've migrated it to Linux.
We've just migrated a 1TB (!) database.
— Alex Yumashev (@jitbit) December 27, 2021
From a Windows server to a Linux one.
With zero downtime. pic.twitter.com/DyhlGBcK1B
It's been 4 years since MS has first announced Linux support for SQL Server. We were super excited, but didn't want to rush things up. The nature of our app is too critical to afford betatesting new hotness. We wanted to wait this out... 4 years forward, SQL on Linux has matured from a fun little experiment into a first class Linux-citizen. Microsoft claims there are tens of millions of SQL Servers running under Linux (and that's just the ones with telemetry enabled) and the docker image has 70 million pulls. Time to give it a try.
Performance while "windows vs. linux" SQL benchmarks are nearly identical, one still gets better performance on Linux. Simply because Linux adds less overhead. The OS footprint is much smaller even compared to "Windows Core" or whatever the "headless" Windows version is called nowadays. AWS's default bare-bones Ubuntu image uses only ~150MB RAM to run, and you can reduce this even further by removing unneeded stuff like "Amazon SSM Agent" etc. This results in faster startups, warmups and service restarts - the SQL process takes only a couple of seconds to restart on a Linux machine even under a heavy load, while on Windows this can take literally minutes.
Cost Windows is 2X more expensive in the cloud. Our typical database server is a 4-core 32GB RAM instance, which makes about $250 per month at AWS. By simply moving to Linux we get a 2X more powerful machine (8-core 64GB RAM) for less ($230/month).
This section is not specific to SQL Server, you can use a similar technique with any relational database
Our database is nearly 1 TB big and is spread across 3 drives for speed (two for the data, one for the transaction log). It's basically performance 101: keep the most frequently JOIN'ed tables on different disks and move the transaction log away from both.
To move this whole setup between servers with minimum downtime, SQL Server, basically, offers three ways of migrating:
"Log shipping" technique is not unique to SQL Server and exists in almost any RDBMS engine, including PostgreSQL and MySQL. It is the process of taking transaction log backups at the primary server and applying them to the secondary server. It can be scripted or manual, either way the best part is that you can use the existing full/diff/log backups you're already taking (are you?). The servers do not need to "know" about each other, you don't need to set up partnering, mirroring, security or online connectivity between the two.
The steps are fairly simple:
The next steps can be done manually, but we wrote a script to make the whole transition more "atomic"
We've managed to keep the downtime under 25 seconds. Our customers have barely noticed anything. The ones who keep working on a Christmas weekend anyway.
Funny enough the biggest challenge came from an unexpected source - how do we copy a huge backup file over from Windows to Linux? It's kinda sad that in 2021 you still have to use the good old scp
(WinSCP
on Windows) and cannot simply detach a network drive from a Windows instance and plug it into the Linux one (b/c of the incompatible file systems obviously).
We were expecting all kinds of glitches, hiccups, bugs and slowdowns, but it all went surprisingly smooth. The Linux machine has calmly picked things up and is carrying the load just fine. SQL Server Management Studio works perfectly from an external machine and the whole DBA experience is just... the same.
The only gotcha we have found so far, is that the "max memory" SQL Server setting was being ignored under Linux. We've set it to 62 out of 64 GB, but the memory was stuck at 47GB. Turns out SQL simply does not "see" all the RAM available, reporting only 50GB out of the 64GB we actually had. It appears you have to tweak the Linux-exclusive memory.memorylimitmb
setting otherwise SQL Server "thinks" there's only 80% of RAM available on the machine.
Overall our app handles 1000-1100 requests per second on a busy day and the new database works just fine.