back to Jitbit Blog home About this blog

How we migrated a 1TB database from Win to Linux with no downtime

by Alex Yumashev · Updated May 21 2022

Our helpdesk app, both the SaaS and the self-hosted version - is powered by SQL Server. The SaaS version additionally uses S3 (to store file-attachments), Redis (to persist in-memory cache between deploys and restarts) and other fancy cloud stuff, but there's still one big ass SQL Server database at the heart of it.

And this Christmas we've migrated it to Linux.

It's been 4 years since MS has first announced Linux support for SQL Server. We were super excited, but didn't want to rush things up. The nature of our app is too critical to afford betatesting new hotness. We wanted to wait this out... 4 years forward, SQL on Linux has matured from a fun little experiment into a first class Linux-citizen. Microsoft claims there are tens of millions of SQL Servers running under Linux (and that's just the ones with telemetry enabled) and the docker image has 70 million pulls. Time to give it a try.

Why Linux?

Performance while "windows vs. linux" SQL benchmarks are nearly identical, one still gets better performance on Linux. Simply because Linux adds less overhead. The OS footprint is much smaller even compared to "Windows Core" or whatever the "headless" Windows version is called nowadays. AWS's default bare-bones Ubuntu image uses only ~150MB RAM to run, and you can reduce this even further by removing unneeded stuff like "Amazon SSM Agent" etc. This results in faster startups, warmups and service restarts - the SQL process takes only a couple of seconds to restart on a Linux machine even under a heavy load, while on Windows this can take literally minutes.

Cost Windows is 2X more expensive in the cloud. Our typical database server is a 4-core 32GB RAM instance, which makes about $250 per month at AWS. By simply moving to Linux we get a 2X more powerful machine (8-core 64GB RAM) for less ($230/month).

Migrating a huge database with minimum downtime

This section is not specific to SQL Server, you can use a similar technique with any relational database

Our database is nearly 1 TB big and is spread across 3 drives for speed (two for the data, one for the transaction log). It's basically performance 101: keep the most frequently JOIN'ed tables on different disks and move the transaction log away from both.

To move this whole setup between servers with minimum downtime, SQL Server, basically, offers three ways of migrating:

"Log shipping" technique is not unique to SQL Server and exists in almost any RDBMS engine, including PostgreSQL and MySQL. It is the process of taking transaction log backups at the primary server and applying them to the secondary server. It can be scripted or manual, either way the best part is that you can use the existing full/diff/log backups you're already taking (are you?). The servers do not need to "know" about each other, you don't need to set up partnering, mirroring, security or online connectivity between the two.

The steps are fairly simple:

  1. Make sure all your backups are compressed, so a 1TB database shrinks down to a ~180GB file. Makes copying much faster
  2. Restore the full backup on the new server well in advance
  3. Apply all the diff backups if any
  4. Apply incremental log backups that you're already taking, one by one
  5. Make one final log backup by hand, restore it on the new server too
  6. Take a break and breathe 😉. At this point you have a replica that is only 1-5 minutes behind the primary server
  7. The next steps can be done manually, but we wrote a script to make the whole transition more "atomic"

  8. Take the app offline and "pause" all the HTTP traffic on the load balancer (or set it to retry failed upstream requests after 20 seconds, HAProxy can do that), so it just looks like regular "sluggishness" to an end user
  9. Make one final transaction log backup (it's fast and results in a very small compressed file since we're only minutes behind)
  10. Copy & restore the file at the new server (again, fast, see above)
  11. Switch the app connection-string to the new server
  12. Start praying

We've managed to keep the downtime under 25 seconds. Our customers have barely noticed anything. The ones who keep working on a Christmas weekend anyway.

Funny enough the biggest challenge came from an unexpected source - how do we copy a huge backup file over from Windows to Linux? It's kinda sad that in 2021 you still have to use the good old scp (WinSCP on Windows) and cannot simply detach a network drive from a Windows instance and plug it into the Linux one (b/c of the incompatible file systems obviously).

SQL Server on Linux first impressions

We were expecting all kinds of glitches, hiccups, bugs and slowdowns, but it all went surprisingly smooth. The Linux machine has calmly picked things up and is carrying the load just fine. SQL Server Management Studio works perfectly from an external machine and the whole DBA experience is just... the same.

The only gotcha we have found so far, is that the "max memory" SQL Server setting was being ignored under Linux. We've set it to 62 out of 64 GB, but the memory was stuck at 47GB. Turns out SQL simply does not "see" all the RAM available, reporting only 50GB out of the 64GB we actually had. It appears you have to tweak the Linux-exclusive memory.memorylimitmb setting otherwise SQL Server "thinks" there's only 80% of RAM available on the machine.

Overall our app handles 1000-1100 requests per second on a busy day and the new database works just fine.