Updated Apr 4 2019 :: by Alex Yumashev

You probably noticed that we offer unlimited storage on our helpdesk app pricing page for "Enterprise" customers.

Well, we actually don't. The storage is limited, obviously - by the size of the hard drives that store the actual files. Every once in a while, when we're about to run out of space, we add another drive (or extend the existing ones). As of December 2017 the client files had grown to 8 drives of 2 TB each.

Now here's a couple of downsides of using regular HDDs for this:

  • Cost: even with Amazon's "Cold HDD" volume type (the cheapest one), it still costs more than $1,300 every month just for the storage (the drives plus their backup snapshots)
  • Complexity: compared to everything else, file attachments in a helpdesk system can be classified as "infrequently accessed" data, and keeping this kind of data on a hard disk is not very efficient. You can get away with this in an "on-premise" local installation, but not in a cloud app that stores millions and millions (and millions and millions) of files from thousands of customers. Not to mention the complexity of backup/restore procedures etc.

Enter Amazon S3

After considering all the options we decided to move the files to AWS S3 - the "Simple Storage Service" offered by Amazon's cloud. It's fast, it's cost-effective and it's durable - Amazon promises 99.999999999% durability, meaning only 0.000000001% of objects can be expected to be lost. In plain English this means, quoting Amazon, "if you store 10,000 objects you can on average expect a loss of a single object once every 10,000,000 years".

That'll do.
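
Amazon's quoted numbers are easy to sanity-check. Here's the arithmetic as a quick Python sketch:

```python
# Eleven nines of durability: the chance any given object survives a year
durability = 0.99999999999

# Probability a given object is lost in a year (~1e-11)
annual_loss_rate = 1 - durability

# Amazon's example scenario: 10,000 stored objects
objects = 10_000
expected_losses_per_year = objects * annual_loss_rate  # ~1e-7 losses/year

# Average years until a single object is lost among those 10,000
years_per_single_loss = 1 / expected_losses_per_year   # ~10,000,000 years
```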

Thanks to Amazon's great SDK it took me only two days to add S3 support to our hosted helpdesk app. We deployed the feature in mid-December 2017, and the app started saving ticket attachments to two locations at the same time: S3 and the old-fashioned disk drives. Just so we could test and see how things worked out.

Everything was fine. After fixing a couple of bugs the big question was - how the heck do we move all the existing terabytes of data?

Copying the data

We had to spin up a special server just for the move. Then we cloned the file drives from snapshots (we couldn't use the existing ones because they were too slow for the move), attached the clones to the new server, installed Amazon's CLI tools and started writing the Epic Copy Script.

Which turned out to be just one line: aws s3 cp foldername s3://bucketname --recursive (the --recursive flag is what tells the CLI to copy the whole folder rather than a single file).

And on Christmas night 2017 we launched the script, which ran non-stop for 172 hours and copied 96 million files.

Fun fact

Funny thing: the absolute majority of files stored in our support ticket system turned out to be social network icons. Millions and millions of those twitter.png's and linkedin.gif's (not that we looked inside the confidential files, but we can't help seeing some of the filenames).

See, a helpdesk app imports most of its tickets from emails, and most emails have signatures with useless icons and logos that eat up disk space and waste network bandwidth... So one of the nice side-effects of the move was noticing this and adding proper filters to cut those useless bytes out.
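
To illustrate the idea, a filter like that could be as simple as dropping tiny image attachments with well-known signature-icon names. The names and size threshold below are made-up examples, not our actual rules:

```python
# Hypothetical signature-icon filter - illustrative names and threshold,
# not the real Jitbit filtering logic.
SIGNATURE_ICONS = {"twitter.png", "linkedin.gif", "facebook.png", "instagram.png"}
MAX_ICON_BYTES = 10 * 1024  # signature icons are tiny; real attachments rarely are

def is_signature_icon(filename: str, size_bytes: int) -> bool:
    """True if the attachment looks like a useless e-mail signature icon."""
    return filename.lower() in SIGNATURE_ICONS and size_bytes <= MAX_ICON_BYTES

# Example: keep only the attachments worth storing
attachments = [
    ("twitter.png", 2_048),
    ("invoice.pdf", 184_320),
    ("linkedin.gif", 1_024),
]
kept = [(name, size) for name, size in attachments
        if not is_signature_icon(name, size)]
# kept -> only invoice.pdf survives
```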

Cool things about S3

  • Super easy backup - you don't even have to do anything. Just click "enable versioning" for a bucket and AWS will do the rest
  • Cost-effective
  • Truly unlimited - we don't have to worry about running out of disk space any more
  • Cool "lifecycle" features - you can set up an expiration policy, that will, say, archive a file after 90 days and move to it to "infrequent access" storage class which is cheaper. You can auto-expire a deleted object if no one asked to undelete it, you can even automatically move super-old files to Amazon Glacier
  • Better performance - the server doesn't have to handle any disk I/O now, it's all been offloaded to S3. Also, the AWS SDK comes with nice async .NET methods that keep file I/O from blocking web-server threads
  • Scales automatically
  • etc. etc.
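
For the curious, a lifecycle configuration like the one described above could look roughly like this (the prefix, rule ID and day counts here are illustrative, not our production settings - you'd apply it with the aws s3api put-bucket-lifecycle-configuration command):

```json
{
  "Rules": [
    {
      "ID": "archive-old-attachments",
      "Status": "Enabled",
      "Filter": { "Prefix": "attachments/" },
      "Transitions": [
        { "Days": 90, "StorageClass": "STANDARD_IA" },
        { "Days": 365, "StorageClass": "GLACIER" }
      ],
      "NoncurrentVersionExpiration": { "NoncurrentDays": 30 }
    }
  ]
}
```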

The only thing I don't like is the vendor lock-in we're putting ourselves into. But this can actually be a good thing (topic for a whole new blog post though).

We're super stoked to offer all these new features to our customers.

'Product update: Moving Hosted Helpdesk files to Amazon S3' was written by Alex Yumashev
Alex founded Jitbit in 2005 and is a software engineer passionate about customer support.
