
How I learned to stop worrying and wrote my own memory-cache

by Alex Yumashev · Updated May 7 2024

In my ongoing quest for performance optimization, I spend considerable time analyzing bottlenecks in our SaaS helpdesk web app. Proudly, our backend seldom exceeds 5-6% CPU usage, even during peak hours when the Americas awaken and Europe is still active.

The primary motivation behind our performance obsession is related to deploys. Updating an ASP.NET Core app involves shutting down the old process and queuing HTTP requests until the new one launches, which can cause a noticeable slowdown—sometimes up to 20 seconds. This focus on startup performance ensures seamless transitions for our customers during new version deployments.

I recently identified an ideal optimization candidate: .NET’s built-in "MemoryCache," which is frankly inadequate.

What's wrong with .NET's built-in "MemoryCache"

TL;DR "MemoryCache" sucks

We cache a lot of stuff in memory, to spare a database call whenever we can. We use Redis as an "L2" cache provider, but for "L1" we use the in-memory caching primitives offered by .NET. Basically, Microsoft offers two caches: System.Runtime.Caching.MemoryCache and Microsoft.Extensions.Caching.Memory.MemoryCache.

They both suck and ruin performance.

Enter FastCache

What I needed was basically a "concurrent dictionary" with expiring items: fast, lightweight, atomic and generic.

So this weekend I did what every programmer would do: I wrote my own cache, with blackjack and hookers. Please welcome: FastCache. No, honestly, I did google for alternatives, but didn't find anything that really fit my needs. After all, in-memory cache efficiency is not something you worry about at the beginning of your software company's journey (which is where most companies are). I guess that explains why there are so few offerings in this space.
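To make the goal concrete, here's a minimal sketch of the idea, not the actual FastCache source: a ConcurrentDictionary whose entries carry their own expiration time (class and member names are my own for illustration).

```csharp
using System;
using System.Collections.Concurrent;

// Minimal sketch (hypothetical, not the actual FastCache code):
// a concurrent dictionary whose values know when they expire.
public class ExpiringCache<TKey, TValue> where TKey : notnull
{
    private readonly ConcurrentDictionary<TKey, (TValue Value, DateTime ExpiresAt)> _dict = new();

    public void Set(TKey key, TValue value, TimeSpan ttl) =>
        _dict[key] = (value, DateTime.UtcNow + ttl);

    public bool TryGet(TKey key, out TValue value)
    {
        // An expired entry reads as "not found"
        if (_dict.TryGetValue(key, out var entry) && DateTime.UtcNow < entry.ExpiresAt)
        {
            value = entry.Value;
            return true;
        }
        value = default!;
        return false;
    }
}
```

This naive version still has the two problems the rest of the post is about: reading the system clock on every lookup is expensive, and evicting expired entries without races needs care.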

Making a fast cache

My two biggest roadblocks were:

  1. How to work with date/time efficiently when checking/evicting expiring items?
  2. How to make writes atomic?

"DateTime.Now" is slow

Check this benchmark out. It does nothing but get the current system time:
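The numbers below come from a proper benchmark harness; a rough, self-contained way to reproduce the same measurement with nothing but Stopwatch might look like this (helper names are mine):

```csharp
using System;
using System.Diagnostics;

// Rough timing sketch: call a clock API in a tight loop and report ns/call.
// (Includes delegate-invoke overhead, so treat the numbers as approximate.)
public static class ClockTiming
{
    public static double MeasureNsPerCall(Func<long> read, int iterations = 10_000_000)
    {
        var sw = Stopwatch.StartNew();
        long sink = 0;
        for (int i = 0; i < iterations; i++)
            sink ^= read(); // keep the result "used" so the call isn't optimized away
        sw.Stop();
        GC.KeepAlive(sink);
        return sw.Elapsed.TotalMilliseconds * 1_000_000.0 / iterations;
    }

    public static void Main()
    {
        Console.WriteLine($"DateTime.Now:    {MeasureNsPerCall(() => DateTime.Now.Ticks):F1} ns");
        Console.WriteLine($"DateTime.UtcNow: {MeasureNsPerCall(() => DateTime.UtcNow.Ticks):F1} ns");
    }
}
```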

|          Method |     Mean |    Error |   StdDev |
|---------------- |---------:|---------:|---------:|
|    DateTime_Now | 93.55 ns | 1.516 ns | 0.083 ns |
| DateTime_UtcNow | 19.19 ns | 1.479 ns | 0.081 ns |

DateTime.UtcNow is almost five times faster because it does not have to account for time zones (another good reason to use UtcNow in your code!). But it's still not fast enough: if I'm going to check whether a cached item is expired on every lookup, I need something much faster. Check this out:

|          Method |      Mean |     Error |    StdDev |
|---------------- |----------:|----------:|----------:|
|    DateTime_Now | 92.530 ns | 3.2421 ns | 0.1777 ns |
| DateTime_UtcNow | 19.037 ns | 0.6913 ns | 0.0379 ns |
|    GetTickCount |  1.751 ns | 0.0245 ns | 0.0013 ns |

Yes. How about that. Under two nanoseconds, baby. This is Environment.TickCount. It is limited to the int data type though, which overflows after about 2.1 billion milliseconds (roughly 25 days of uptime). But hey, if I target .NET 6 there's TickCount64, which is equally fast on 64-bit processors!
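With TickCount64 the expiry bookkeeping boils down to storing an absolute "expires at" tick with each entry, so the check is a single long comparison. A sketch (hypothetical helper, not the actual FastCache code):

```csharp
using System;

// Sketch: expiration via Environment.TickCount64 (milliseconds since boot,
// as a long). No DateTime, no time zones -- just integer arithmetic.
public static class TickExpiry
{
    // Compute the absolute expiration moment for a given TTL
    public static long ExpiresAt(TimeSpan ttl) =>
        Environment.TickCount64 + (long)ttl.TotalMilliseconds;

    // Expiry check is one cheap 64-bit comparison
    public static bool IsExpired(long expiresAtTicks) =>
        Environment.TickCount64 > expiresAtTicks;
}
```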


The second biggest challenge is atomicity. Here the biggest trick is to check the "item exists" and "item is not expired" conditions in one go.

When an item is found but expired, we need to treat it as "not found" and discard it. One option is a lock, so that the three steps "exists? expired? remove!" are performed atomically. Without it, another thread might jump in and ADD a non-expired item under the same key while we're still evicting the old one, and we'd end up removing a fresh item that was just added.

The other option is to skip locks and remove by key AND by value. If another thread has just rushed in and shoved a new item under the same key, that other item won't be removed.

Basically, instead of doing this

lock {
	remove by key!
}

We now do this

exists? (if yes the backing dictionary returns the value atomically)
remove by key AND value

Here's how we do it: if another thread chipped in while we were in the middle of checking whether the item had expired, and recorded a new value, we won't remove it.
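A sketch of the "remove by key AND value" trick (hypothetical names, not the actual FastCache source). Casting a ConcurrentDictionary to ICollection&lt;KeyValuePair&lt;...&gt;&gt; exposes a Remove overload that only succeeds when BOTH the key and the exact value still match, so a fresh entry written under the same key by another thread is left alone:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class CacheEviction
{
    // Lock-free lookup: expired entries are evicted by key AND value,
    // so a concurrently-added fresh entry under the same key survives.
    public static bool TryGetNonExpired(
        ConcurrentDictionary<string, (int Value, long ExpiresAt)> dict,
        string key, out int value)
    {
        value = default;
        if (!dict.TryGetValue(key, out var entry))
            return false; // not found

        if (Environment.TickCount64 > entry.ExpiresAt)
        {
            // Atomically remove only if the entry is still this exact
            // key/value pair; a replaced entry makes this a no-op.
            ((ICollection<KeyValuePair<string, (int, long)>>)dict)
                .Remove(new KeyValuePair<string, (int, long)>(key, entry));
            return false; // expired reads as "not found"
        }

        value = entry.Value;
        return true;
    }
}
```

On .NET 5 and later there's also a ConcurrentDictionary.TryRemove(KeyValuePair) overload that does the same key-and-value removal without the interface cast.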

Why is "lock" bad?

Locks suck because they add an extra ~50 ns to the lookup benchmark: it becomes 110 ns instead of 70 ns. So, no locks then!

Overall speed test

Benchmarks under Windows

|               Method |      Mean |     Error |   StdDev |   Gen0 | Allocated |
|--------------------- |----------:|----------:|---------:|-------:|----------:|
|     FastCacheLookup  |  67.15 ns |  2.582 ns | 0.142 ns |      - |         - |
|    MemoryCacheLookup | 426.60 ns | 60.162 ns | 3.298 ns | 0.0200 |     128 B |
|   FastCacheAddRemove |  99.97 ns | 12.040 ns | 0.660 ns | 0.0254 |     160 B |
| MemoryCacheAddRemove | 710.70 ns | 32.415 ns | 1.777 ns | 0.0515 |     328 B |

Benchmarks under Linux (Ubuntu, Docker)

|               Method |        Mean |      Error |    StdDev |   Gen0 | Allocated |
|--------------------- |------------:|-----------:|----------:|-------:|----------:|
|      FastCacheLookup |    94.97 ns |   3.250 ns |  0.178 ns |      - |         - |
|    MemoryCacheLookup | 1,051.69 ns |  64.904 ns |  3.558 ns | 0.0191 |     128 B |
|   FastCacheAddRemove |   148.32 ns |  25.766 ns |  1.412 ns | 0.0253 |     160 B |
| MemoryCacheAddRemove | 1,120.75 ns | 767.666 ns | 42.078 ns | 0.0515 |     328 B |

Wow, a 10x performance boost on Linux. I love my job.

P.S. The title of this post is a hat tip to Sam Saffron.