Improving C# Performance with Span<T>

by Alex Yumashev · Updated May 24 2022

Whenever I have some free time on my hands I love making our helpdesk app faster. The newest C# and .NET Core releases come with so many performance-oriented features that I've been waiting to play with, specifically the new datatype called Span<T>.

Here's the thing. A typical program spends a big chunk of its CPU cycles working with strings and byte arrays. Guess what, even sending an email over SMTP or parsing an incoming HTTP request is still working with strings and arrays. But the problem is that strings are immutable and arrays are fixed-size. If you want to slice, trim, expand, combine or otherwise manipulate an array or a string, you usually end up allocating new copies. Every tiny modification creates a new copy. Which is a huge performance problem - more work for the Garbage Collector, more memory usage, etc. Say you want to Split() a string by a delimiter - well, you've just allocated N more strings (plus an array to hold them) in addition to the original one.

This is why the .NET team has come up with the Span<T> datatype. It's basically a "view" into an existing array, string, or other contiguous block of memory.

You can manipulate your "array-like" data using spans all you want - slice it, trim it, search it. It all happens on the existing memory range, with no copying. And once you're done - convert it back to an array (or don't, if the rest of your code is also Span-compatible).
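To make the "view" idea concrete, here is a minimal sketch. The class and method names (SpanViewDemo, DoubleInPlace) are made up for illustration. It shows that writing through a span changes the backing array, and that trimming a string span only narrows the window.

```csharp
using System;

public static class SpanViewDemo
{
    // Doubles every element in the window, writing straight into the backing array.
    public static void DoubleInPlace(Span<int> window)
    {
        for (int i = 0; i < window.Length; i++)
            window[i] *= 2;
    }

    public static void Main()
    {
        int[] numbers = { 1, 2, 3, 4, 5 };

        // A span over the middle three elements; no copy is made.
        Span<int> middle = numbers.AsSpan(1, 3);
        DoubleInPlace(middle);

        // The original array sees the change.
        Console.WriteLine(string.Join(",", numbers)); // 1,4,6,8,5

        // Strings expose a read-only view; Trim on a span just narrows the window.
        ReadOnlySpan<char> trimmed = "  hello  ".AsSpan().Trim();
        Console.WriteLine(trimmed.Length); // 5

        // A new string is allocated only if and when you ask for one.
        Console.WriteLine(trimmed.ToString()); // hello
    }
}
```

Note that a span over a string is a ReadOnlySpan<char>, so you can slice and trim it but not write through it.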

Real-world Span optimization example

Our helpdesk app has a built-in "data URL" parser. "Data URLs" are inline HTML images that look like this:

<img src="data:image/png;base64,iVBORw0KGgoAAA
ANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4
//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU
5ErkJggg==" alt="Red dot" />

Parsing this image boils down to finding a comma, then base64-decoding everything after it into a byte array. Even this simple 2-step operation leaves plenty of room for performance improvements.

Originally we were using this code:

public static byte[] ParseDataUrlArraySplit(string imgStr)
{
    return Convert.FromBase64String(imgStr.Split(',')[1]);
}

Which obviously sucks as it allocates an array and two new strings, and then parses only the second one. Let's rewrite the .Split call into .Substring like this:

public static byte[] ParseDataUrlSubstr(string imgStr)
{
    return Convert.FromBase64String(imgStr.Substring(imgStr.IndexOf(',') + 1));
}

This allocates only one new string, so it will probably perform a bit better, but can we use Spans instead?

public static byte[] ParseDataUrlSpan(string imgStr)
{
    var b64span = imgStr
        .AsSpan() //convert string into a "span"
        .Slice(imgStr.IndexOf(',') + 1); //slice the span at the comma
    
    //prepare resulting buffer that receives the data
    //in base64 every char encodes 6 bits, so 4 chars = 3 bytes
    var buffer = new Span<byte>(new byte[((b64span.Length * 3) + 3) / 4]);

    //call TryFromBase64Chars which accepts Span as input
    if (Convert.TryFromBase64Chars(b64span, buffer, out int bytesWritten))
        return buffer.Slice(0, bytesWritten).ToArray();
    else
        return null;
}

The code above slices a string (which can be viewed as a ReadOnlySpan<char>, since strings are essentially arrays of chars), then allocates a buffer, and uses the new TryFromBase64Chars API that accepts a Span as a parameter. Yes, we're still calling ToArray at the end just because I'm lazy and don't want to rewrite the downstream code that still expects a byte array.
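Before comparing speed, it's worth checking that the three approaches decode to the same bytes. The sketch below is not the app's actual test code. The class name, the method names, and the tiny text payload are made up for illustration, and the three methods are condensed copies of the parsers above.

```csharp
using System;
using System.Linq;
using System.Text;

public static class DataUrlCheck
{
    public static byte[] ViaSplit(string s) =>
        Convert.FromBase64String(s.Split(',')[1]);

    public static byte[] ViaSubstring(string s) =>
        Convert.FromBase64String(s.Substring(s.IndexOf(',') + 1));

    public static byte[] ViaSpan(string s)
    {
        // View over everything after the comma; no new string is created.
        ReadOnlySpan<char> b64 = s.AsSpan(s.IndexOf(',') + 1);

        // 4 base64 chars decode to at most 3 bytes, so this is an upper bound.
        byte[] buffer = new byte[(b64.Length * 3 + 3) / 4];

        return Convert.TryFromBase64Chars(b64, buffer, out int written)
            ? buffer.AsSpan(0, written).ToArray()
            : null;
    }

    public static void Main()
    {
        // Hypothetical payload: "Hello, World!" encoded as base64.
        string url = "data:text/plain;base64,SGVsbG8sIFdvcmxkIQ==";

        byte[] a = ViaSplit(url);
        byte[] b = ViaSubstring(url);
        byte[] c = ViaSpan(url);

        Console.WriteLine(a.SequenceEqual(b) && b.SequenceEqual(c)); // True
        Console.WriteLine(Encoding.ASCII.GetString(c));              // Hello, World!
    }
}
```

For this 20-character payload the buffer formula gives (20 * 3 + 3) / 4 = 15 bytes, while the decoder reports 13 bytes written (the two "=" padding chars carry no data). That's why the result is sliced down to bytesWritten before it's returned.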

The results are mind-blowing:

BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1586 (21H2)
Intel Core i7-9700K CPU 3.60GHz (Coffee Lake), 1 CPU, 8 logical and 8 physical cores
.NET SDK=6.0.201
  [Host]   : .NET 6.0.3 (6.0.322.12309), X64 RyuJIT
  ShortRun : .NET 6.0.3 (6.0.322.12309), X64 RyuJIT

Job=ShortRun  IterationCount=3  LaunchCount=1
WarmupCount=3

|            Method |     Mean |     Error |    StdDev |  Gen 0 |  Gen 1 | Allocated |
|------------------ |---------:|----------:|----------:|-------:|-------:|----------:|
| TestUrlArraySplit | 6.153 us | 0.4384 us | 0.0240 us | 1.8158 | 0.0687 |     11 KB |
|       TestUrlSpan | 2.730 us | 0.2081 us | 0.0114 us | 0.9842 | 0.0191 |      6 KB |
|     TestUrlSubstr | 5.717 us | 0.3150 us | 0.0173 us | 1.7929 | 0.0153 |     11 KB |

The span variant is more than 2X faster than our original Split-based code (2.7 us vs 6.2 us) and allocates roughly half the memory (6 KB vs 11 KB)!
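If you want a rough feel for where the allocation difference comes from without setting up a benchmark project, one option is to read the per-thread allocation counter before and after a call. This is only a sketch under my own assumptions: AllocationProbe and Measure are hypothetical names, the payload is synthetic, and the numbers will not match the table above.

```csharp
using System;

public static class AllocationProbe
{
    // Returns the number of managed bytes allocated on the current thread while running `action`.
    public static long Measure(Action action)
    {
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    public static void Main()
    {
        // Made-up payload: a data URL followed by 4000 base64 characters.
        string url = "data:image/png;base64," + new string('A', 4000);
        int comma = url.IndexOf(',');

        // Substring copies the tail of the string into a brand new string.
        Action viaSubstring = () =>
        {
            string s = url.Substring(comma + 1);
            GC.KeepAlive(s);
        };

        // AsSpan only creates a view over the same characters.
        Action viaSpan = () =>
        {
            ReadOnlySpan<char> s = url.AsSpan(comma + 1);
            _ = s.Length;
        };

        // Warm up once so one-time runtime work is not counted.
        viaSubstring();
        viaSpan();

        Console.WriteLine($"Substring: {Measure(viaSubstring)} bytes");
        Console.WriteLine($"AsSpan:    {Measure(viaSpan)} bytes");
    }
}
```

The Substring line should report a few KB (the copied characters) and the AsSpan line zero or close to it. This only isolates the slicing step, not the base64 decoding, so treat it as an illustration, not a benchmark.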

Considering we use this code a lot in our app (whenever a helpdesk end-user pastes an image into a support ticket or a reply), this optimization will add up to a huge performance boost.