Whenever I have some free time on my hands, I love making our helpdesk app faster. The newest C# and .NET Core releases come with so many performance-oriented features that I've been waiting to play with, specifically the new datatype called Span<T>.
Here's the thing. A typical program spends a large share of its CPU cycles working with strings and byte arrays. Guess what, even sending an email over SMTP or parsing an incoming HTTP request is still working with strings and arrays. But the problem is that strings are immutable and arrays are fixed-size. If you want to slice, trim, expand, combine or otherwise reshape an array or a string, you allocate a new copy. Every tiny modification creates a new copy. Which is a huge performance problem: more work for the garbage collector, more memory usage, and so on. Say you want to Split() a string by a delimiter. Well, you've just allocated N more strings in addition to the original one.
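To make that cost visible, here is a minimal sketch (not code from our app) that compares Split() with walking the same text as a span. The sample string and variable names are made up for illustration, and it is written as a C# 9+ top-level program. The exact byte counts depend on the runtime, so treat the printed numbers as indicative only.

```csharp
using System;

string csv = "alpha,beta,gamma,delta"; // hypothetical sample input

// Split allocates a string[] plus one new string per field.
long before = GC.GetAllocatedBytesForCurrentThread();
string[] parts = csv.Split(',');
long splitBytes = GC.GetAllocatedBytesForCurrentThread() - before;

// Walking the same text as a span visits the same fields without creating them.
before = GC.GetAllocatedBytesForCurrentThread();
int fields = 0;
ReadOnlySpan<char> rest = csv.AsSpan();
while (true)
{
    int comma = rest.IndexOf(',');
    fields++;                       // count the field that ends at this comma (or at the end)
    if (comma < 0) break;           // no more delimiters: that was the last field
    rest = rest.Slice(comma + 1);   // narrow the view, no copy
}
long spanBytes = GC.GetAllocatedBytesForCurrentThread() - before;

Console.WriteLine($"Split: {parts.Length} fields, {splitBytes} bytes allocated");
Console.WriteLine($"Span:  {fields} fields, {spanBytes} bytes allocated");
```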
This is why the .NET team came up with the Span<T> datatype. It's basically a "view" into your existing array (or string, or any other contiguous block of memory).
You can slice, trim and search your "array-like" data using spans all you want. It all happens on the existing memory range, with no copies made. And once you're done, convert it back to an array (or don't, if your further code is also Span-compatible).
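A tiny sketch of what "view" means in practice, assuming nothing beyond the standard library. The sample data is invented, and it is written as a C# 9+ top-level program.

```csharp
using System;

// A span is a window over existing memory: slicing copies nothing.
byte[] data = { 10, 20, 30, 40, 50 };
Span<byte> middle = data.AsSpan().Slice(1, 3); // views 20, 30, 40

middle[0] = 99;             // writes through to the underlying array
Console.WriteLine(data[1]); // prints 99

// Strings expose a read-only view. Trim and Slice return narrower views, not new strings.
ReadOnlySpan<char> text = "  key=value  ".AsSpan().Trim();
int eq = text.IndexOf('=');
ReadOnlySpan<char> key = text.Slice(0, eq);
ReadOnlySpan<char> val = text.Slice(eq + 1);

// ToString() is the only point where new strings are allocated here.
Console.WriteLine(key.ToString()); // prints key
Console.WriteLine(val.ToString()); // prints value
```

The write through `middle[0]` showing up in `data[1]` is the whole point: the span and the array share the same memory.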
Our helpdesk app has a built-in "data URL" parser. "Data URLs" are inline HTML images that look like this:
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==" alt="Red dot" />
Parsing this image boils down to finding the comma, then base64-decoding everything after it into a byte array. Even this simple two-step operation leaves a lot of room for performance improvements.
Originally we were using this code:
public static byte[] ParseDataUrlArraySplit(string imgStr)
{
    return Convert.FromBase64String(imgStr.Split(',')[1]);
}
Which obviously sucks, as it allocates two new strings (plus the array that holds them) and then parses only the second one. Let's rewrite the .Split call into .Substring like this:
public static byte[] ParseDataUrlSubstr(string imgStr)
{
    return Convert.FromBase64String(imgStr.Substring(imgStr.IndexOf(',') + 1));
}
This will probably perform better, but can we use Spans instead?
public static byte[] ParseDataUrlSpan(string imgStr)
{
    var b64span = imgStr
        .AsSpan() //convert string into a "span"
        .Slice(imgStr.IndexOf(',') + 1); //slice the span at the comma

    //prepare resulting buffer that receives the data
    //in base64 every char encodes 6 bits, so 4 chars = 3 bytes
    var buffer = new Span<byte>(new byte[((b64span.Length * 3) + 3) / 4]);

    //call TryFromBase64Chars which accepts Span as input
    if (Convert.TryFromBase64Chars(b64span, buffer, out int bytesWritten))
        return buffer.Slice(0, bytesWritten).ToArray();
    else
        return null;
}
The code above slices a string (which can be viewed as a ReadOnlySpan<char>, since strings are essentially arrays of chars), then allocates a buffer and uses the new TryFromBase64Chars API that accepts a Span as a parameter. Yes, we're still calling ToArray at the end, just because I'm lazy and don't want to rewrite further code that still expects a byte array.
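If I were less lazy, the final ToArray copy could go away too. Here is a rough sketch of that idea, not what our app actually does: a hypothetical TryParseDataUrl helper that decodes into a buffer supplied by the caller. The helper name, the sample URL and the 64-byte stack buffer are all assumptions for illustration, and it is written as a C# 9+ top-level program.

```csharp
using System;
using System.Text;

// Hypothetical helper: decodes the base64 payload of a data URL into a
// caller-supplied buffer and reports how many bytes were written.
static bool TryParseDataUrl(ReadOnlySpan<char> dataUrl, Span<byte> destination, out int bytesWritten)
{
    bytesWritten = 0;
    int comma = dataUrl.IndexOf(',');
    if (comma < 0) return false; // no comma, so not a data URL

    // Returns false if the payload is not valid base64 or the destination is too small.
    return Convert.TryFromBase64Chars(dataUrl.Slice(comma + 1), destination, out bytesWritten);
}

string url = "data:text/plain;base64,SGVsbG8="; // made-up sample, decodes to "Hello"
Span<byte> buffer = stackalloc byte[64];        // small payloads can live on the stack

bool ok = TryParseDataUrl(url, buffer, out int written);
string decoded = ok ? Encoding.ASCII.GetString(buffer.Slice(0, written)) : "";
Console.WriteLine(decoded); // prints Hello
```

For real images the buffer would be far too big for the stack, so the caller would pass in a heap array or a pooled one instead. The point is only that the parser itself no longer has to allocate anything.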
The results are mind blowing:
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19044.1586 (21H2)
Intel Core i7-9700K CPU 3.60GHz (Coffee Lake), 1 CPU, 8 logical and 8 physical cores
.NET SDK=6.0.201
[Host] : .NET 6.0.3 (6.0.322.12309), X64 RyuJIT
ShortRun : .NET 6.0.3 (6.0.322.12309), X64 RyuJIT
Job=ShortRun IterationCount=3 LaunchCount=1
WarmupCount=3
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Allocated |
|------------------ |---------:|----------:|----------:|-------:|-------:|----------:|
| TestUrlArraySplit | 6.153 us | 0.4384 us | 0.0240 us | 1.8158 | 0.0687 | 11 KB |
| TestUrlSpan | 2.730 us | 0.2081 us | 0.0114 us | 0.9842 | 0.0191 | 6 KB |
| TestUrlSubstr | 5.717 us | 0.3150 us | 0.0173 us | 1.7929 | 0.0153 | 11 KB |
The Span variant is more than 2X faster than our original Split-based code (2.7 us vs 6.2 us) and allocates roughly half the memory (6 KB vs 11 KB)!
Considering we use this code a lot in our app (whenever a helpdesk end-user pastes an image into a support ticket or a reply), this optimization will add up to a noticeable performance boost.