Microbenchmarking and the Importance of Being Thorough

Some time ago at work, I ran across code resembling the following.

Problem

Please note that we are still on VS2013 and as such do not have access to C# 7’s pattern matching.

Anyway, traditionally C# developers have had two ways of solving the “I want a reference to my object as an X but I also want to handle the case where my object is not an X” situation. One is to use the is operator and then a C-style cast. The other is to use the as operator and then null check. This is what those look like in C# and CIL respectively.

OldILUnified

The as version has fewer instructions and as such certainly seems shorter, and is is really the same thing as as with the result then down-casted to a boolean. The magic behind them both is the isinst opcode, which returns the reference to the thing as the desired type if it matched and null otherwise. But is runs the exact same thing and simply treats the return value as a boolean (where null is false and not-null is true). But I’ve always tended to prefer as over is for this type of scenario since in the happy path, the is method would still require me to cast again, which is indicated in the IL by the castclass opcode.

I would like to use pattern matching from C# 7.0 either in a switch or in an if. That’s not available to me, but both of those use the same isinst opcode under the hood. Anyway, back to this thing…

Problem

I had occasion to add another type to that surfeit of ases and elseifs, so I did. Obviously, one would like to refactor the ~~clusterf~~ situation to leverage polymorphism, but in the project in question that would take more development time than we would be able to get permission from management to spend. This code also doesn’t change all that often, so–even if it’s ugly–the ugliness really isn’t costing us maintenance very often and seems to be working well as is.

Even so, I got a comment on my pull request saying the following.

…casting isn’t free, and this feels pretty inefficient to me. Could we just get the object’s type and then cast it once based off that, rather than casting it to every type of object that it might be?

Having no strong feelings one way or the other, I started to implement the requested approach. The obvious thing was to implement it with a similar set of nested elseifs. Then it occurred to me that I could store the types in a dictionary and get constant time lookup. O(1), amirite?

In wondering which implementation (dictionary vs elseifs) would be more performant, at some point it occurred to me that I might as well run a benchmark to see for myself. So I began to set it up. Then I figured while I was at it, I might as well benchmark the current implementation just to see how many sweet sweet milliseconds I was going to save.

In an effort to make sure I do my benchmarking right, I searched for advice and discovered the following excellent post on benchmarking in C# by the illustrious Jon Skeet.

http://www.yoda.arachsys.com/csharp/benchmark.html

Some salient points made in that article include:

Make your tests take minutes, not seconds. (to smooth out the bumps of the GC)
In your benchmarking code, prefer local variables over fields. (the CLR can optimize locals better than fields in some cases, so this can reduce the time of your baseline)
Check the results of the calculation are correct, not merely how long it took. (the compiler might optimize away the part you’re trying to benchmark if it doesn’t know for sure that you’ll actually use the result!)

I set up a new project with six classes called, unimaginatively, A, B, C, D, E, and F, and ran them through each set of type detection logic for 900 million iterations on my workstation, running in Release mode in VS2013 (i.e. the most relevant version for my use case).

I ran with six options. The one I called Polymorphic simply required all six types to implement a common C# interface and leveraged that interface to perform the operations. The one I called IfIsCStyle was a series of elseifs where each condition was an is operator and each code block used a C-style cast. The one I called CastAsEachTime was the existing implementation: as cast to all six types and then elseif the results not null. The one I called CastWithElse was the same thing but with all the casts moved into the elseif blocks. It will obviously be more performant than the CastAsEachTime version since it does less casting in the average and best cases, but the code visually resembles a descending staircase, which is distracting. I also added one called TypeDictionary where I stored the desired types in a static dictionary, and one called TypeElse which compared the object’s Type against each of the six known desired types in elseifs with the == operator. Then I ran the benchmark.

Function	Adjusted Time (ms)	Speed Cost
Polymorphic	30322	0%
IfIsCStyle	131462	76.93%
CastWithElse	149699	79.74%
CastAsEachTime	198622	84.73%
TypeDictionary	356411	91.49%
TypeElse	362060	91.63%

I was surprised. For one thing, I was surprised that IfIsCStyle beat CastWithElse. I have no good explanation for that. But what really surprised me was the fact that the TypeDictionary and TypeElse implementations were far and away the worst implementations thus far. How could this be?

I was able to find an explanation in this post. Essentially, using isinst allows the CLR to check whether the object belongs to the type without having to load various reflection metadata that’s required when you create a whole System.Type object. That GetTypeFromHandle call that’s the backbone of C#’s typeof operator…

TypeOfA

…causes the entire System.Type to be created, where we really only need the inheritance hierarchy data. And isinst is optimized for that.

That link also mentioned a different possibility for solving my use case: Type.GetTypeHandle. It’s a way of getting a type’s unique identifier without having to load the entire type with all its metadata and dragons.

There be dragons here. ... Friendly dragons sometimes, though.

That could be a lot more efficient, and his benchmarking seems to indicate that it is. But the act of benchmarking had already surprised me once in this journey, so I tried my own. I tried one version with the type handles stored in static fields and elseifing through the results of Type.GetTypeHandle, called HandleElse. And I added one that stored the handles in a static dictionary, called TypeDictionary. Here are the results.

Function	Adjusted Time (ms)	Speed Cost
Polymorphic	30322	0%
IfIsCStyle	131462	76.93%
CastWithElse	149699	79.74%
CastAsEachTime	198622	84.73%
HandleElse	203630	85.11%
TypeDictionary	356411	91.49%
TypeElse	362060	91.63%
HandleDict	396930	92.36%

Obviously, the new idea didn’t fare too well. I can only guess, but I suspect that it has to do with the fact that his benchmark only needed to compare to see if the types were equal but did not then need to actually obtain a reference to the instance as the desired type, so it had no need to cast at all. Adding a cast to the method, as my use case required, seems to have trumped the supposed performance gains of not loading the entire type object into memory.

At this point, not only was I sure the existing way was faster than what had been recommended by my pull request reviewer, but I was having fun. So I broke out my VS2017 Community Edition and reran the benchmarks but with overloads for PatternMatchIfElse (C# 7.0 pattern matching in an if) and PatternMatchSwitch (C# 7.0 pattern matching in a switch). Just when I thought nothing could surprise me, I was once again surprised.

Function	Adjusted Time (ms)	Speed Cost
Polymorphic	33490	0%
HandleElse	58392	42.65%
PatternMatchSwitch	63171	46.99%
PatternMatchIfElse	64653	48.2%
CastWithElse	65342	48.75%
IfIsCStyle	66402	49.56%
TypeElse	70936	52.79%
CastAsEachTime	110476	69.69%
TypeDictionary	269841	87.59%
HandleDict	272549	87.71%

I was very interested in the drop in performance of the CastAsEachTimeversion compared to the rest. My only guess is that the JIT Compiler has recently received some optimizations related to common type matching patterns, which included some of the others but did not include CastAsEachTime.

In all the tests I did, the fastest any implementation ran was the polymorphic one on VS2017. It completed in 33,490 milliseconds. The slowest any implementation ran was the type handle dictionary in VS2013, which completed in 396,930 milliseconds. Thus, the maximum savings for worrying about all this performance was 404 milliseconds per one million iterations. The original function under discussion is not called in a tight loop. It is not the bottleneck. This performance difference is not even measurable to our end users.

So I learned that, when getting feedback of the “X would be more performant than this” variety, it’s sometimes good to race your own horses. And ~~sometimes~~ most of the times the difference is so minuscule that it’s better to just focus on clean and readable code.

However, there are times when performance does matter. Just to prove this, I implemented an eleventh type checker. This one would try a C-style cast and then catch the InvalidCastException when it fails, and proceed to try the next type. Remember the slowest one above (type handle dictionary in 2013)? It ran 900 million iterations in 396,930 milliseconds. That’s over 2,200 iterations per millisecond. My C-style cast version took…well, I never let it finish. I ran it on 3,000 iterations, and it took 212,815 milliseconds. That’s about 70.94 milliseconds per iteration. Assuming that rate holds constant, if I tried to let it finish the same 900 million iterations I gave to the rest of the implementations, that would place it finishing in about two years plus a week. I elected not to let it run that long.

So there it is. Performance doesn’t really matter, except when it does.

You can download the .cs file that I used for my benchmarking here.

Share this:

Related

Leave a comment