Consider this console application.
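Something along these lines; this is a sketch rather than the exact listing (the names StartCalculating and ConsoleInputLoop come from later in the post, and the remaining details are assumed):

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // Start the first calculation, then hand its greeting generator
        // to the input loop.
        ConsoleInputLoop(StartCalculating());
    }

    static Func<int, string> StartCalculating()
    {
        // Allocate a 1GB buffer and fill it with random bytes.
        var bytes = new byte[1024 * 1024 * 1024];
        new Random().NextBytes(bytes);

        // Find the mode: the byte value that occurs most often.
        var counts = new int[256];
        foreach (var b in bytes) counts[b]++;
        byte mode = (byte)Array.IndexOf(counts, counts.Max());

        // Pretend this WriteLine is a long-running calculation that deserves
        // a background thread. Note that the lambda touches `bytes`.
        Task.Run(() => Console.WriteLine(
            $"The mode of the {bytes.Length:N0} bytes is {mode}."));

        // The greeting generator handed back to the caller; it only needs
        // a small local, not the byte array.
        var startedAt = DateTime.Now;
        return threadId =>
            $"Hello from thread {threadId} (calculating since {startedAt:T}). " +
            "Press Enter to recalculate, type GC to collect, or Q to quit.";
    }

    static void ConsoleInputLoop(Func<int, string> greeting)
    {
        while (true)
        {
            Console.WriteLine(greeting(Environment.CurrentManagedThreadId));
            switch (Console.ReadLine()?.Trim().ToUpperInvariant())
            {
                case "Q":  return;
                case "GC": GC.Collect(); break;
                default:   StartCalculating(); break; // start another calculation; result discarded
            }
        }
    }
}
```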
Let’s walk through it. Nothing seems too awful. It starts by allocating a 1GB byte array and filling it with random bytes. It then traverses the byte array looking for the mode: the byte that occurs most often. Once it finds that, it prints the result out to the console (let’s assume that the WriteLine represents some other piece of long-running calculation that really should be done on a background thread, so that the Task.Run there is justified). It then enters an input loop where a greeting to the user is displayed, including the Id of the thread on which the input loop is running, because that may be useful information. Nothing too bad seems to be going on here.
Houston, We Have a Problem
However, if you run this while you check your Resource Monitor, you’ll see something interesting: the application allocates 1GB for the byte array at the start, but it never lets it go. Even if you type “GC” into the input loop to force the garbage collector to run, it will never let go of that 1GB.
The Art of Debugging
The input loop has built into it an idea for how to pursue the problem. Simply press Enter to start another calculation. You’ll see the app now has 2GB of memory allocated. When it finishes, use “GC” to run the garbage collector, and you’ll gain another clue: it releases 1GB and holds on to the other. It is probably safe to assume that the one it holds onto is the same one that was allocated the first time and refused to be garbage collected then. So what is different about the second array?
To answer that, let’s open our old friend Windbg! Running !dumpheap -stat gives the following (at the end of a very long list).
Well, we knew (or strongly suspected) that the problem was in the byte array, so it’s no surprise to see a gigabyte allocated there. Let’s dig further.
There are various small byte arrays being used by the system in various ways, but it’s pretty obvious who our culprit is.
So the byte array is being held alive by an instance of Program+<>c__DisplayClass1_0, which is being held alive by a Func`2[[System.Int32, mscorlib],[System.String, mscorlib]], which is being held alive by a Program+<>c__DisplayClass3_0. Naturally.
Even without a deep understanding of how the compiler creates closures, you should get the feeling at this point that the problem has something to do with closures. Indeed, it does, though you can’t understand exactly what the relationship is without learning one particular detail about closures, and the importance of that detail is the point of this post. Here it is.
The C# compiler creates a single closure instance for all delegates defined within a single method.
In other words, consider a simple method that looks like this.
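A minimal stand-in (the names here are illustrative, not the post’s original listing): two lambdas in the same method, one of which touches a big array and one of which does not. This assumes the same usings as the earlier sketch.

```csharp
static Func<int, string> MakeDelegates()
{
    var bigBuffer = new byte[1024 * 1024 * 1024];
    var startedAt = DateTime.Now;

    // One lambda needs the big array...
    Task.Run(() => Console.WriteLine(bigBuffer.Length));

    // ...the other only needs the timestamp.
    return threadId => $"Hello from thread {threadId}, running since {startedAt:T}.";
}
```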
The compiler will turn that into something that looks like this, except with much funkier member names (like <>c__DisplayClass3_0, which is also why those kinds of names show up in your stack traces sometimes).
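In spirit, it is rewritten roughly as follows (a hand-written approximation, not the actual compiler output):

```csharp
sealed class DisplayClass // really named something like <>c__DisplayClass3_0
{
    public byte[] bigBuffer;
    public DateTime startedAt;

    public void Report() => Console.WriteLine(bigBuffer.Length);
    public string Greet(int threadId) => $"Hello from thread {threadId}, running since {startedAt:T}.";
}

static Func<int, string> MakeDelegates()
{
    var closure = new DisplayClass();
    closure.bigBuffer = new byte[1024 * 1024 * 1024];
    closure.startedAt = DateTime.Now;

    Task.Run(new Action(closure.Report));

    // The returned delegate's Target is the very same closure object,
    // so it keeps bigBuffer reachable as well.
    return new Func<int, string>(closure.Greet);
}
```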
As you can see, it doesn’t distinguish between the method that needs the byte array and the one that doesn’t. All of them belong to the same instance of the same type and therefore keep alive the same set of variables. In other words…
Every lambda declared in a C# method keeps alive every variable that was captured in any of those lambdas, not just itself.
So the byte array in our example is being kept alive by the closure class of the StartCalculating method, which is in turn being kept alive because one of its methods (the greeting-generating lambda) is being kept alive by the input loop. You can see this in Visual Studio by setting a breakpoint on the line where the greeting is calculated (you’ll have to change it to use method body syntax first). You will be able to check the contents of bytes in the watch window when your breakpoint is hit, thus proving that this lambda is keeping the byte array alive.
However, suppose that the ConsoleInputLoop is an API outside of our control and that the long-running task really does need to be run on a separate thread. What are our options for fixing this?
The most obvious way, given just the example we have so far, would be to calculate the length of the byte array outside of the closure and then only let the closure capture that single integer variable rather than capturing the byte array in its entirety.
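In terms of the earlier sketch, that means hoisting the length into its own local so that the background lambda never mentions bytes at all (illustrative, matching the sketch rather than the post’s exact listing):

```csharp
// Inside StartCalculating: capture only what the lambda actually needs.
int length = bytes.Length;
Task.Run(() => Console.WriteLine(
    $"The mode of the {length:N0} bytes is {mode}."));
```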
This would work (I encourage you to make the change and try it out for yourself!), but chances are that if the calculation being done really requires a background thread, then it will need to capture the whole byte array. What then?
The answer is that it then becomes a more philosophical problem. The issue now is that the closure itself, a class generated outside of our control, violates the Single Responsibility Principle: it both performs the long-running calculation and specifies the greeting format. As such, let’s take the long-running calculation and move it into a new method that does not require a closure.
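Sketched against the earlier listing (the helper name CalculateAndReport is assumed, not taken from the post):

```csharp
static Func<int, string> StartCalculating()
{
    var bytes = new byte[1024 * 1024 * 1024];
    new Random().NextBytes(bytes);

    // The long-running work now lives in its own method. No lambda in
    // THIS method touches `bytes`, so it is never hoisted into the
    // closure shared with the greeting generator.
    CalculateAndReport(bytes);

    var startedAt = DateTime.Now;
    return threadId =>
        $"Hello from thread {threadId} (calculating since {startedAt:T}). " +
        "Press Enter to recalculate, type GC to collect, or Q to quit.";
}

static Task CalculateAndReport(byte[] bytes)
{
    // The closure created here holds the array only until the task completes.
    return Task.Run(() =>
    {
        var counts = new int[256];
        foreach (var b in bytes) counts[b]++;
        byte mode = (byte)Array.IndexOf(counts, counts.Max());
        Console.WriteLine($"The mode of the {bytes.Length:N0} bytes is {mode}.");
    });
}
```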
Ah, much better. And more readable besides. And when we run it, we find that it does not hold on to the byte array. So we’re done, right?
But wait, what about async/await?
It could be objected that I didn’t use proper async patterns in the original version. Fair enough. What happens if we change the original problem to look like this?
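Presumably something like this sketch: StartCalculating becomes async, awaits the background work, and now drives the input loop itself before returning (which is how the explanation further down treats it; an async Main is assumed for brevity):

```csharp
static async Task Main() => await StartCalculating(); // requires C# 7.1 or later

static async Task StartCalculating()
{
    var bytes = new byte[1024 * 1024 * 1024];
    new Random().NextBytes(bytes);

    var counts = new int[256];
    foreach (var b in bytes) counts[b]++;
    byte mode = (byte)Array.IndexOf(counts, counts.Max());

    // This time we await the background work instead of firing and forgetting.
    await Task.Run(() => Console.WriteLine(
        $"The mode of the {bytes.Length:N0} bytes is {mode}."));

    var startedAt = DateTime.Now;
    ConsoleInputLoop(threadId =>
        $"Hello from thread {threadId} (calculating since {startedAt:T}). " +
        "Press Enter to recalculate, type GC to collect, or Q to quit.");
}
```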
Run it and see. I’ll wait.
Did you do it? Yup, the memory leak is still there. Now let’s try making the version we already solved async as well.
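That is, keep the calculation in its own method but await it (again a sketch built on the CalculateAndReport helper from before; Main stays as in the previous sketch):

```csharp
static async Task StartCalculating()
{
    // `bytes` is now only a plain local of an async method; the compiler can
    // still hoist it onto the method's state machine.
    var bytes = new byte[1024 * 1024 * 1024];
    new Random().NextBytes(bytes);

    await CalculateAndReport(bytes);

    var startedAt = DateTime.Now;
    ConsoleInputLoop(threadId =>
        $"Hello from thread {threadId} (calculating since {startedAt:T}). " +
        "Press Enter to recalculate, type GC to collect, or Q to quit.");
}
```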
Try running that. What do you find?
Yup, that’s right: the memory leak is back even though we’re not capturing the byte array in any lambdas! In fact, even if we take out the one lambda we still have, the memory leak persists.
The reason is that the async state machine captures all local variables as fields on its state-machine class, which is separate from the closures generated by our lambdas. And this makes sense. The Task that is returned by our async StartCalculating method is still in progress as long as the console input loop is still in progress. Consider what would happen if you set a breakpoint on the closing curly brace of StartCalculating. When you finally quit the console loop with a “Q”, you’ll hit the breakpoint, and the debugger will expect everything you declared as a local variable in StartCalculating to still be in scope for you to inspect.
So the async version actually requires us to refactor the entire reference to the byte array out of our method. The result would look something like this.
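A sketch of that final shape (the helper name AllocateAndCalculate is assumed):

```csharp
static async Task StartCalculating()
{
    // No local here refers to the byte array, so nothing about it can be
    // hoisted onto this method's async state machine.
    await AllocateAndCalculate();

    var startedAt = DateTime.Now;
    ConsoleInputLoop(threadId =>
        $"Hello from thread {threadId} (calculating since {startedAt:T}). " +
        "Press Enter to recalculate, type GC to collect, or Q to quit.");
}

static Task AllocateAndCalculate()
{
    return Task.Run(() =>
    {
        var bytes = new byte[1024 * 1024 * 1024];
        new Random().NextBytes(bytes);

        var counts = new int[256];
        foreach (var b in bytes) counts[b]++;
        byte mode = (byte)Array.IndexOf(counts, counts.Max());
        Console.WriteLine($"The mode of the {bytes.Length:N0} bytes is {mode}.");
    });
}
```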
Conclusions
This whole thing is a C# compiler curiosity, but I presented it this way to demonstrate that it can cause real memory leaks in your applications, so it is worth being aware of.
I hope you get some use out of it. Add a comment to discuss, I’ll see you next time, and until then, happy developing.
Note: The steps and commands for using Windbg were taken from this excellent step-by-step guide: http://alookonthecode.blogspot.com/2012/02/windbg-101a-step-by-step-guide-to.html