Adventures of an Open Source Rookie: Everything’s Broken and It’s All My Fault

Somewhere, there is a mother baking cookies. Her young son sees the busy-ness and productivity of the kitchen (to say nothing of the aromas) and wants to help. He learns that cookies are coming, and henceforth no force of man nor nature can keep him away from the kitchen. So she lets him. Only two hours later when she tries one for the first time does she find out that he put salt in where he should have put sugar.

Sometimes in life, things like this happen. Sometimes we’re the mom, but often we’re the son. We all have stories of trying to help and failing miserably. This is one of those stories.

Background

A while back I had a use case where I needed to intercept some calls to some methods. I investigated Castle DynamicProxy, but the performance wasn’t good enough. I could have added the code manually, but if I wanted to maintain that much boilerplate for the rest of my life, I’d have gone into COBOL instead of C#.

Eventually, I ran into AOP, so I started looking around at how to implement such a thing in the .NET Framework. I quickly ran across PostSharp but was just as quickly turned off by the price point¹. I had recently heard of the concept of code weavers, so I started investigating the most popular one for .NET: Fody.

Their github page includes a list of open source addins which were made with Fody, and one of them in particular caught my attention: MethodBoundaryAspect.Fody. It didn’t quite have all the features I wanted, and a bug or two popped up in the course of my attempt to integrate it into my solution, but fundamentally it was solid and performed well. Score!

Playing around with the code for MethodBoundaryAspect.Fody was actually a ton of fun. I learned a lot, got to use new libraries I hadn’t known about before (like Mono.Cecil), and there’s a whole meta feeling you get when you realize that you’re writing code that writes other code.

Sunday: Look Ma, I’m Doing Open Source!

Those features sat on my branch for a few months. But eventually, one Sunday afternoon, I decided to brush them off and actually contribute to open source. Look at me! I’m making a difference!

I started with a simple bug fix. The code that recreates an aspect instance couldn’t handle constructor overloads properly. In particular, when searching for the right constructor to use, it only checked the number of parameters, not that the types of parameters matched. So, for example, if you had a constructor that took a single parameter of type string and another that takes a single parameter of type int, it would get confused between them. Confident that I was making a difference and doing no harm, I made the change.

Spoiler alert: There's a bug here.

That was part of the pull request, but I also included another feature I had added months earlier. In particular, the previous version didn’t even try to handle attribute parameters of type object or System.Type or any array type. It was only primitives and strings. Everything else threw a NotSupportedException. Also, although it checked for named property assignments in the attribute’s construction, it didn’t do the same for named fields.

So I went ahead and added all of that (which I had implemented months earlier) and submitted a pull request on Sunday night. Then I flew out on vacation, taking my laptop with me just in case I wanted to add another one of my old features after they merged this one.

I slept soundly that night, confident that I had done a great favor to the world.

Monday: Certified Working on My Machine™

I woke up to this on Monday morning.

My pull request was merged! That means it was absolutely flawless, right?

Well, that felt good. Maybe I should submit another one of my features–hey wait a minute! What’s this?

Nope. No way this was my fault. Have you tried turning it off and on again?

Hey! What gives? I fixed something, and now it’s broken? This couldn’t possibly be my fault, right?

Upon further investigation, I was compelled to conclude that I had indeed broken the build. Oh, it compiled. And it passed all the tests. But it simply wasn’t working for someone that was already using it in production.

On the technical side, I could see that comparing the TypeReference instances with their default equality comparison was for some reason not good enough. But I couldn’t fix it unless I could reproduce it. After all, how can I verify that it’s fixed if I can’t verify that it’s broken?

I spent all day Monday trying to reproduce that issue. I wrote tests, I started new projects, I tried .NET Standard, and I tried the sample code given in the ticket. About six hours of debugging while on vacation–all to no avail. That night, I hung my head in shame as I had to ask the user for further details.

That’s when I decided that this would be a learning experience for me. There’s a desire in many people, me included, to react to failure–especially public failure–by throwing up my hands and walking away forever. That’s not what I did, though. I buckled down to try to figure it out and even to ask humbly for more help in reproducing the problem. And so I learned…

Lesson #1: Sometimes my code will break things, and that's ok.

Tuesday: The Key Ingredient

I woke up Tuesday morning to a fresh message that contained the key I needed.

🤦 🤦 🤦 🤦 🤦 🤦 🤦 🤦

As soon as I read those lines, I realized the problem. The TypeReference class in Cecil can be a bit finicky. This user’s stuff had been working before, though, so obviously the types will work if I can just identify the right constructor.

I had it fixed in an hour, submitted it, and moved on.

Strings are good for comparisons. 👍

The thing is, though, that I felt a little bad about this one. Not just because I had broken everything. But because I had actually looked at the code before I originally checked it in, and I had thought to myself “You know, TypeReference can be finicky sometimes; I wonder if there are conditions where that equality might fail.” Only I didn’t actually stop to wonder or think about such conditions. If I had, the first answer I would have come up with would have been “If the type is being imported from a reference assembly”. But I didn’t think about that; I just went blithely on with submitting my code.

Even in Monday’s debugging session, I had passed over this code many times. I had stared at this code for probably an hour, but I never saw what was right in front of me. I was so focused on trying things to reproduce the issue that I forgot to think intelligently about different situations that could cause the failure being seen. And so I learned my next two lessons.

Lesson #2: Don't focus on trying things to reproduce the problem to the exclusion of thinking intelligently about what might cause the problem in the first place.

Lesson #3: You know that voice in the back of your head that says "This code feels fragile."? It's trying to do you a favor; listen to it.

I realized something else, too. I had grown complacent in my professional coding. I used to pay attention to that “This code is fragile” voice, but I had gotten used to ignoring it at work “because QA will catch it” or “those edge cases don’t show up often, and it’ll be awhile before they do” or “no one will really care or notice if I go the extra mile in testing this first”. In short, I had gotten lazy. And I thank open source for waking me up from that state. When I see the effects of my laziness directly in my users’ experience, I can no longer justify skimping on tests or checking something in without at least trying to break it first.

Wednesday: Debugging Class Is Still In Session

The pull request had been merged overnight (if you’re wondering why a lot of the merging and commenting seems to happen overnight, I think it’s because the primary authors are based in Germany, and I’m in the western USA). I was flying back home this afternoon, but I thought I’d work on rebasing another of my old features onto the latest version of master. This feature will be to allow the OnExit function to overwrite the return value that gets returned out of the weaved method.

In the interest of learning my lessons, I decided that I would add tests for Return Value Overwrite to make sure it works even across assemblies.

I had it about ready to submit, when it occurred to me that perhaps I should try it with a different constructor argument, just to demonstrate that the previous issue was really out of the way. So, in addition to my test that took a string, I added a test that took a Type[].

And it failed.

I learned another lesson here:

Lesson #4: Sometimes the best way to find bugs in a feature is to use the feature like your user would.

If you’ve ever had your car break down on a long road trip with your family inside because of that oil change or tune-up you’ve been putting off, you know how I felt. I was incensed with myself. I was frustrated. I was depressed. I was livid. My vacation was over; I was checking out and about to fly home, and there was very little I could do to fix the problem, if I even knew what it was.

My wife and I stopped in a mall parking lot so I could make some more progress, but I still didn’t accomplish anything noteworthy. It was awful of me to ask her to sit there in our rented Hyundai Santa Fe while I pored over dumps from ILSpy and errors thrown by PEVerify, but she has forgiven me.

Deciding to put this to bed once and for all, I wrote the deepest battery of tests for the most complex and unlikely scenarios I could think of. I added enums that have each of the allowable underlying types. I added arrays of such enums. I used each of those both in a constructor parameter expecting it and in one expecting object. And, for the arrays, in one expecting object[]. Nothing would escape this battery of tests. I simply would not permit it.

I was expecting a lot of that to be overkill, but it really wasn’t. For instance, I found one particular bug that only showed up when I passed an enum array to a parameter expecting a single object and even then only failed if the enum was defined in a separate assembly from the target. This one was particularly nasty because the only symptom of failure was that PEVerify failed, and–although it did tell me the method it didn’t like–it refused to tell me anything about the method that might have been helpful in isolating the real problem².

I wasn’t able to finish that day; I had to go to bed unresolved. I did, however, pick up another lesson.

Lesson #5: Adding an exhaustive battery of tests when first adding a feature may feel like overkill, but it won't feel that way when it catches a bug before the pull request. ... Nor when it prevents one from being introduced a year from now.

Thursday: An End to the Madness…So Far

Thursday saw me finish writing (and fixing) the tests a few hours after I finished work. I submitted the pull request, and it was merged later that night with compliments:

Don't worry. This won't go to my head. Also, this guy's smiling avatar is possibly the best thing on all of github.

It was an adventure, but I’ve learned lessons. Valuable lessons. I’m already putting them to work on the next feature I plan to implement. I don’t even know for sure that this adventure is over (the original user who submitted the report that it was broken has not yet responded as of the time I write this). So stay tuned for further updates on that and other projects on my path as a developing coder.

[1] This should not be construed as an opinion on the relative value per dollar of PostSharp; my use case simply required an open source solution.

[2] For the curious, the real problem was that if you don’t resolve a TypeReference to an enum defined in an external module, it imports the enum as a reference type instead of a value type, which makes PEVerify call you various rude words for trying to use an enum as a reference type.