I recently faced an issue at work with identifying when a critical initialization API stopped being called in one of our applications. This post outlines how I used the Git log to identify the offending commit, and explores some other ideas I have about software architecture, managing work, and collaborating with others.

I’m personally not very fond of globally shared state in any of my applications. In C#.NET, this is static state. The problem I have with global state is that it makes a program difficult to reason about, is nearly always a barrier to consistent testing, and, as I observed this week, can prevent an application from behaving properly. When possible, I prefer to share information using dependency injection, though I do concede it is occasionally difficult to properly represent some concepts to a dependency injector.

With all of that said, we don’t usually choose what we inherit, and our first priority as developers should always be to create business value. Our customers do not care about our glorious design patterns, or how beautiful we believe that parallel algorithm Joe implemented is. They care about whether or not our software is meeting their needs effectively. Delighting our users usually means delivering a solution for them that works, and streamlining the experience in as pain-free a way as possible. An example of this is Visual Studio’s Edit & Continue feature for C#. I didn’t actually know I wanted this feature until I saw it in a keynote presentation; since then, I’ve been hooked. Angular provides a similar capability via the ng serve command. At the end of the day, deadlines - not developer preferences - usually drive development decisions.

I still don’t really understand the code I was working on. More specifically, I don’t know why some of the decisions were made, but I could clearly see why the code existed. I was troubleshooting a part of the code that dealt with shift data, and specifically with expiration based on the “current” shift. This is part of the business domain and, whether I like it or not, is pretty essential to us delivering the business solution. The specific problem was that our code needed to fetch the list of employee shifts from the back-end and store them on the front-end for use in our client app. This was achieved using a static method, predictably named setShifts. Of course, I had to do a fair amount of debugging to come to this conclusion. Through the power of static analysis, I was also able to determine that this method was the only site that modified an internal _shifts member, and that it was not called at any point in the application.

As I was looking at the code, I couldn’t help but think “there has got to be a way to figure out at what point in history this method stopped being called.” A quick internet search later, I learned that Git supports searching the log with regular expressions (regex) (thank you, StackOverflow). Because I was doing “research,” I took some extra time to go read the documentation for the git-log command.

It turns out, what I was looking for was a command similar to this:

git log -S "\.setShifts\(" --pickaxe-regex --raw --diff-filter=M

What this tells Git is:

  1. Read the Git log (git log)
  2. Find commits whose diffs change the number of occurrences of the search term (-S "\.setShifts\(")
  3. Treat the search term as a regular expression instead of a literal string (--pickaxe-regex)
  4. Give us a summary of changes (--raw)
  5. Only include files that were modified (i.e., no adds or deletes) (--diff-filter=M)
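To see how these options interact, here is a minimal sketch that builds a throwaway repository and runs the same search against it. Everything in it (the startup.ts file, the ShiftStore name, the commit messages) is made up for illustration and is not the code from my project. It assumes git and mktemp are available on the machine.

```shell
#!/bin/sh
set -e

# Throwaway repository; all names below are hypothetical.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"

# Commit 1: the file is added with the call in place.
printf 'ShiftStore.setShifts(shifts);\nstart();\n' > startup.ts
git add startup.ts
git commit -q -m "Add startup"

# Commit 2: an unrelated edit that leaves the call alone.
printf 'ShiftStore.setShifts(shifts);\nstart();\nlog();\n' > startup.ts
git commit -q -am "Add logging"

# Commit 3: the call is removed from the existing file.
printf 'start();\nlog();\n' > startup.ts
git commit -q -am "Simplify startup"

# Commit 1 changes the match count, but the file has status A (added),
# so --diff-filter=M drops it. Commit 2 does not change the count at all.
# %s prints only the commit subject.
found=$(git log -S "\.setShifts\(" --pickaxe-regex --diff-filter=M --format=%s)
echo "$found"
```

In this sketch only the third commit should be listed, because it is the only one that both changes the number of matches and modifies a file that already existed.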

Surprisingly, running the search against our codebase produced a very short commit list (1 commit). By feeding that commit into the git-show command, along with the regex, I was able to view the original diff where the change was made. Here’s the relevant command:

git show <git-object> -S "\.setShifts\(" --pickaxe-regex

I replaced <git-object> with the commit ID obtained from the log command.
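The two steps can also be chained so the commit ID never has to be copied by hand. The following is a rough sketch on a throwaway repository with made-up file contents. It assumes the search returns exactly one commit, as it did for me; with several matches, the commit variable would hold several hashes and git show would print each of them.

```shell
#!/bin/sh
set -e

# Throwaway repository; the file and commit messages are hypothetical.
cd "$(mktemp -d)"
git init -q
git config user.email "demo@example.com"
git config user.name "Demo"
printf 'ShiftStore.setShifts(shifts);\nstart();\n' > startup.ts
git add startup.ts
git commit -q -m "Add startup"
printf 'start();\n' > startup.ts
git commit -q -am "Simplify startup"

pattern='\.setShifts\('

# %H prints only the full commit hash, so the output can be reused directly.
commit=$(git log -S "$pattern" --pickaxe-regex --diff-filter=M --format=%H)

# Show that commit's diff, limited to files where the match count changed.
shown=$(git show "$commit" -S "$pattern" --pickaxe-regex)
echo "$shown"
```

The diff printed at the end should contain the removed line, prefixed with a minus sign.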

At last, I knew where the original bug was introduced! There were a few ways I could have chosen to fix it. I chose to simply re-add the code that was removed, because I couldn’t see a case where I could easily do anything else, and the removed code was fairly innocuous in the first place (occurring during startup).

I could have gone down a long road of refactoring the area of code I was investigating, but I felt that I should probably discuss any changes to the code with my colleagues prior to committing to another 1-2 hours of work. As it stood, I’d already had to abandon other work I was committed to doing to investigate this bug, and wasn’t willing to invest any more time in a bug that I felt was solved, understanding there was other work I had to do.

One of the axioms I employ as often as possible when investigating bugs is that they’re unplanned work. The Phoenix Project identifies four types of work:

  1. Business Projects
  2. Internal Projects
  3. Operational Change
  4. Unplanned Work

When I come up against the Unplanned category, I try to minimize the impact on the collective other three. For me, this usually means:

  • Addressing the immediate problem
  • Posing a solution that can become part of an operational or strategic plan

The idea behind this strategy is that I can engage with other members of the team. The benefit of taking such an approach is that it lets my boss re-assign work to more junior members (if he feels it could be valuable), makes work transparent, and minimizes the impact of unplanned work on the other three types of work.

If you haven’t read The Phoenix Project, I’d highly recommend you add it to your reading list. I read a paper copy, but would recommend listening on Audible. There are some diagrams, but I don’t think they’re valuable enough to justify the time investment required for reading a paper copy.

This was a relatively short story, but I thought it might be valuable for others. In any case, thanks for reading!

- Brian