My team has started adopting feature flags recently. I wanted to take some time to discuss what feature flags are, some best practices my team has discovered along the way, enumerate some of the situations where using feature flags might make sense for your team, and give some tips on how to get started.

A Little Bit of Background

I’ve spent the majority of my career thus far working on systems that fit one of two archetypes:

  1. A single system, deployed on a handful of machines, in a large number of environments
  2. A single system, deployed on a large number of machines, in a small number of environments

The teams I’ve worked on have essentially always had hundreds of machines whose installation, configuration, and operation they needed to manage. Generally speaking, those teams have not always had ownership of the decision to deploy a piece of software outside of a small subset of the total footprint of the machines owned by the organization; in one situation, we spent a lot of time coordinating with environment owners to gain approval to move forward in their environment. In another situation, we coordinated with other IT teams, who also provided local support for the environment that the system was installed in.

In each of the aforementioned scenarios, there was one common question in the teams I worked on that made the experiences very similar: How do we ensure we have the entire system up to date, all the time? On more than one occasion, in each circumstance, we tried to deploy a new piece of functionality to our stakeholders, and found that the installation either could not be performed, or failed, because we didn’t have some dependency installed. There were a number of ways we could have solved those problems, but at the very core, the problem was one of confidence: how do we make ourselves, our partners, and our stakeholders confident that we can deliver them new, incremental value, without risk of damaging business operations? In an accounting firm, you can’t affect closing cycles. In a manufacturing plant, you can’t cause inventory count discrepancies, or worse, stop the manufacture of the company’s products. If our stakeholders could rely on us to deliver new versions of our software to their environments without breaking their operations, we wouldn’t have spent as much time fussing about whether this or that dependency was installed, or trying to figure out why web requests were failing at random in one environment or another. We could have spent more time on the things that added value: collecting user stories, automating the deployments, and improving our monitoring systems to provide higher levels of service.

There is one design practice that I had read about and was very confident would alleviate some of the headaches described above: feature flagging.

What is a Feature Flag

A feature flag is, very simply, a named object in your system that determines whether you’re going to do one thing or a different thing. Sometimes, a flag only determines whether you’re going to do a single thing at all; there is no “go do this other thing.” Here’s an example of how we might use one. Pretend our system performs the following actions on a customer:

// Assume `newInformation` is something that's provided to us.
var customer = GetCustomer(customerId);
UpdateCustomer(customer, newInformation);

Let’s pretend that we want to add some new logic to our system. Maybe we’re considering giving customers new account credits.

// Assume `newInformation` is something that's provided to us.
var customer = GetCustomer(customerId);

if (featureFlags.IsEnabled("CustomerCreditFeature"))
{
    // If we're using a very functional style of programming, then
    // adding credits to the `newInformation` we've been given yields
    // a _new_ instance of `newInformation`, which is the
    // `newInformationWithCredits`.
    var credits = GetCustomerCredits(customer);
    var newInformationWithCredits = newInformation.AddCredits(credits);
    UpdateCustomer(customer, newInformationWithCredits);
}
else
{
    UpdateCustomer(customer, newInformation);
}

The example is a little silly, but this is the canonical form of how you use feature flags in a system: you check to see if some feature has been turned on; when it has, do all the new stuff; when it has not, do the exact same thing you did before the flag was introduced. This is important: you must preserve the original behavior and semantics of the system. I’ll touch more on how you can achieve this later, when I discuss the best practices my team has established for ourselves.

Why Feature Flags

Let’s continue with the previous example, where we’re enabling some new Customer Credits Feature. The system depends on us implementing a bunch of new stuff:

  1. We have to somehow be able to GetCustomerCredits
  2. We have to implement some method of adding credit info to the other information we have about a customer
  3. Potentially, we need a different method of updating the customer, that understands the credit information we’re adding

Let’s talk about some of the reasons why we would do this. Pretend that the team making the changes to the silly code I’ve concocted above is not the same team that’s building the actual Customer Credits backend. But this is a really important feature for your company, and you want to start getting as much of the code done as possible so that you can support the feature as soon as everything’s in place. You might even start “faking” those calls to the other system to obtain credits, to see if your system is truly ready to start integrating with the new functionality during your existing test cycles. You might even start pulling updated sample data from the other team each week, which enables you to identify whether something changed in a way your system wasn’t prepared to handle. I call this the slowly moving backend problem: you’ve got a bunch of constituent changes that need to be made, and they all happen at a different pace.

Another reason you might want to use feature flags is that you have a steady, rolling update cadence to your system, but perhaps your users aren’t prepared to start using the new bits you and your team have been working hard on. You’ve got training that needs to be done, or a critical member of the partner team is out of office for the next two weeks. Maybe that person is in the office, but they’re utterly swamped with work that is more important than your feature. I call this the on-hold problem, since the situation is similar to being on hold during a phone call: you’re not getting anything done with those stakeholders until the person they need is available, or they figure out how to move forward without that person.

Yet another reason you might choose to use feature flags for certain features: you plan on doing a complete re-design of some or all of your application’s user experience. Perhaps you’re moving a bunch of logic off of your server and performing it in a client-page instead, as would be the case in moving code from an ASP.NET WebForms or MVC site to a site built using Angular or React. Perhaps you’re not changing your technology, but you’ve decided to make mobile-friendly versions of your pages, because they were built before mobile devices took over the world, or because your product did not have any demand for mobile. Let’s call this the requirements changed problem.

As an aside, in this last example, some people might say, “you don’t need feature flags, you need a better QA process!” They’re partially right: investing in your QA process is probably a good idea, for a variety of reasons. But those people are also wrong: maybe you’re starting out with an immature QA process or team, where you either don’t have the system requirements documented very well, or the team doesn’t understand how to test the system very well; perhaps you don’t have the tests recorded in any test plans; perhaps testing the system is very difficult because you haven’t created adequate tools to enable your testing team to work effectively, and they’re forced to test only a subset of your system before you can move forward. Regardless of the reason, the budget of your organization is not likely to magically change overnight, and neither is your capacity in QA. Good quality assurance is a skill, and you can’t just “improve” it or “fix” the process by decree. That’s not how business works, and such comments - in my experience - are rarely productive. I’ve never had a single stakeholder complain about feature flags, and I’ve had many stakeholders express to me on multiple occasions how glad they are that we used them.

Final example, before we move on: for some features, you might want to give your customers an “opt-in” experience for new functionality, so they can decide for themselves when they’re ready to make the change. You might choose to do this for some experimental new feature that affects all your users, where you want early feedback from users who are willing to try the latest version of the thing you’re building. This is my favorite use-case when dealing with changes in web applications that affect lots of customers. Usually, the requirements for a page are going to be driven by a handful of people who act as proxies for the end-users they represent, and as we learned very quickly on one team I worked on, those proxies simply cannot get everything right for every customer or use-case. We got feedback, all the time, that some feature we thought was perfect had missed the mark - even though it was designed by experts who had either served in the same role as the people we were building the features for, or were very familiar with the role of a particular user and all the roles around it. They were truly experts, and we still got it wrong. Let’s refer to this as the big group, poor representation problem, which is truly what it’s about: we’re empowering the end-user to preview the new experience and give us feedback about it, because we’ve decided to admit we don’t know everything.

Similar to the previous example, I feel compelled to spend a little extra time explaining this one. It would be very easy to say, “do better requirements gathering,” or “fix your analysis process.” For some problems, up to a certain scale, you can do perfect requirements gathering - absolutely. You can interview every single stakeholder for some piece of functionality. You can even do multiple rounds of interviews with those stakeholders, allowing them to see their own and everyone else’s feedback. But let’s be honest here: beyond that scale, it’s simply not practical. Your team can try, but if the cost of getting feedback from every single customer doesn’t ruin your business, the wait time and lack of innovation will: some third-party company will have a more efficient process for soliciting requirements that are good enough - rapidly design, build, release, and repeat - and eventually, you won’t have customers. I’m not saying you shouldn’t have focus groups, or that you should throw your hands up and say, “well, this problem’s too big; let’s just wing it!” I am saying that you shouldn’t allow the size of your user-base to stall you from making progress. This is what the DevOps movement was founded on: delivering steady streams of incremental progress and value to customers.

When Should I Use Feature Flags

So I’ve talked about feature flagging quite a bit so far. I’ve come up with a handful of archetypal problems for why you might be interested in them, too:

  1. Slowly moving backends
  2. The “on-hold” problem
  3. The “requirements changed” problem, where you’re modernizing part of your system to be compatible with the fact that the world is different than it was eight years ago
  4. The “big group, poor representation” problem

These are mostly business, project, and program problems. As a developer, you can’t control how quickly other teams are able to build the dependencies you require in your applications (if such teams exist), and you certainly don’t have any control over any of your customers. But there is one thing you do have control over, and that’s how easily your features can be tested, piloted, and ultimately adopted by your stakeholders. You can give them an unparalleled degree of comfort with accepting your new changes, trying them out, and giving you feedback, as long as you do it effectively. Some of the cases described previously answer both the why and the when of feature flags, but there are also technical situations:

  1. You’ve got a ton of servers that need to be updated, and your full deployment process is measured not in minutes or hours, but days or weeks
  2. You know that your feature works functionally, but you don’t know if it will cause a performance regression in production that will stall your application or someone else’s
  3. You have some functionality that is non-essential: your system can work in a degraded state without one feature, but it cannot work at all without some other feature

At the beginning of this article, I spoke about some of the installation environments I’ve worked with, which are on the more complex side, but what about performance? On July 4th, my team had a critical production incident come in. A critical process was timing out in production, and we turned one of my features off because it is plainly not a required feature of the system; it’s a very important feature that still has the potential to affect the company’s costs, but it is not a critical one. In that situation, the bug was actually in another part of the same system, where we had a handful of servers playing a recursive game of ping-pong over HTTP, which caused very long request queue lengths in those servers. In another situation, we changed the cache duration in another application, which caused the same critical process to start timing out due to high CPU and I/O load on a shared database server.

To be sure, these are both operational and design issues: recursive ping-pong over HTTP? Come on, that shouldn’t even be possible in your production system, and we should be raising all sorts of alarm bells when the related conditions occur. Except here’s the thing: we’re all human, we all make mistakes, and sometimes, mistakes happen. But my team’s use of feature flags has on more than one occasion helped us mitigate what would have otherwise been completely unmitigated disasters, in the span of a few minutes from the time we were alerted to the time we solved the problem. And recall that I said one of those problems was on July 4th. Nobody on my team wanted to work that day, but with the ability to disable non-critical parts of the system, we were able to very easily mitigate the problem and wait until Monday to get to work on fixing it. None of us was happy about the fact that it was broken, but we were all happy that we could go back to enjoying time off with our family and friends, do a post-mortem first thing the next week, and prioritize those fixes above all other work to ensure it didn’t happen again.

Okay, I’m Sold. How Do I Do It?

Implementing feature flags in your system looks really simple at first. That’s certainly what I assumed. In some ways, it really is, but it turns out there are still best practices you should adhere to.

The first thing I’d recommend is coming up with a shared abstraction for feature flagging, and placing performance criteria on its response times. If you decide to adopt feature flags with any degree of depth across your system, you’ll want to ensure that you’re doing it the same way everywhere, and that the calls to obtain flag lists occur within a bounded amount of time. You should know precisely how long that call should take, and monitor for it taking longer than its specified performance objective. The reason for this is simple: you don’t want to slow your entire system down just because you’ve decided to use feature flagging; that’s a very good way to irritate your users.
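To make that concrete, here’s a minimal sketch in TypeScript (the same idea applies in any language) of a shared abstraction with a response-time bound. The `FlagStore` interface, the `TimeBoundedFlagStore` wrapper, and the millisecond budget are all illustrative names of mine, not part of any particular flagging product:

```typescript
type Flags = Record<string, boolean>;

// The one abstraction every part of the system codes against.
interface FlagStore {
  loadFlags(): Promise<Flags>;
}

// Wraps any store with a time budget so a slow flag service
// can't slow the whole system down.
class TimeBoundedFlagStore implements FlagStore {
  constructor(
    private inner: FlagStore,
    private budgetMs: number,
    private fallback: Flags = {}, // empty: every feature reads as "off"
  ) {}

  async loadFlags(): Promise<Flags> {
    const timeout = new Promise<Flags>((resolve) =>
      setTimeout(() => resolve(this.fallback), this.budgetMs),
    );
    // Whichever settles first wins: a slow store degrades to the
    // fallback instead of stalling the caller. A real version would
    // also emit a metric whenever the budget is exceeded.
    return Promise.race([this.inner.loadFlags(), timeout]);
  }
}
```

Falling back to an empty flag set keeps everything in the “off” state, which matches the affirmative-case guidance later in this article.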

The next thing I’d recommend is that you load all of the feature flags for an entire transaction all at once. I do this at the application level, because I don’t have enough feature flags to justify loading a subset of them. The key is to ensure you have a scope that all the flags go in, and load all of them each time; avoid being clever - it’s not worth the complexity. The reason you should prefer to load all the flags for an entire scope at once is so you don’t end up with a bunch of independent calls for a single flag inside that scope happening on the same code path. Imagine having some code that’s called in a loop, and part of that loop is finding out whether or not a flag is enabled for part of your transaction, taking 27ms per iteration. Not good, to say the least. Additionally, flags should not change during your transaction: you don’t want to process some of the transaction one way, and another part a different way.
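As a sketch of that rule - in TypeScript, with names I’ve invented for illustration - a transaction takes an immutable snapshot of all flags once, and every check inside the transaction reads from that snapshot rather than calling the flag store again:

```typescript
type Flags = Record<string, boolean>;

class FlagSnapshot {
  private readonly flags: Readonly<Flags>;

  constructor(flags: Flags) {
    // Copy and freeze so the snapshot can't drift mid-transaction.
    this.flags = Object.freeze({ ...flags });
  }

  isEnabled(name: string): boolean {
    // An unknown flag reads as "off".
    return this.flags[name] === true;
  }
}

// Hypothetical transaction: the loop performs many checks, but they
// all hit the in-memory snapshot, and they all see the same answer.
function describeOrders(orderIds: number[], snapshot: FlagSnapshot): string[] {
  return orderIds.map((id) =>
    snapshot.isEnabled("CustomerCreditFeature")
      ? `order ${id}: with credits`
      : `order ${id}: legacy`,
  );
}
```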

There is one caveat to my previous recommendation: if your configuration system allows applications to subscribe to updates, then it’s fine to rely on that mechanism to pick up flag changes. The key is ensuring that your publish/subscribe system actually publishes changes in a timely, deterministic fashion. If it’s even possible for you to lose track of subscribers, then you should just load the feature flags at the start of a transaction. In ASP.NET, this would be at the start of a controller action, and in something like an Angular service, it would be immediately prior to executing the service method. The reason for this guidance is that you want your system to be responsive to the end-users (even when you are the end-user making the change): if a flag is turned off, they should be able to expect it to immediately stop affecting the behavior of the system.

Another lesson learned on my team is that you should always log which path you’re taking after observing a feature flag. If you’re taking the “feature enabled” path, log the fact that you’re doing that. If you’re taking the “feature not enabled” path (when one exists), log that! The key is that your operations teams must be able to determine what the system is doing, and logging is often a critical part of that. In fact, the reason this has become standard guidance in my organization is that we actually broke the way a feature flag is observed in a release, and started taking the wrong path based on the configured value for the flag. We caught this in deployment monitoring, and rolled everything back so that we could fix it and observe the flag correctly until we can eventually retire it. One more note: when you write a log entry related to a feature flag, include the name of the flag and a brief summary of what will or will not be performed.
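Here’s roughly what that looks like in practice, sketched in TypeScript with a hypothetical flag and logger; the important part is that both branches produce a log line that names the flag and summarizes what will happen:

```typescript
type Logger = (message: string) => void;

function updateCustomerWithLogging(creditFeatureEnabled: boolean, log: Logger): string {
  if (creditFeatureEnabled) {
    // Log the flag name and what the enabled path will do.
    log("flag=CustomerCreditFeature enabled: adding credits to the customer update");
    return "updated-with-credits";
  }
  // Log the disabled path too; silence here is what strands operations teams.
  log("flag=CustomerCreditFeature disabled: performing the legacy customer update");
  return "updated-legacy";
}
```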

Always write feature flags in the affirmative case: by setting a flag to true, you’re turning a thing on. Never make previous behavior contingent on a flag value being true. The reason for this is that you may not even have flag data available to your application; that’s the false case, and you don’t want to turn some new feature on because some other part of your system is either unavailable or unresponsive. This may mean that you have to cache the last response and fall back to it if the system you query for flag data is unavailable: in that situation, log that you’re using cached values, and make sure your cache has an expiration time on it that you agree on with your stakeholders.
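One way to sketch that fallback behavior in TypeScript - the class name, the injected fetch function, and the time-to-live are all assumptions of mine, not prescriptions:

```typescript
type Flags = Record<string, boolean>;

class CachedFlagClient {
  private cached: Flags | null = null;
  private cachedAt = 0;

  constructor(
    private fetchFlags: () => Flags, // throws when the flag service is unavailable
    private ttlMs: number,           // expiration agreed on with stakeholders
    private now: () => number = Date.now,
  ) {}

  getFlags(): { flags: Flags; fromCache: boolean } {
    try {
      this.cached = this.fetchFlags();
      this.cachedAt = this.now();
      return { flags: this.cached, fromCache: false };
    } catch {
      const fresh = this.cached !== null && this.now() - this.cachedAt < this.ttlMs;
      // Past the agreed expiration (or with no cache at all), return an
      // empty set: every flag reads false, so nothing new turns itself on
      // just because the flag service is down.
      return { flags: fresh ? this.cached! : {}, fromCache: true };
    }
  }
}
```

The caller is responsible for logging whenever `fromCache` is true, per the guidance above.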

If (or when) you implement feature flags, limit the scope of each flag. A feature flag should control exactly one thing in your system; if this means you need two flags activated at the same time, then you should monitor for invalid configurations of the two flags. I recently put two flags apiece into my system for two different features; the first controls whether or not we even attempt some new logic, and the second determines whether or not we accept the result of that logic. The first flag exists so we can write a stream of results from the new logic, which we review with customers. The second flag determines whether or not we record a change to an important value in our system. They’re both essential to our design, because they build in an enormous amount of safety: we can profile the system and see if we’ve significantly affected its performance characteristics with our change, our customers can monitor the value for a week before bringing the change to their stakeholders, and then we can turn it on and get value. The decision has also added significant value to my team’s relationships with other teams: it used to be that we’d make a change, and if it went south, we could face significant liability on a product with external customers. Now, we’ve created the ability to have monitoring periods, along with incremental adoption. My development team is no longer just a service provider, but a partner to a team that I work closely with. Our relationship has never been better, and our decision to employ feature flags is a significant part of that change.
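A hedged sketch of that two-flag pattern, in TypeScript with invented flag names (`attemptNewCreditLogic`, `acceptNewCreditResult`); the validation function is the “monitor for invalid configurations” piece:

```typescript
interface CreditFeatureFlags {
  attemptNewCreditLogic: boolean; // run the new logic and record its results
  acceptNewCreditResult: boolean; // actually use those results
}

// "Accept without attempt" can never do anything useful; report it so
// monitoring can flag the invalid combination.
function validateCreditFlags(flags: CreditFeatureFlags): string[] {
  const problems: string[] = [];
  if (flags.acceptNewCreditResult && !flags.attemptNewCreditLogic) {
    problems.push("acceptNewCreditResult requires attemptNewCreditLogic");
  }
  return problems;
}

function chooseCredit(flags: CreditFeatureFlags, legacyValue: number, newValue: number): number {
  if (!flags.attemptNewCreditLogic) return legacyValue;
  // Attempt-only mode: the new value is computed (and, in the real system,
  // written to a review stream), but the legacy value is still recorded.
  return flags.acceptNewCreditResult ? newValue : legacyValue;
}
```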

If you decide to start adopting feature flags, you should consider allowing flags to be set on individual nodes of your application (if you run an app on more than one instance). This enables you to verify that your configuration works on a single node before rolling it out across the entire cluster of machines running that workload. This can be done with things like environment variables. The feature flagging system we use allows multiple sources, which have priority with respect to one another in order of most-to-least specific. This allows us to change the behavior of one node without affecting others. You should also be kind to your operations team with your design: make sure that your “standard” implementation of feature flagging does not require you to start bouncing (turning off and on) services all over the place. If you do have to bounce services when changing feature flags, you must make it explicitly clear under what circumstances that is required (changing configuration files and environment variables is a good example) so your operations folks aren’t chasing their tails when your system is on fire.
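As an illustration of most-to-least-specific sources, here’s a small TypeScript sketch where a per-node environment variable (using an assumed `FEATURE_` naming convention) overrides the configuration shared by the whole cluster:

```typescript
type Flags = Record<string, boolean>;
type Env = Record<string, string | undefined>;

function resolveFlag(name: string, sharedConfig: Flags, env: Env): boolean {
  // Most specific source first: a node-local environment variable.
  const override = env[`FEATURE_${name.toUpperCase()}`];
  if (override !== undefined) {
    return override === "true";
  }
  // Fall back to the configuration shared by the whole cluster.
  return sharedConfig[name] === true;
}
```

In a real service you would pass `process.env` as the `env` argument; note that changing an environment variable typically requires restarting the process, which is exactly the kind of bounce condition worth documenting for your operations team.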

When adopting feature flagging, you should have a central repository where you document all of your flags, including: flag names; applications affected; a description of what each flag controls; and related flags, if any exist. Your operations team should be explicitly aware of this repository, be trained on how to access it, and know how to correlate the information in it with the logs they’re seeing in your system. This has been crucial on the teams I’ve worked on: if the operations team that is responsible for being on call can’t figure out what’s happening with the system, they need to contact the developer. If they have to contact the developer, who is several time zones away sleeping, you could see sustained outages for your stakeholders. Contacting that one person may not even be possible. This is a fantastic way to erode trust with your operations team and stakeholders alike. Suffice it to say, I recommend taking the time, and doing the due diligence, to avoid this scenario.

One of the things I discussed in a recent code review was whether or not we should require a component to have a URL configured when a feature flag was not enabled. We would essentially create an object, discover the URL was wrong, and then log a warning about it - never mind the fact that the system may not work when we turned the feature on! The entire feature was to offload some logic to a remote server so we could share it across applications more easily (similar to moving a shared function out into a different micro-service), and the new version of the feature depended on being able to make that call over HTTP - yet we would potentially allow invalid configurations of that remote HTTP endpoint. Oops! We wound up deciding that regardless of whether or not the feature was enabled, we would ensure the configuration was valid for the system being upgraded. The way we ultimately landed on the final decision was empathy: we didn’t want our operations folks to have to deal with some system configuration being wrong, with that condition lying dormant, waiting for them to enable our new feature. Instead, we opted to make it something that would fail the entire deployment, giving us early feedback about whether or not we’d failed to update some portion of our deployment system.
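Sketched in TypeScript, with an invented `creditServiceUrl` setting name, the decision looks like this: validation runs unconditionally at startup, with no flag check in sight, so a bad URL fails the deployment rather than lying dormant:

```typescript
interface AppConfig {
  creditServiceUrl: string;
}

function validateConfigOrThrow(config: AppConfig): void {
  // Deliberately no feature-flag check here: the configuration must be
  // valid whether or not the feature is currently enabled. A thrown error
  // at startup fails the deployment, which is the early feedback we want.
  if (!/^https?:\/\/\S+$/.test(config.creditServiceUrl)) {
    throw new Error(`invalid creditServiceUrl: "${config.creditServiceUrl}"`);
  }
}
```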

My last recommendation for feature flags and adoption from my own experience is this: be transparent with your stakeholders. Tell them what you’re doing. Make them part of the experience. You can’t just do it in a silo, where you’re aware that it’s happening, but they have no idea. You’ll lose a lot of the value that way. If they know what you’re doing as a team, why you’re doing it, and how it benefits them, you’re going to substantially improve your relationship with those customers and stakeholders. You’re not just communicating an architectural and design decision that you’ve made as a team; you’re communicating that you want to be partners with those stakeholders, and that you want to empower them to have a seat at the table with you. That’s the goal in the first place: to build relationships, create partnerships, and build trust, which enables your team to be more effective. This will enable you to deploy faster, with more confidence, and increase the value your organization brings to your partners.

Closing

I hope you’ve enjoyed reading this post. If you’d like to discuss the article, please contact me using one of the means listed below. This is a topic I’m very passionate about, because I’ve experienced the benefit of making the transition, and would love to walk you through any other learnings I’ve had, or try to convince you of why these things could benefit your team.

- Brian