Why feature flags aren't enough

“Do not confuse feature flags with application configuration.”

Feature flagging — or its more generic counterpart, dynamic configuration¹ — has risen in prominence over the last decade. It provides numerous benefits, such as decoupling the product release lifecycle from the software release lifecycle, encouraging smaller and more frequent software updates, and enabling continuous delivery.

Configuration pitfalls

However, certain pitfalls have become clear:

Dynamic configuration adds complexity to your code. Instead of having a single artifact describe the behavior of your software, logic now lives in two places — your codebase and the dynamic configuration provider.
Adding a new runtime dependency to your application leads to new failure modes. Dynamic configuration is typically bound at runtime, whereas code constants are bound at compile time.
Stale configuration leads to technical debt. If temporary flags are not cleaned up, and permanent ones aren’t correctly labeled, you end up with a massive list of configs that is hard to make sense of.
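The second pitfall is the easiest to illustrate. Here is a minimal sketch of one common mitigation: wrapping every dynamic lookup so that it falls back to a hard-coded default when the provider fails. The names `get_config`, `DEFAULTS`, `healthy_provider`, and `failing_provider` are hypothetical stand-ins, not any vendor's API.

```python
from typing import Any, Callable, Dict

# Compiled-in defaults: the values the code falls back to when the
# provider is unreachable or the key is missing. (Hypothetical key name.)
DEFAULTS: Dict[str, Any] = {"new_checkout_enabled": False}


def get_config(key: str, fetch: Callable[[str], Any]) -> Any:
    """Return the dynamic value for `key`, or the hard-coded default on failure."""
    try:
        return fetch(key)
    except Exception:
        # The provider is a runtime dependency: a timeout or a missing key
        # should not take the application down with it.
        return DEFAULTS[key]


def healthy_provider(key: str) -> Any:
    # Hypothetical stand-in for a remote configuration service.
    return {"new_checkout_enabled": True}[key]


def failing_provider(key: str) -> Any:
    # Simulates a provider outage.
    raise ConnectionError("config provider unreachable")
```

Calling `get_config("new_checkout_enabled", failing_provider)` returns the compiled-in `False` instead of raising. Note that the fallback also shows the first pitfall in miniature: the behavior is now described in two places, the default in code and the live value in the provider.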

These pitfalls have led to the emergence of a conservative narrative:

“It is recommended that feature flags be kept as short-lived as possible and their number as low as needed.” — ConfigCat
“Do not confuse flags with application configuration.” — Unleash

Why feature flags fail

Although the pitfalls are real and some apprehension is warranted, this narrative may do more harm than good.

What about temporary flags that get repurposed as permanent kill switches?
What about configs that enable more verbose logging that you need to keep around in case of an outage?
Are simple on/off feature flags enough? What if you wish to dynamically tune the number of resources you allocate for a background process?
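To make the last question concrete, here is a rough sketch of a config that is not a boolean at all: a typed value for a background worker pool, parsed from whatever the provider returns and clamped to a safe range. The `WorkerPoolConfig` fields, the default of 4 workers, and the ceiling of 64 are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerPoolConfig:
    # Hypothetical settings for a background job processor.
    workers: int = 4
    verbose_logging: bool = False


def parse_worker_pool_config(raw: dict, max_workers: int = 64) -> WorkerPoolConfig:
    """Build a typed config from raw provider data, clamping unsafe values."""
    default = WorkerPoolConfig()
    workers = raw.get("workers", default.workers)
    # Reject non-integers (and booleans, which Python treats as ints).
    if not isinstance(workers, int) or isinstance(workers, bool):
        workers = default.workers
    # Keep the value inside a range the service can actually handle.
    workers = max(1, min(workers, max_workers))
    return WorkerPoolConfig(
        workers=workers,
        verbose_logging=bool(raw.get("verbose_logging", default.verbose_logging)),
    )
```

A value like this has no natural end of life, and neither does the verbose logging switch next to it. Both are permanent by design, which is exactly the kind of config that advice to keep flags short-lived does not account for.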

Even if you’ve adopted a flagging tool, product requirements often change more frequently and drastically than we like to admit. Imagine this hypothetical: you work at a SaaS analytics company and are tasked with implementing a new query algorithm that uses fewer compute resources. You write the algorithm and roll it out gradually to customers via a feature flag. Eventually, you hit a customer of a specific scale for whom the query is prohibitively slow.

How do you proceed with the rollout? Do you rewrite the algorithm to accommodate the large customer? Do you wait for the storage team to redistribute the customer’s data, which could take weeks to get prioritized? What if product requirements change and you are pulled into a higher-value project? Should the flag be treated as temporary or permanent?
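In practice, the flag from this hypothetical often ends up looking something like the sketch below: a percentage rollout plus an exclusion list for the customers the new algorithm cannot serve yet. The flag name, the SHA-256 bucketing scheme, and the function names are assumptions made for illustration, not a description of any particular tool.

```python
import hashlib
from typing import FrozenSet


def rollout_bucket(flag_name: str, customer_id: str) -> int:
    """Map a customer to a stable bucket in [0, 100) for this flag."""
    digest = hashlib.sha256(f"{flag_name}:{customer_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def use_new_query_algorithm(
    customer_id: str,
    rollout_percent: int,
    excluded_customers: FrozenSet[str] = frozenset(),
) -> bool:
    """Decide whether a customer gets the new (hypothetical) query algorithm."""
    # Customers known to be too large for the new algorithm stay on the old path.
    if customer_id in excluded_customers:
        return False
    # Everyone else is admitted gradually as rollout_percent increases.
    return rollout_bucket("new_query_algorithm", customer_id) < rollout_percent
```

Once the exclusion list exists, the rollout can reach 100% for everyone else, yet the flag cannot be deleted. It has quietly become permanent per-customer configuration.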

Partial solutions create new problems

Once you’ve worked with full application configuration tools like the ones my team built at Meta, the shortcomings of traditional feature flags become immediately clear when you go back to them. But building your own tools proves harder than it first appears.

We operate in an environment with imperfect information. We need tools that are flexible, not rigid, to cope with changing requirements. Many tools offer a subset of the features engineers need to do their jobs. This often leads to companies adopting what we call the hidden configuration stack, where software logic is split across three different places:

A third-party feature flagging provider like LaunchDarkly for product-facing feature flags and experimentation.
A homegrown solution for low-level software configuration.
Code.

The maturity of the homegrown solution varies by company. We have come across companies with flat YAML files uploaded to S3, companies that handcraft SQL statements to store customer configuration in a JSON column in MySQL, and companies that rely on Kubernetes ConfigMaps (despite their size limit) with a thin layer of tooling around them.

This hidden configuration stack accentuates the pitfalls introduced earlier.

Complexity is more pronounced because logic now lives in three places instead of just two.
Depending on the architecture of the homegrown solution, you now have two runtime dependencies for your application instead of one.
You now have two separate systems with stale configs that need cleaning up. Moreover, given a use case, it remains unclear where its configuration should live.

The disparate systems often have feature gaps. For example, a third-party solution may have a better UI and rules engine for targeting, while a homegrown solution may be more reliable.

Tech giants’ tools, for every size customer

Engineers are wired to unblock themselves. The fact that this stack exists points to an unmet need. In recent years, more companies have built proprietary configuration solutions. Notable examples include:

ConfigBus at Twitter → A Git-based configuration system that supports code review, schema checks, and custom validation. It uses ZooKeeper to distribute the configuration reliably across clusters.
Configerator at Facebook → Uses code to express configuration. It supports importing configuration dependencies and code reuse. Most engineers never have to edit their configuration directly.
Flipr at Uber → Supports a generic rules engine, allowing users to define targeting based not just on users but on any other dimension, such as city, geography, data center region, or time.
ConfigDB at Plaid → Supports a structured, expressive data model (Protobuf) along with programmatic writes so that machines can generate configs, not just humans.
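To give a feel for what a generic rules engine means here, the following is a toy sketch of first-match rule evaluation over an arbitrary context. It is not how Flipr or any of the systems above is implemented. The `Rule` type, the `evaluate` function, and the timeout example are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class Rule:
    # A rule matches when every listed dimension equals the context's value.
    conditions: Dict[str, Any]
    value: Any


def evaluate(rules: List[Rule], context: Dict[str, Any], default: Any) -> Any:
    """Return the value of the first rule whose conditions all match the context."""
    for rule in rules:
        if all(context.get(dim) == expected for dim, expected in rule.conditions.items()):
            return rule.value
    return default


# Hypothetical example: a request timeout (ms) that varies by region and tier.
# More specific rules come first because evaluation stops at the first match.
TIMEOUT_RULES = [
    Rule({"region": "eu-west", "tier": "enterprise"}, 500),
    Rule({"region": "eu-west"}, 300),
]
```

With these rules, `evaluate(TIMEOUT_RULES, {"region": "eu-west", "tier": "free"}, 200)` falls through to the second rule, and a context from any other region gets the default. Nothing in the engine knows what a "user" is. The context is just a set of dimensions, which is what lets the same mechanism serve product flags and low-level configuration alike.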

While these companies have the resources to build sophisticated systems, many smaller companies do not, yet are compelled to build proprietary systems that offer some of these features.

As an industry, we are better served by (a) recognizing reality instead of hiding underneath the cloak of best practices and (b) building better tools. While it is true that dynamic configuration causes entropy, it has obvious benefits: dynamic updates, staged rollouts, quick rollbacks, and organization-wide access. What if we built tools to control this entropy? Imagine a system with native observability that can detect stale configs and suggest their removal from your code. We can build systems that combine the simplicity and reliability of hard-coded configs with the flexibility and dynamism of centralized storage.
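Stale config detection in particular does not have to be exotic. As a rough sketch, assuming the platform records when each config was last evaluated, finding removal candidates could be as simple as the function below. The 30-day threshold and the shape of the `last_evaluated` map are assumptions.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def find_stale_configs(
    last_evaluated: Dict[str, Optional[datetime]],
    now: datetime,
    max_idle: timedelta = timedelta(days=30),
) -> List[str]:
    """Return config keys that have not been evaluated within `max_idle`."""
    stale = []
    for key, seen_at in last_evaluated.items():
        # A key that was never evaluated, or not recently, is a removal candidate.
        if seen_at is None or now - seen_at > max_idle:
            stale.append(key)
    return sorted(stale)
```

The hard part is less the check itself than having evaluation data for every config in one place, which is difficult when configuration is spread across a vendor, a homegrown system, and code.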

Although configuration management use cases can be very diverse, it is feasible and beneficial to support all of them on top of a uniform and flexible foundation, with additional tools providing specialized functions. Otherwise, inferior wheels will be reinvented. — Holistic Configuration Management at Facebook

At Lekko, we care deeply about solving these problems for all software teams. We aim to clean up the hidden configuration stack and empower teams with a unified configuration platform.

Sergey Passichenko

Founding Engineer

Lekko

Ex-Meta

Posted on

October 6, 2023
