Feature Toggle Dynamics in Large-Scale Systems: Prevalence, Growth, Lifespan, and Benchmarking
Pith reviewed 2026-05-10 08:33 UTC · model grok-4.3
The pith
Removals of feature toggles lag behind additions in Kubernetes and GitLab, producing growing inventories and a small share of de facto permanent toggles.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Longitudinal analysis of commit histories in Kubernetes and GitLab reveals that toggle removals trail additions by roughly 35 percent and 13 percent respectively, so toggle inventories grow over time. Median lifespans are 734 days in Kubernetes and 185 days in GitLab. Between 0.73 percent and 1.33 percent of toggles exceed the longest removal durations previously observed and therefore function as permanent features. These observations are used to define a benchmarking framework consisting of five metrics and their corresponding threshold zones for assessing toggle-management health.
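The lag arithmetic can be sketched as follows. This assumes that "trail by X percent" means removals are X percent fewer than additions over the same period; the summary does not spell out the paper's exact definition. The function names and the counts are hypothetical, chosen only to make the calculation visible.

```python
def removal_lag(additions: int, removals: int) -> float:
    """Fraction by which removals fall short of additions (0.0 means parity)."""
    if additions <= 0:
        raise ValueError("additions must be positive")
    return 1.0 - removals / additions


def net_inventory(additions: int, removals: int) -> int:
    """Toggles still present after the observed additions and removals."""
    return additions - removals


# Hypothetical counts, not taken from the paper.
lag = removal_lag(additions=200, removals=130)
print(f"lag = {lag:.0%}, inventory = {net_inventory(200, 130)}")
```

Under this reading, any positive lag sustained over time implies a growing inventory, which is the link the core claim relies on.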
What carries the argument
Longitudinal extraction of feature-toggle addition and removal events from version-control commit histories, used to measure prevalence, growth rates, lifespan distributions, and the emergence of permanent toggles.
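A minimal sketch of what such an extraction step might look like, assuming commits are already available as (date, unified diff text) pairs in chronological order. The `TOGGLE("name")` definition syntax, the regular expression, and the function name are all hypothetical; the paper's own scripts may detect toggles quite differently, and each real project would need its own pattern.

```python
import re
from datetime import date

# Hypothetical definition syntax; real projects each need their own pattern.
TOGGLE_DEF = re.compile(r'^([+-])\s*TOGGLE\("([A-Za-z0-9_]+)"\)')


def extract_events(commits):
    """Return (day, name, kind) tuples for toggle definitions added or removed.

    `commits` is an iterable of (day, unified_diff_text), oldest first.
    """
    events = []
    for day, diff in commits:
        added, removed = set(), set()
        for line in diff.splitlines():
            if line.startswith(("+++", "---")):
                continue  # skip file headers
            m = TOGGLE_DEF.match(line)
            if m:
                (added if m.group(1) == "+" else removed).add(m.group(2))
        # A name both added and removed in one commit was moved or edited,
        # so it is not counted as either event.
        for name in sorted(added - removed):
            events.append((day, name, "add"))
        for name in sorted(removed - added):
            events.append((day, name, "remove"))
    return events
```

Ignoring names that appear on both sides of a single commit is one design choice for avoiding spurious events from refactorings; it would still miss toggles that are renamed or moved across separate commits.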
If this is right
- Toggle inventories will continue to expand unless removal rates increase to match or exceed addition rates.
- A measurable fraction of toggles will become permanent in any system that follows the observed lifespan patterns.
- Teams can use the five-metric framework and its threshold zones to diagnose and compare their toggle-management performance against the two studied projects.
- Public release of the extraction scripts and data sets allows other projects to apply the same measurement approach.
Where Pith is reading between the lines
- Projects could reduce accumulation by enforcing explicit removal deadlines tied to the median lifespan observed in comparable systems.
- The large difference in median lifespans between the two studied codebases suggests that organizational practices, rather than technical constraints alone, drive how long toggles persist.
- Applying the same measurement pipeline to additional systems would test whether the reported growth rates and permanent-toggle percentages generalize beyond these two examples.
Load-bearing premise
The events recorded in commit histories accurately represent the true intended addition and removal dates of every toggle.
What would settle it
A longitudinal study of another large system in which the cumulative number of toggle removals equals or exceeds the number of additions over the same multi-year period, or in which no toggle exceeds the longest previously observed removal interval.
Original abstract
Feature toggles enable gradual rollouts and experimentation in software systems, yet often persist beyond their intended lifecycle, accumulating as technical debt. Prior research has examined feature toggle interactions and complexity, but no longitudinal study has quantified how toggles evolve over time across different organizational contexts. We analyse over 4,000 toggle events in Kubernetes (10 MLoC, 8.5 years) and GitLab (5 MLoC, 5 years). We find that feature toggle removals lag behind additions in both systems (by roughly 35% and 13%, respectively), leading to growing toggle inventories. Their lifespan patterns also differ notably, with Kubernetes toggles lasting a median of 734 days versus 185 days in GitLab. In addition, some feature toggles (1.33% and 0.73%, respectively) exceed all previously observed removal durations, becoming de facto permanent. Building on these findings, we propose a benchmarking framework with five key metrics and their empirically derived threshold zones, enabling practitioners to assess and compare toggle management practices across projects. All scripts and data are publicly available.
Editorial analysis
A structured set of objections, weighed in public.
Circularity Check
No circularity: results are direct empirical counts and medians from commit data
The paper performs a longitudinal empirical analysis of over 4,000 toggle events extracted from the commit histories of two independent large-scale projects (Kubernetes and GitLab). Reported quantities such as removal lags (35% and 13%), median lifespans (734 vs 185 days), and de-facto permanent toggle percentages (1.33% and 0.73%) are computed directly as counts, ratios, and medians from the observed addition/removal events. The proposed five-metric benchmarking framework and its threshold zones are derived from these same empirical distributions. No equations, fitted parameters, predictions, or self-citations are present that reduce any claimed result to its inputs by construction. The derivation chain consists solely of data extraction followed by descriptive statistics, with no self-definitional, fitted-input, or uniqueness-imported steps.
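The descriptive statistics described here can be sketched as follows. This is a simplified illustration under stated assumptions: the median is taken over removed toggles only, a surviving toggle counts as de facto permanent when its age exceeds the longest completed lifespan, and the share is computed over all added toggles. The paper may define these quantities differently (for example, a survival-analysis treatment of still-active toggles would give a different median), and the function name is hypothetical.

```python
from datetime import date
from statistics import median


def lifespan_stats(added, removed, today):
    """Return (median_lifespan_days, permanent_share).

    `added` and `removed` map toggle name -> date of the event.
    """
    lifespans = [(removed[n] - added[n]).days for n in removed if n in added]
    if not lifespans:
        raise ValueError("no removed toggles to measure")
    longest = max(lifespans)
    # Surviving toggles older than every completed lifespan are treated
    # as de facto permanent (an assumption of this sketch).
    permanent = [
        n for n in added
        if n not in removed and (today - added[n]).days > longest
    ]
    return median(lifespans), len(permanent) / len(added)
```

Because both outputs are plain counts and medians over observed events, nothing in this kind of computation is fitted or tuned, which is consistent with the no-circularity assessment.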
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: toggle addition and removal events can be reliably identified from version-control commit histories of the two projects.