
arxiv: 2604.15872 · v1 · submitted 2026-04-17 · 💻 cs.SE

Feature Toggle Dynamics in Large-Scale Systems: Prevalence, Growth, Lifespan, and Benchmarking

Pith reviewed 2026-05-10 08:33 UTC · model grok-4.3

classification 💻 cs.SE
keywords feature toggles · technical debt · software evolution · longitudinal study · commit history mining · benchmarking framework · Kubernetes · GitLab

The pith

Removals of feature toggles lag behind additions in Kubernetes and GitLab, producing growing inventories and a small share of de facto permanent toggles.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tracks more than four thousand feature toggle events across eight and a half years in Kubernetes and five years in GitLab by extracting additions and removals from commit histories. It shows that additions consistently outrun removals, so the total number of toggles in each codebase keeps rising. Median lifespans differ sharply between the two projects, and a small percentage of toggles outlive every previously recorded removal, effectively becoming permanent. From these patterns the authors derive five metrics together with empirically set threshold zones that let teams compare how well they are managing their own toggles.

Core claim

Longitudinal analysis of commit histories in Kubernetes and GitLab reveals that toggle removals trail additions by roughly 35 percent and 13 percent respectively, so toggle inventories grow over time. Median lifespans are 734 days in Kubernetes and 185 days in GitLab. Between 0.73 percent and 1.33 percent of toggles exceed the longest removal durations previously observed and therefore function as permanent features. These observations are used to define a benchmarking framework consisting of five metrics and their corresponding threshold zones for assessing toggle-management health.
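
The lag figures are ratios of event counts, which can be sketched in a few lines. This is a minimal sketch that assumes the lag is measured relative to additions; the summary above does not state the paper's exact formula, and the function name `removal_lag` and the counts are hypothetical, chosen only to illustrate the arithmetic.

```python
def removal_lag(additions: int, removals: int) -> float:
    """Share of toggle additions not yet matched by a removal.

    Assumption: lag = (additions - removals) / additions. The paper
    may normalise differently.
    """
    if additions <= 0:
        raise ValueError("additions must be positive")
    return (additions - removals) / additions


# Hypothetical counts, not taken from the paper.
lag = removal_lag(additions=200, removals=130)
print(f"{lag:.0%}")
```

A positive value means the inventory is growing; zero or a negative value means removals have caught up with additions.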

What carries the argument

Longitudinal extraction of feature-toggle addition and removal events from version-control commit histories, used to measure prevalence, growth rates, lifespan distributions, and the emergence of permanent toggles.
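
A rough illustration of this kind of extraction, under stated assumptions: the sketch scans one unified diff and reports toggle names that appear only on `+` lines or only on `-` lines. The declaration pattern `TOGGLE_RE`, the `ToggleEvent` type, and the sample diff are hypothetical; the paper's own patterns for Kubernetes and GitLab are not reproduced here.

```python
import re
from typing import List, NamedTuple


class ToggleEvent(NamedTuple):
    kind: str  # "added" or "removed"
    name: str


# Hypothetical pattern: assumes a toggle is declared on a line such as
# "name: my_toggle". A real project needs its own pattern.
TOGGLE_RE = re.compile(r"^\s*name:\s*(\w+)\s*$")


def toggle_events(diff_text: str) -> List[ToggleEvent]:
    """Return toggle additions and removals found in one unified diff."""
    added, removed = set(), set()
    for line in diff_text.splitlines():
        # Skip the "+++ b/file" and "--- a/file" headers. Simplification:
        # a changed content line that itself starts with "--" or "++"
        # is skipped as well.
        if line.startswith(("+++", "---")):
            continue
        if line.startswith("+"):
            target = added
        elif line.startswith("-"):
            target = removed
        else:
            continue  # context line or hunk header
        match = TOGGLE_RE.match(line[1:])
        if match:
            target.add(match.group(1))
    # A name on both sides was moved or edited, not added or removed.
    moved = added & removed
    events = [ToggleEvent("added", n) for n in sorted(added - moved)]
    events += [ToggleEvent("removed", n) for n in sorted(removed - moved)]
    return events


SAMPLE_DIFF = """\
--- a/flags.yml
+++ b/flags.yml
@@ -1,2 +1,2 @@
-name: old_flag
+name: new_flag
 name: kept_flag
"""

for event in toggle_events(SAMPLE_DIFF):
    print(event.kind, event.name)
```

Treating a name that shows up on both sides of a diff as a move is a design choice of this sketch. It avoids counting reformatting commits as an addition plus a removal, but it is not confirmed to be what the authors do.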

If this is right

  • Toggle inventories will continue to expand unless removal rates increase to match or exceed addition rates.
  • A measurable fraction of toggles will become permanent in any system that follows the observed lifespan patterns.
  • Teams can use the five-metric framework and its threshold zones to diagnose and compare their toggle-management performance against the two studied projects.
  • Public release of the extraction scripts and data sets allows other projects to apply the same measurement approach.
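
The first point, that inventories expand whenever additions outpace removals, follows from a running sum over dated events. The sketch below assumes events are available as `(date, +1)` for an addition and `(date, -1)` for a removal; the function name `active_counts` and the sample events are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Dict, Iterable, Tuple


def active_counts(
    events: Iterable[Tuple[date, int]], start: date, end: date
) -> Dict[date, int]:
    """Number of active toggles at the end of each day in [start, end].

    Events dated outside the window are ignored, so `start` should be
    no later than the first event.
    """
    delta = Counter()
    for day, change in events:
        delta[day] += change
    counts = {}
    active = 0
    day = start
    while day <= end:
        active += delta.get(day, 0)
        counts[day] = active
        day += timedelta(days=1)
    return counts


# Hypothetical events: two additions on day one, one removal on day three.
events = [
    (date(2024, 1, 1), +1),
    (date(2024, 1, 1), +1),
    (date(2024, 1, 3), -1),
]
series = active_counts(events, date(2024, 1, 1), date(2024, 1, 4))
print(list(series.values()))
```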

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Projects could reduce accumulation by enforcing explicit removal deadlines tied to the median lifespan observed in comparable systems.
  • The large difference in median lifespans between the two studied codebases suggests that organizational practices, rather than technical constraints alone, drive how long toggles persist.
  • Applying the same measurement pipeline to additional systems would test whether the reported growth rates and permanent-toggle percentages generalize beyond these two examples.

Load-bearing premise

The events recorded in commit histories accurately represent the true intended addition and removal dates of every toggle.

What would settle it

A longitudinal study of another large system in which the cumulative number of toggle removals equals or exceeds the number of additions over the same multi-year period, or in which no toggle exceeds the longest previously observed removal interval.
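
The second condition can be checked mechanically once each toggle has an addition date and, where applicable, a removal date. This is a sketch under assumptions: the share is computed over all toggles, which may not match the paper's denominator, and `lifespan_summary` and the sample dates are hypothetical.

```python
from datetime import date
from statistics import median
from typing import Dict, List, Optional, Tuple


def lifespan_summary(
    toggles: List[Tuple[date, Optional[date]]], today: date
) -> Dict[str, float]:
    """Summarise lifespans for (added, removed-or-None) pairs.

    Requires at least one removed toggle; otherwise there is no
    "longest observed removal" to compare against.
    """
    removed_spans = [(r - a).days for a, r in toggles if r is not None]
    active_ages = [(today - a).days for a, r in toggles if r is None]
    if not removed_spans:
        raise ValueError("need at least one removed toggle")
    longest = max(removed_spans)
    # Active toggles that have outlived every removal seen so far.
    outliers = sum(age > longest for age in active_ages)
    return {
        "median_removed_days": median(removed_spans),
        "longest_removed_days": longest,
        # Assumption: share over all toggles, active and removed.
        "permanent_share": outliers / len(toggles),
    }


# Hypothetical toggles, not taken from the paper's data.
toggles = [
    (date(2020, 1, 1), date(2020, 1, 11)),
    (date(2020, 1, 1), date(2020, 1, 31)),
    (date(2020, 1, 1), None),
    (date(2020, 3, 1), None),
]
summary = lifespan_summary(toggles, today=date(2020, 3, 11))
print(summary)
```

A `permanent_share` of zero over a multi-year history would correspond to the second outcome described above.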

Figures

Figures reproduced from arXiv: 2604.15872 by Xhevahire Tërnava.

Figure 1: Added vs. removed feature gates in Kubernetes and flags in GitLab.
… of all eight subdirectories under the config/feature_flags/ directory. For each file and directory, we parsed the commit logs and identified toggle additions and removals through pattern matching on the unified diff output. Lines prefixed with + indicated additions, and lines prefixed with - indicated removals. For Kubernetes, we matched fe…
Figure 2: Daily active toggles in Kubernetes and GitLab, cumulative values over the project’s history.
… We first noted that in the last analysed commits, 155 feature gates in Kubernetes and 403 feature flags in GitLab are active
Figure 3: Distribution of feature toggles longevity in Kubernetes and GitLab.
RQ2 insights: Active feature toggles generally accumulate over a software project’s lifecycle, though growth rates vary significantly (from 13 to 79 feature toggles per year in our study). When normalized by codebase size, feature toggle density can differ fivefold across projects, reflecting different release cycles, organizational practic…
Figure 4: Kaplan-Meier survival curves showing feature toggle lifespans in Kubernetes and GitLab. Active toggles are marked with vertical ticks, whereas red marks indicate those exceeding the maximum observed lifespan of removed toggles.
… anomalies (e.g., rebasing or cherry-picking across branches). These represent less than 1% of all removed toggles and do not affect our findings (cf. Section 6). The maximum observed…
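
For readers unfamiliar with the estimator behind Figure 4, here is a minimal pure-Python sketch of the Kaplan-Meier product-limit method. It treats a removed toggle as an observed event and a still-active toggle as censored at its current age. The function name `kaplan_meier` and the sample durations are hypothetical, and the paper's own tooling is not shown in this summary.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(
    durations: Sequence[float], observed: Sequence[bool]
) -> List[Tuple[float, float]]:
    """Return (t, S(t)) at each time t with at least one removal.

    durations[i] is the lifespan in days; observed[i] is True if the
    toggle was removed and False if it is still active (censored).
    """
    pairs = sorted(zip(durations, observed))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        removals = 0
        leaving = 0
        # Group all toggles that share the same duration.
        while i < len(pairs) and pairs[i][0] == t:
            removals += int(pairs[i][1])
            leaving += 1
            i += 1
        if removals:
            survival *= 1 - removals / at_risk
            curve.append((t, survival))
        # Removed and censored toggles both leave the risk set.
        at_risk -= leaving
    return curve


# Hypothetical lifespans in days; two toggles are still active.
curve = kaplan_meier([5, 10, 10, 20], [True, True, False, False])
for t, s in curve:
    print(t, round(s, 3))
```

Censoring is what separates this from a plain median over removed toggles: still-active toggles count toward the risk set up to their current age instead of being dropped.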
Original abstract

Feature toggles enable gradual rollouts and experimentation in software systems, yet often persist beyond their intended lifecycle, accumulating as technical debt. Prior research has examined feature toggle interactions and complexity, but no longitudinal study has quantified how toggles evolve over time across different organizational contexts. We analyse over 4,000 toggle events in Kubernetes (10 MLoC, 8.5 years) and GitLab (5 MLoC, 5 years). We find that feature toggle removals lag behind additions in both systems (by roughly 35% and 13%, respectively), leading to growing toggle inventories. Their lifespan patterns also differ notably, with Kubernetes toggles lasting a median of 734 days versus 185 days in GitLab. In addition, some feature toggles (1.33% and 0.73%, respectively) exceed all previously observed removal durations, becoming de facto permanent. Building on these findings, we propose a benchmarking framework with five key metrics and their empirically derived threshold zones, enabling practitioners to assess and compare toggle management practices across projects. All scripts and data are publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Circularity Check

0 steps flagged

No circularity: results are direct empirical counts and medians from commit data

full rationale

The paper performs a longitudinal empirical analysis of over 4,000 toggle events extracted from the commit histories of two independent large-scale projects (Kubernetes and GitLab). Reported quantities such as removal lags (35% and 13%), median lifespans (734 vs 185 days), and de-facto permanent toggle percentages (1.33% and 0.73%) are computed directly as counts, ratios, and medians from the observed addition/removal events. The proposed five-metric benchmarking framework and its threshold zones are derived from these same empirical distributions. No equations, fitted parameters, predictions, or self-citations are present that reduce any claimed result to its inputs by construction. The derivation chain consists solely of data extraction followed by descriptive statistics, with no self-definitional, fitted-input, or uniqueness-imported steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The study is purely empirical and therefore rests on assumptions about data extraction accuracy and representativeness rather than mathematical axioms or new postulated entities.

axioms (1)
  • domain assumption: Toggle addition and removal events can be reliably identified from version-control commit histories of the two projects.
    The entire analysis depends on parsing commit logs to count toggle events over multi-year periods.


