How Developers Adopt, Use, and Evolve CI/CD Caching: An Empirical Study on GitHub Actions
Pith reviewed 2026-05-10 15:02 UTC · model grok-4.3
The pith
Cache-adopting GitHub repositories are more active than non-adopters, and their caching setups evolve through frequent human-driven fixes and later bot-driven version updates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Through examination of 266 cache-adopting repositories and 686 non-adopters, the work establishes that cache adopters exhibit greater activity and popularity; that caching is applied across multiple CI/CD job types using a variety of mechanisms rather than one standard approach; that caching configurations undergo frequent, repetitive changes with faster evolution in build and test jobs; and that parameter updates are mainly human-driven to resolve issues while version updates occur later and are frequently bot-driven for dependency maintenance.
What carries the argument
The classification of 17,185 workflow configuration changes across 10,373 commits, distinguishing cache-related modifications by type (parameter vs version) and by actor (human vs bot) within GitHub Actions workflow files.
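As a rough illustration of the type distinction (parameter vs. version), the sketch below compares two snapshots of the same workflow step. It is not the paper's classifier: the dict layout and the names `split_uses` and `classify_change` are hypothetical, and it assumes each step has already been parsed into a `uses` string and a `with` mapping.

```python
def split_uses(uses):
    """Split 'owner/repo@ref' into (action, ref); ref is '' when no '@' is present."""
    action, _, ref = uses.partition("@")
    return action, ref


def classify_change(old_step, new_step):
    """Return the set of change types between two versions of one step.

    'version'   -> same action, different ref after '@'
    'parameter' -> the 'with' mapping (path, key, ...) differs
    """
    old_action, old_ref = split_uses(old_step.get("uses", ""))
    new_action, new_ref = split_uses(new_step.get("uses", ""))

    labels = set()
    if old_action == new_action and old_ref != new_ref:
        labels.add("version")
    if old_step.get("with", {}) != new_step.get("with", {}):
        labels.add("parameter")
    return labels


# Hypothetical before/after snapshots of a cache step.
before = {"uses": "actions/cache@v3", "with": {"path": "~/.npm", "key": "npm-v1"}}
bumped = {"uses": "actions/cache@v4", "with": {"path": "~/.npm", "key": "npm-v1"}}
rekeyed = {"uses": "actions/cache@v3", "with": {"path": "~/.npm", "key": "npm-v2"}}

print(classify_change(before, bumped))
print(classify_change(before, rekeyed))
```

Returning a set rather than a single label keeps commits that touch both the version and the parameters from being forced into one category.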
If this is right
- Cache-adopting repositories show higher levels of activity and popularity compared with non-adopters.
- Caching appears across many CI/CD job types through diverse mechanisms instead of a single standardized method.
- Caching configurations change frequently in repetitive patterns, with quicker evolution in build and test jobs than in other types.
- Parameter updates are driven mainly by humans to fix problems, whereas version updates happen later and are often performed by bots for dependency maintenance.
Where Pith is reading between the lines
- Tooling that automates parameter tuning based on common failure patterns could reduce the human maintenance burden documented in the study.
- The observed variety of caching approaches points to a need for platform-level defaults or templates that might lower the barrier to effective use.
- The distinction between human and bot drivers suggests that dependency bots already handle part of the work, leaving opportunity to extend similar automation to parameter-level fixes.
Load-bearing premise
The 952 repositories and their parsed workflows and commits form a representative sample of GitHub Actions usage without major selection or parsing bias.
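One way to probe parsing bias is to spot-check a detector against hand-labelled workflows. The following is a minimal sketch of such a detector, not the paper's parser. It assumes cache steps appear as a `uses: actions/cache@<ref>` line (including the `actions/cache/restore` and `actions/cache/save` sub-actions), and `find_cache_steps` is a hypothetical name. It deliberately ignores other mechanisms, such as the `cache` input of setup actions, so it would undercount on real data.

```python
import re

# Matches lines such as "- uses: actions/cache@v4" or
# "uses: actions/cache/restore@v4  # comment"; quotes around the value are allowed.
CACHE_USES = re.compile(
    r"^\s*(?:-\s*)?uses:\s*['\"]?(actions/cache(?:/(?:restore|save))?)@(\S+?)['\"]?\s*(?:#.*)?$"
)


def find_cache_steps(workflow_text):
    """Return (line_number, action, ref) for every explicit cache step found."""
    hits = []
    for lineno, line in enumerate(workflow_text.splitlines(), start=1):
        match = CACHE_USES.match(line)
        if match:
            hits.append((lineno, match.group(1), match.group(2)))
    return hits


# Hypothetical workflow used only to exercise the detector.
WORKFLOW = """\
jobs:
  build:
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        with:
          path: ~/.npm
          key: npm-${{ hashFiles('package-lock.json') }}
      - name: Restore only
        uses: actions/cache/restore@v4  # pinned
"""

for hit in find_cache_steps(WORKFLOW):
    print(hit)
```

A line-based regex is easy to audit by hand, which is the point of a spot check; a study-grade detector would more likely parse the YAML and inspect each step.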
What would settle it
A larger or differently sampled study that finds no difference in activity or popularity between cache-adopting and non-adopting repositories would undermine the reported observations.
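A replication would need an effect-size measure for the adopter vs. non-adopter comparison. The sketch below computes Cliff's delta, a common non-parametric choice for skewed repository metrics. The source does not say which statistic the paper uses, so this choice is an assumption, and the star counts are made-up placeholders.

```python
def cliffs_delta(xs, ys):
    """Cliff's delta: P(x > y) - P(x < y) over all cross-group pairs, in [-1, 1]."""
    if not xs or not ys:
        raise ValueError("both groups must be non-empty")
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Placeholder values, not data from the study.
adopter_stars = [120, 340, 95, 800]
non_adopter_stars = [15, 60, 110, 40]

delta = cliffs_delta(adopter_stars, non_adopter_stars)
print(delta)
```

A delta near 0 in a larger sample would be the kind of result that undermines observation (1); a value near 1 would support it.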
Original abstract
Continuous Integration/Continuous Delivery (CI/CD) caching is widely used to reduce repeated computation and improve CI/CD efficiency, yet maintaining effective caching requires ongoing maintenance effort. In this paper, we present the first empirical study on how developers configure and evolve caching in CI/CD workflows on GitHub Actions. We analyze 952 GitHub repositories (266 cache adopters and 686 non-adopters), to compare repository characteristics, characterize caching usage at the job and step levels, uncover patterns in caching configuration evolution, and identify the drivers of cache-related changes. Our analysis spans 1,556 workflow files, 10,373 commits, and 17,185 workflow configuration changes, including an average of 9.37 cache-related changes per repository. Our main observations are: (1) cache-adopting repositories are more active and popular than non-adopters; (2) caching is used across multiple CI/CD job types through a variety of caching mechanisms rather than a single standardized approach; (3) caching configurations evolve through frequent, repetitive maintenance patterns, with rapid updates in build and test jobs and slower evolution in other job types; and (4) cache-related modifications are driven by distinct maintenance needs: parameter updates are mainly human-driven to fix issues, while version updates occur later and are often bot-driven for dependency maintenance. Our findings quantify the substantial maintenance effort involved in CI/CD caching and highlight opportunities to improve reliability and tool support.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents the first empirical study of CI/CD caching adoption, usage, and evolution in GitHub Actions. It analyzes 952 repositories (266 cache adopters vs. 686 non-adopters), 1,556 workflow files, 10,373 commits, and 17,185 configuration changes. Key claims are that (1) cache-adopting repositories are more active and popular, (2) caching appears across job types via diverse mechanisms rather than a single standard, (3) configurations evolve through frequent repetitive maintenance (rapid in build/test jobs, slower elsewhere), and (4) parameter updates are mostly human-driven for fixes while version updates are later and often bot-driven for dependencies. The work quantifies maintenance effort and suggests tool improvements.
Significance. If the dataset and classifications are representative, the study offers concrete, large-scale evidence on the practical costs of CI/CD caching in open-source projects. It is the first such analysis focused on GitHub Actions, provides falsifiable patterns (e.g., job-type differences in evolution speed), and directly supports recommendations for better caching tooling. The public-data basis in principle allows replication.
major comments (2)
- [§3] §3 (Data Collection and Filtering): The selection of the 952 repositories and the detection of the 266 cache adopters are described only at a high level. No explicit search queries, date ranges, popularity thresholds, or exclusion criteria are provided, nor is there validation that the parsing correctly identifies cache steps (e.g., actions/cache or equivalent). Because every subsequent comparison and statistic conditions on this sample, the risk of selection bias directly undermines observation (1) and the generalizability of (2)–(4).
- [§4.3, §5] §4.3 and §5 (Evolution Analysis): The classification of 17,185 configuration changes into human- vs. bot-driven and the attribution of “parameter updates” vs. “version updates” lacks a reproducible rule set or inter-rater validation. Without these details it is impossible to assess whether the reported timing differences and driver patterns in observation (4) are robust or artifacts of the commit-message heuristics used.
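Inter-rater validation of the kind requested here is commonly reported as Cohen's kappa. A minimal sketch, assuming two raters have each labelled the same changes; the function name `cohens_kappa` and the labels are illustrative and not taken from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equally long, non-empty label lists")
    n = len(labels_a)

    # Observed agreement: share of items on which both raters chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement from each rater's marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Illustrative labels only.
rater_1 = ["human", "human", "bot", "bot"]
rater_2 = ["human", "bot", "bot", "bot"]
print(cohens_kappa(rater_1, rater_2))
```

Reporting kappa alongside raw agreement matters when one class dominates, since raw agreement alone can look high by chance.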
minor comments (2)
- [Table 2] Table 2 (or equivalent) reports average changes per repository but does not include standard deviations or confidence intervals, making it harder to judge the variability behind the “9.37 cache-related changes” figure.
- [Abstract] The abstract states “the first empirical study” without a brief related-work sentence; a single sentence citing the closest prior CI/CD or GitHub Actions studies would strengthen the novelty claim.
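The dispersion statistics requested for Table 2 are cheap to report. A minimal sketch, assuming per-repository counts of cache-related changes are available as a list. The numbers below are invented placeholders, not the paper's data, and the normal-approximation interval is only one option (a bootstrap interval may suit skewed counts better).

```python
import math
import statistics


def summarize(counts, z=1.96):
    """Return (mean, sample standard deviation, approximate 95% CI of the mean)."""
    n = len(counts)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(counts)
    sd = statistics.stdev(counts)      # sample SD (n - 1 in the denominator)
    half_width = z * sd / math.sqrt(n)  # normal approximation
    return mean, sd, (mean - half_width, mean + half_width)


# Placeholder per-repository change counts.
changes_per_repo = [2, 4, 4, 4, 5, 5, 7, 9]
mean, sd, (low, high) = summarize(changes_per_repo)
print(f"mean={mean:.2f} sd={sd:.2f} ci=({low:.2f}, {high:.2f})")
```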
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and outline the revisions we will incorporate to improve clarity, reproducibility, and robustness.
- Referee: [§3] §3 (Data Collection and Filtering): The selection of the 952 repositories and the detection of the 266 cache adopters are described only at a high level. No explicit search queries, date ranges, popularity thresholds, or exclusion criteria are provided, nor is there validation that the parsing correctly identifies cache steps (e.g., actions/cache or equivalent). Because every subsequent comparison and statistic conditions on this sample, the risk of selection bias directly undermines observation (1) and the generalizability of (2)–(4).
  Authors: We agree that the original description of the repository selection and cache-step detection process was at a high level and that greater transparency is required to evaluate selection bias and replicability. In the revised manuscript we will expand §3 with the precise GitHub search queries, the exact date range used for repository discovery, all popularity and activity thresholds applied, the full list of exclusion criteria, and a detailed account of the parsing rules (including regular expressions and heuristics) used to identify cache steps such as actions/cache. We will also add a short validation subsection describing how we manually inspected a random sample of detected workflows to confirm correct identification of cache usage. These additions will directly support the generalizability claims in observations (1)–(4).
  Revision: yes
- Referee: [§4.3, §5] §4.3 and §5 (Evolution Analysis): The classification of 17,185 configuration changes into human- vs. bot-driven and the attribution of “parameter updates” vs. “version updates” lacks a reproducible rule set or inter-rater validation. Without these details it is impossible to assess whether the reported timing differences and driver patterns in observation (4) are robust or artifacts of the commit-message heuristics used.
  Authors: We acknowledge that the classification rules for human- versus bot-driven changes and for parameter versus version updates were not presented with sufficient detail or validation evidence. In the revised version we will add an explicit, reproducible rule set in §4.3: bot detection will be defined by a combination of author login patterns (e.g., known bot accounts), commit-message keywords (e.g., “dependabot”, “renovate”, “auto-update”), and commit frequency heuristics; parameter updates will be distinguished from version updates by whether the changed fields affect cache-key parameters versus action version specifiers. We will include concrete examples of each category and report the results of a manual validation performed on a stratified sample of 200 changes (with inter-rater agreement statistics). This will allow readers to judge the robustness of the timing and driver patterns reported in observation (4).
  Revision: yes
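The bot-detection part of the proposed rule set can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' rules: it uses only the login and commit-message signals named in the rebuttal, relies on the `[bot]` suffix that GitHub App accounts carry in their login, and omits the commit-frequency heuristic. The name `is_bot_commit` and the sample commits are hypothetical.

```python
import re

# GitHub App accounts appear with a "[bot]" suffix, e.g. "dependabot[bot]".
BOT_LOGIN = re.compile(r"\[bot\]$", re.IGNORECASE)

# Keywords quoted in the rebuttal; a real rule set would likely need more.
BOT_KEYWORDS = ("dependabot", "renovate", "auto-update")


def is_bot_commit(author_login, message):
    """Heuristically flag a commit as bot-driven from its author login or message."""
    if BOT_LOGIN.search(author_login):
        return True
    text = message.lower()
    return any(keyword in text for keyword in BOT_KEYWORDS)


# Hypothetical commits.
print(is_bot_commit("dependabot[bot]", "Bump actions/cache from 3 to 4"))
print(is_bot_commit("alice", "Fix cache key for pip dependencies"))
```

Keyword matching on messages will also flag human commits that merely mention a bot, which is one reason the manual validation on a stratified sample matters.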
Circularity Check
No circularity: empirical observations derived from independent public data analysis
The paper conducts an empirical study by collecting and analyzing public GitHub repository data, workflow files, and commits to derive four observations about CI/CD caching adoption and evolution. No self-definitional relations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the derivation chain; the central claims rest on direct comparison of adopter vs. non-adopter statistics and change patterns extracted from the dataset. Sampling and parsing choices may affect generalizability or introduce bias (a validity concern), but they do not reduce any result to its inputs by construction, as required for circularity. The analysis is self-contained against external benchmarks and does not invoke uniqueness theorems or ansatzes from prior author work.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The chosen repositories and time periods reflect general developer practices in using GitHub Actions.