Beyond the YAML File: Understanding Real-World GitHub Actions Workflow Adoption
Pith reviewed 2026-05-10 05:05 UTC · model grok-4.3
The pith
Real-world GitHub Actions data reveals three distinct developer responses to workflow failures along with a gap between configuration and actual use.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We identify three distinct failure response patterns, observe that higher usage intensity of GHA workflows correlates with lower failure rates, and uncover a configuration-usage gap where the presence of configuration files masks disabled or unused workflows. Moreover, our qualitative analysis of relationships between project characteristics and utilization patterns yields five hypotheses for future validation.
What carries the argument
Mixed-methods analysis of 258,300 workflow run records combined with in-depth review of 21 repositories to map failure responses and usage patterns.
Load-bearing premise
The chosen set of 952 repositories for quantitative data and 21 for qualitative analysis accurately reflects how GitHub Actions are used more broadly.
What would settle it
Finding a different number of failure response patterns or no correlation between usage intensity and failure rates in a larger random sample of repositories would challenge the main findings.
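The correlation half of this check can be sketched as a rank correlation computed over per-repository values. This is a minimal illustration only: the page does not say which correlation coefficient the paper uses, so Spearman's rho is an assumption here, and the input numbers are hypothetical rather than taken from the dataset.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical per-repository values, not data from the paper.
usage_intensity = [2.0, 5.5, 9.0, 14.0, 30.0]
failure_rate = [0.40, 0.35, 0.22, 0.25, 0.10]

rho = spearman(usage_intensity, failure_rate)
print(f"Spearman rho = {rho:.2f}")
```

A value of rho near zero, or a positive one, in a larger random sample would be the kind of result that challenges the reported association.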
Original abstract
Continuous Integration and Continuous Deployment (CI/CD) have become fundamental to modern software development, with GitHub Actions (GHA) emerging as a dominant automation platform. In this study, we analyze real-world execution records of GHA, examining how developers react to workflow failures, how these workflows are utilized by projects, and how these aspects relate to project characteristics. We quantitatively analyze 258,300 workflow run records from 952 repositories and perform an in-depth qualitative analysis of 21 selected, diverse GitHub repositories to understand how maintainers and contributors interact with workflow results. We identify three distinct failure response patterns, observe that higher usage intensity of GHA workflows correlates with lower failure rates, and uncover a configuration-usage gap where the presence of configuration files masks disabled or unused workflows. Moreover, our qualitative analysis of relationships between project characteristics and utilization patterns yields five hypotheses for future validation.
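The configuration-usage gap described in the abstract can be illustrated with a small classifier over workflow metadata. This is a sketch under stated assumptions, not the authors' method: the `Workflow` record, its field names, the state strings, and the three category labels are all hypothetical, chosen only to show how a configuration file can exist while the workflow is disabled or never runs.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    repo: str
    path: str       # location of the configuration file in the repository
    state: str      # assumed: "active", or any other value for a disabled workflow
    run_count: int  # runs observed in the study window (hypothetical field)


def classify(wf):
    """Label a configured workflow as disabled, unused, or in use."""
    if wf.state != "active":
        return "disabled"
    if wf.run_count == 0:
        return "unused"
    return "in_use"


def gap_summary(workflows):
    """Count configured workflows per category."""
    counts = {"in_use": 0, "disabled": 0, "unused": 0}
    for wf in workflows:
        counts[classify(wf)] += 1
    return counts


# Hypothetical records: every entry has a configuration file present.
sample = [
    Workflow("org/a", ".github/workflows/ci.yml", "active", 120),
    Workflow("org/a", ".github/workflows/release.yml", "disabled_manually", 4),
    Workflow("org/b", ".github/workflows/lint.yml", "active", 0),
]

summary = gap_summary(sample)
print(summary)
```

Counting configuration files alone would report three workflows here, while only one is actually exercised.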
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents an empirical study of real-world GitHub Actions (GHA) adoption. It quantitatively analyzes 258,300 workflow run records from 952 repositories to examine failure responses, usage intensity, and failure rates, and qualitatively studies 21 selected repositories to identify patterns in how maintainers interact with workflow results. The central claims are the identification of three distinct failure response patterns, a negative correlation between GHA usage intensity and failure rates, a configuration-usage gap where configuration files mask disabled or unused workflows, and five hypotheses relating project characteristics to utilization patterns.
Significance. If the findings hold after addressing sampling and analysis details, the work offers valuable large-scale observational data on CI/CD practices with GHA, a dominant platform. The scale of the quantitative dataset (258k runs) is a strength for identifying usage patterns and correlations, and the mixed-methods approach yields actionable hypotheses. This could inform tool builders and practitioners on workflow design, though the observational design inherently limits causal claims.
major comments (3)
- [§3] §3 (Data Collection and Sampling): The criteria and process for selecting the 952 repositories (and the 21 for qualitative analysis) are not described in sufficient detail to evaluate representativeness or rule out selection bias. This is load-bearing for the correlation between usage intensity and failure rates (reported in §4) and the three failure patterns, as unmeasured factors like repository popularity, age, or language could confound results.
- [§4.2] §4.2 (Quantitative Results on Usage and Failures): The reported negative correlation lacks mention of statistical controls for potential confounders (e.g., project size, team activity, primary language). Without these or sensitivity analyses, the claim that higher usage intensity correlates with lower failure rates cannot be confidently attributed to usage rather than external variables.
- [§5] §5 (Qualitative Analysis): The failure classification criteria, inter-rater reliability measures, and exact selection process for the 21 repositories are not specified. This undermines the validity of the three identified failure response patterns and the configuration-usage gap, as measurement bias or non-representative cases could produce the observed patterns.
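The confounding concern raised for §4.2 can be made concrete with a first-order partial correlation, which measures the association between two variables after removing the linear influence of a third. This is a minimal sketch of one possible control, not the analysis the paper performs: the choice of a single control variable (a project-size proxy) and all numbers are hypothetical.

```python
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def partial_correlation(xs, ys, zs):
    """Correlation of xs and ys with the linear effect of zs removed."""
    r_xy = pearson(xs, ys)
    r_xz = pearson(xs, zs)
    r_yz = pearson(ys, zs)
    denom = ((1 - r_xz ** 2) * (1 - r_yz ** 2)) ** 0.5
    if denom == 0:
        raise ValueError("control variable is perfectly collinear with an input")
    return (r_xy - r_xz * r_yz) / denom


# Hypothetical per-repository values, not data from the paper.
usage_intensity = [2.0, 5.5, 9.0, 14.0, 30.0]
failure_rate = [0.40, 0.35, 0.22, 0.25, 0.10]
contributors = [3, 4, 10, 8, 25]  # assumed proxy for project size

raw = pearson(usage_intensity, failure_rate)
controlled = partial_correlation(usage_intensity, failure_rate, contributors)
print(f"raw r = {raw:.2f}, controlling for size r = {controlled:.2f}")
```

If the controlled value shrinks toward zero relative to the raw one, project size would account for much of the apparent association; a full multivariate model would be needed to handle several confounders at once.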
minor comments (2)
- [Abstract] The abstract and introduction could more clearly distinguish between the quantitative sample (952 repos) and qualitative subsample (21 repos) to avoid reader confusion about scope.
- [Figures/Tables] Figure captions and table descriptions would benefit from explicit definitions of 'usage intensity' and 'failure rate' to improve reproducibility.
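One way such definitions could be written down is shown below. The definitions here are assumptions made for illustration, since this page does not state the paper's own: usage intensity is taken as concluded runs per day of the observation window, and failure rate as failed runs divided by runs that ended in success or failure.

```python
from collections import defaultdict


def repo_metrics(runs, window_days):
    """Compute per-repository metrics from (repo, conclusion) pairs.

    Assumed definitions (not taken from the paper):
      usage_intensity = concluded runs / days in the observation window
      failure_rate    = failed runs / (successful + failed runs)
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for repo, conclusion in runs:
        if conclusion not in ("success", "failure"):
            continue  # assumption: cancelled or skipped runs are excluded
        totals[repo] += 1
        if conclusion == "failure":
            failures[repo] += 1
    return {
        repo: {
            "usage_intensity": totals[repo] / window_days,
            "failure_rate": failures[repo] / totals[repo],
        }
        for repo in totals
    }


# Hypothetical run records.
sample_runs = [
    ("org/a", "success"),
    ("org/a", "failure"),
    ("org/a", "cancelled"),
    ("org/b", "success"),
]

metrics = repo_metrics(sample_runs, window_days=2)
print(metrics)
```

Stating choices such as whether cancelled runs count, and what time unit normalizes usage, is what makes the two metrics reproducible.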
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback, which highlights important areas for improving the transparency and robustness of our empirical study. We address each major comment point by point below, outlining the revisions we will make to the manuscript.
-
Referee: [§3] §3 (Data Collection and Sampling): The criteria and process for selecting the 952 repositories (and the 21 for qualitative analysis) are not described in sufficient detail to evaluate representativeness or rule out selection bias. This is load-bearing for the correlation between usage intensity and failure rates (reported in §4) and the three failure patterns, as unmeasured factors like repository popularity, age, or language could confound results.
Authors: We agree that greater detail on the sampling process is required to allow evaluation of representativeness and potential biases. In the revised manuscript, we will expand §3 with a full description of the repository selection criteria, including the source population (e.g., GitHub repositories with public workflow histories), inclusion filters such as minimum workflow run counts and activity thresholds, and steps taken to promote diversity in programming languages and project sizes. For the 21 repositories in the qualitative analysis, we will document the purposive sampling approach used to capture variation in failure response patterns. We will also add an explicit limitations subsection addressing selection bias and generalizability. revision: yes
-
Referee: [§4.2] §4.2 (Quantitative Results on Usage and Failures): The reported negative correlation lacks mention of statistical controls for potential confounders (e.g., project size, team activity, primary language). Without these or sensitivity analyses, the claim that higher usage intensity correlates with lower failure rates cannot be confidently attributed to usage rather than external variables.
Authors: We concur that controlling for confounders strengthens causal interpretation in observational data. In the revision, we will augment the analysis in §4.2 with multivariate regression models that include controls for project size (measured by stars and contributors), team activity (commit frequency), primary language, and repository age. We will also report sensitivity analyses, such as stratified correlations and alternative model specifications, to assess the stability of the negative association between usage intensity and failure rates. These additions will be presented alongside the existing descriptive results while maintaining the observational framing of the study. revision: yes
-
Referee: [§5] §5 (Qualitative Analysis): The failure classification criteria, inter-rater reliability measures, and exact selection process for the 21 repositories are not specified. This undermines the validity of the three identified failure response patterns and the configuration-usage gap, as measurement bias or non-representative cases could produce the observed patterns.
Authors: We recognize the need for explicit methodological transparency in the qualitative component. In the revised §5, we will specify the failure classification criteria in detail, including the coding scheme, category definitions, and illustrative examples from the data. We will report inter-rater reliability statistics (e.g., Cohen's kappa) from the independent coding performed by the research team. Additionally, we will describe the exact selection process for the 21 repositories, including how cases were chosen to reflect diversity in failure patterns and project characteristics. These clarifications will support readers' assessment of the identified patterns and the configuration-usage gap. revision: yes
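The inter-rater reliability statistic named in the last response, Cohen's kappa, compares the observed agreement between two coders with the agreement expected by chance. A minimal sketch follows; the labels `P1` to `P3` are placeholders for the three failure response patterns (which this page does not name), and the codings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Share of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected if each rater labelled independently at their own rates.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (observed - expected) / (1 - expected)


# Hypothetical codings of eight failure events by two coders.
coder_1 = ["P1", "P1", "P2", "P2", "P3", "P3", "P1", "P2"]
coder_2 = ["P1", "P1", "P2", "P3", "P3", "P3", "P1", "P1"]

kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")  # 0.63 for this sample
```

Here the coders agree on 6 of 8 items (0.75), chance agreement is 21/64, and kappa is therefore 27/43.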
Circularity Check
No circularity: purely observational empirical study
The paper performs quantitative analysis of 258,300 external GitHub workflow runs from 952 repositories plus qualitative coding of 21 cases. It reports observed failure-response patterns, a usage-intensity correlation, a configuration-usage gap, and five hypotheses. No equations, fitted parameters, predictions, derivations, or self-citations appear in the provided text or abstract; all claims are direct summaries of collected external data with no reduction to internal definitions or prior author results.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The 952 repositories provide a representative sample of GitHub Actions usage across open-source projects.
- domain assumption: Failure responses observed in the 21 qualitative repositories can be generalized into three distinct patterns.