Continuous Integration Theater

Bruno Cartaxo; Daniel da Costa; Gustavo Pinto; Leonardo Furtado; Wagner Felidré

arxiv: 1907.01602 · v1 · pith:QDFMP4UFnew · submitted 2019-07-02 · 💻 cs.SE

Pith reviewed 2026-05-25 10:29 UTC · model grok-4.3

classification 💻 cs.SE

keywords continuous integration · travisci · open source projects · build failures · code coverage · unhealthy practices · ci theater · infrequent commits

The pith

Many TravisCI projects use continuous integration without following its core practices.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper analyzes 1,270 open-source projects using TravisCI to identify unhealthy CI practices. It reports that roughly 60 percent of projects make infrequent commits, which complicates merging. Additionally, 85 percent of projects have at least one build that remains broken for more than four days. Code coverage averages 78 percent in the projects where it could be measured, though some have very low coverage. The authors conclude that these patterns indicate 'Continuous Integration Theater,' where tools are adopted but not used effectively.

Core claim

By inspecting 1,270 open-source projects that use TravisCI, we quantitatively studied how common it is to use CI with infrequent commits, in projects with poor test coverage, with builds that stay broken for long periods, and with builds that take too long to run. We observed that 748 (~60%) projects face infrequent commits, 85% have at least one broken build that takes more than four days to be fixed, and for the majority the build is executed under the 10 minutes rule of thumb.

What carries the argument

Continuous Integration Theater, the situation in which software engineers do not employ CI tools effectively, leading to unhealthy practices.

Load-bearing premise

That the 1,270 TravisCI projects represent typical CI usage and that the chosen cutoffs for infrequent commits, long-broken builds, and long build times validly mark unhealthy practices.

What would settle it

A replication study on a different set of projects or with different thresholds showing substantially lower rates of infrequent commits and long-broken builds.

Figures

Figures reproduced from arXiv: 1907.01602 by Bruno Cartaxo, Daniel da Costa, Gustavo Pinto, Leonardo Furtado, Wagner Felidré.

**Figure 2.** Size of the project and its frequency per day of the ...

**Figure 3.** Frequency of commits, grouped by the size of the projects (boxplots), and the programming languages (Ruby on the ...

**Figure 4.** Code coverage per programming language

**Figure 5.** Distribution of days with broken build, grouped by the size of the projects (boxplots), and the programming languages

**Figure 6.** Build duration, grouped by the size of the projects (boxplots), and the programming languages (Ruby on the left and ...

Original abstract

Background: Continuous Integration (CI) systems are now the bedrock of several software development practices. Several tools such as TravisCI, CircleCI, and Hudson, that implement CI practices, are commonly adopted by software engineers. However, the way that software engineers use these tools could lead to what we call "Continuous Integration Theater", a situation in which software engineers do not employ these tools effectively, leading to unhealthy CI practices. Aims: The goal of this paper is to make sense of how commonplace are these unhealthy continuous integration practices being employed in practice. Method: By inspecting 1,270 open-source projects that use TravisCI, the most used CI service, we quantitatively studied how common is to use CI (1) with infrequent commits, (2) in a software project with poor test coverage, (3) with builds that stay broken for long periods, and (4) with builds that take too long to run. Results: We observed that 748 (~60%) projects face infrequent commits, which essentially makes the merging process harder. Moreover, we were able to find code coverage information for 51 projects. The average code coverage was 78%, although Ruby projects have a higher code coverage than Java projects (86% and 63%, respectively). However, some projects with very small coverage (~4%) were found. Still, we observed that 85% of the studied projects have at least one broken build that take more than four days to be fixed. Interestingly, very small projects (up to 1,000 lines of code) are the ones that take the longest to fix broken builds. Finally, we noted that, for the majority of the studied projects, the build is executed under the 10 minutes rule of thumb. Conclusions: Our results are important to an increasing community of software engineers that employ CI practices on daily basis but may not be aware of bad practices that are eventually employed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper tallies CI problems across 1,270 TravisCI projects but rests its headline percentages on untested cutoffs for commit frequency, broken-build duration, and build time.

The core finding is that roughly 60% of projects show infrequent commits and 85% have at least one build broken longer than four days. These are direct counts from project metadata on TravisCI, the most common service at the time. The paper also pulls coverage numbers for the 51 projects that expose them (average 78%, higher in Ruby than Java) and notes that most builds finish under ten minutes while small projects take longer to repair broken ones. That is the actual contribution: prevalence figures for four specific practices in one CI platform's open-source users. No new framework or algorithm appears. The work is straightforward observational reporting rather than a derivation or controlled experiment.

The thresholds themselves receive no derivation or sensitivity checks in the abstract, and the sample is limited to projects already using TravisCI, which likely tilts toward more CI-aware teams. Coverage data exists for only a small slice of the set. These are real but contained limitations; the counts themselves are reproducible from the metadata if the exact selection rules are documented.

Readers who track CI adoption or write practitioner guidelines could use the numbers as one data point, provided they treat the cutoffs as illustrative rather than definitive. The paper shows clear data collection and honest reporting of what was observed, with no obvious internal contradictions. It is worth sending to referees so they can check the full methods, selection criteria, and whether the authors added any robustness tests in the body. Desk rejection would be too quick given the concrete sample size and topic relevance.

Referee Report

4 major / 2 minor

Summary. The paper examines the prevalence of four unhealthy CI practices ('Continuous Integration Theater') across 1,270 open-source projects using TravisCI: infrequent commits (748 projects, ~60%), poor test coverage (data for 51 projects, average 78%), builds remaining broken for more than four days (85% of projects), and builds exceeding 10 minutes (minority of projects). It concludes these practices are common and warrant attention from the CI community.

Significance. If the prevalence estimates prove robust, the work supplies concrete observational counts from a sizable TravisCI sample that document gaps between CI tool adoption and effective usage. This could usefully inform practitioner guidelines and CI platform design. The explicit project counts and breakdown by language (e.g., Ruby vs. Java coverage) are strengths.

major comments (4)

[Abstract / Results] Abstract and Results: the 60% infrequent-commits figure and the 85% long-broken-build figure rest on three un-derived cutoffs (commit frequency, four-day broken-build window, ten-minute build duration) with no sensitivity analysis or alternative thresholds reported; modest changes to any cutoff could shift the headline percentages substantially.
[Results] Results (coverage paragraph): coverage data exist for only 51 of 1,270 projects; the reported averages and language comparisons therefore rest on a small, possibly non-representative subset and should be qualified accordingly.
[Method] Method: the sample is drawn exclusively from TravisCI users, yet no discussion addresses whether this introduces selection bias toward more CI-aware projects, limiting claims about the broader population of CI users.
[Results] Results: prevalence estimates are given as point values with no error bars, confidence intervals, or statistical tests; this weakens the quantitative claims even for the chosen thresholds.

minor comments (2)

[Abstract] Abstract: the phrase '10 minutes rule of thumb' appears without prior definition; the main text should state the exact rule and its provenance.
[Conclusions] Conclusions: the final paragraph could more explicitly restate the data limitations (small coverage subsample, TravisCI-only sample) alongside the prevalence numbers.

Simulated Author's Rebuttal

4 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which highlight important limitations in our presentation of results. We agree that the manuscript would benefit from additional analysis and qualifications. We will revise the paper to incorporate sensitivity analyses, better qualification of the coverage subsample, discussion of selection bias, and uncertainty measures for prevalence estimates.

Referee: [Abstract / Results] Abstract and Results: the 60% infrequent-commits figure and the 85% long-broken-build figure rest on three un-derived cutoffs (commit frequency, four-day broken-build window, ten-minute build duration) with no sensitivity analysis or alternative thresholds reported; modest changes to any cutoff could shift the headline percentages substantially.

Authors: We acknowledge that the chosen thresholds (infrequent commits, four-day broken builds, and ten-minute builds) are presented without sensitivity analysis. While the ten-minute threshold is described as a 'rule of thumb' in the manuscript and the four-day window is motivated by prior work on build breakage, we agree that robustness should be demonstrated. In the revision we will add a sensitivity analysis subsection that varies each threshold and reports how the headline percentages change. revision: yes
Referee: [Results] Results (coverage paragraph): coverage data exist for only 51 of 1,270 projects; the reported averages and language comparisons therefore rest on a small, possibly non-representative subset and should be qualified accordingly.

Authors: The referee correctly notes the small sample (n=51) for coverage. We will revise the results and discussion sections to explicitly qualify this subsample as potentially non-representative, state the limitation prominently, and avoid over-generalizing the language-specific comparisons. revision: yes
Referee: [Method] Method: the sample is drawn exclusively from TravisCI users, yet no discussion addresses whether this introduces selection bias toward more CI-aware projects, limiting claims about the broader population of CI users.

Authors: We agree that restricting the sample to TravisCI projects may introduce selection bias. The revised manuscript will include an explicit limitations paragraph discussing this issue and its implications for generalizability beyond TravisCI users. revision: yes
Referee: [Results] Results: prevalence estimates are given as point values with no error bars, confidence intervals, or statistical tests; this weakens the quantitative claims even for the chosen thresholds.

Authors: We accept that point estimates alone are insufficient. The revision will add binomial confidence intervals for the main prevalence figures (60% and 85%) and, where feasible, for the coverage statistics. We will also note the absence of formal hypothesis tests as a limitation of the observational design. revision: yes
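
The binomial intervals promised in the last response can be sketched as follows. This is a minimal illustration, not the authors' analysis: it assumes the Wilson score interval (one common choice; the rebuttal does not name a method) and uses the one count the abstract states exactly, 748 of 1,270 projects. The function name `wilson_interval` is made up for this sketch. The 85% figure is left out because this page does not give the underlying project count.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must lie between 0 and n")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Counts taken from the abstract: 748 of 1,270 projects with infrequent commits.
low, high = wilson_interval(748, 1270)
print(f"infrequent commits: {748 / 1270:.1%} (approx. 95% CI {low:.1%} to {high:.1%})")
```

With a sample of this size the interval is only a few percentage points wide, so sampling noise is a smaller concern for the headline figure than the choice of cutoff itself.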

Circularity Check

0 steps flagged

No circularity; purely observational counts from explicit thresholds

The paper reports direct observational statistics (e.g., 748 projects with infrequent commits, 85% with broken builds >4 days) obtained by applying chosen cutoffs to the 1,270-project TravisCI dataset. No equations, fitted parameters, predictions, or derivations appear. No self-citations, uniqueness theorems, or ansatzes are invoked to support the central claims. The results are computed counts from the data under the stated definitions; they do not reduce to the inputs by construction. Threshold arbitrariness is a validity concern, not a circularity issue per the defined patterns.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

Findings rest on four ad-hoc thresholds (infrequent commits, poor coverage, >4 days broken, >10 min builds) and on the assumption that TravisCI metadata accurately captures practice quality.

free parameters (2)

broken-build duration threshold = 4 days
Four days is used to mark 'long periods' without derivation from the data or external benchmark.
build duration threshold = 10 minutes
Ten minutes is invoked as a 'rule of thumb' without justification or sensitivity analysis.
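
To make the concern about these two cutoffs concrete, here is a small sketch of a threshold sweep. The durations are invented placeholder values, not data from the paper, and the helper names `share_exceeding` and `sweep` are hypothetical. The point is only the shape of the check: recompute the share of flagged projects at several cutoffs and see how far it moves.

```python
from typing import Iterable


def share_exceeding(values: Iterable[float], threshold: float) -> float:
    """Fraction of values strictly greater than the threshold."""
    vals = list(values)
    if not vals:
        raise ValueError("values must not be empty")
    return sum(v > threshold for v in vals) / len(vals)


def sweep(values: Iterable[float], thresholds: Iterable[float]) -> dict[float, float]:
    """Map each candidate threshold to the share of values above it."""
    vals = list(values)
    return {t: share_exceeding(vals, t) for t in thresholds}


# HYPOTHETICAL data: longest broken-build period per project, in days.
longest_broken_days = [0.5, 1, 2, 3, 4.5, 6, 9, 12, 30, 45]

for cutoff, share in sweep(longest_broken_days, [1, 2, 4, 7, 14]).items():
    print(f"broken longer than {cutoff:>2} days: {share:.0%} of projects")
```

The same sweep would apply to build duration in minutes. If the flagged share changes sharply between neighbouring cutoffs, the headline percentage depends heavily on the threshold chosen.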

axioms (2)

domain assumption TravisCI usage is a valid proxy for CI adoption and the selected projects represent broader CI practice.
Method section selects only TravisCI projects.
domain assumption Public coverage reports from 51 projects are sufficient to characterize test quality across the sample.
Results report coverage only for this small subset.

