Towards Build Optimization Using Digital Twins

Ali Tizghadam; Francis Bordeleau; Henri Aïdasso

arxiv: 2503.19381 · v2 · submitted 2025-03-25 · 💻 cs.SE · cs.LG

Pith reviewed 2026-05-22 23:21 UTC · model grok-4.3

classification 💻 cs.SE cs.LG

keywords digital twins · continuous integration · build optimization · CI pipelines · machine learning · process modeling · software engineering

The pith

Digital twins of CI build processes can model duration, failures, and flakiness together to enable ongoing optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that current fixes for long, failing, or flaky CI builds treat these problems separately even though they influence one another. It proposes building digital twins of the entire build process so that machine learning on historical data can support simulation of changes and automatic repair suggestions. The authors present the CBDT framework as an initial implementation that includes real-time data collection, performance tracking, what-if analysis, and prescriptive services. A sympathetic reader would see this as a way to move from isolated patches to systematic, data-driven improvement of build pipelines.

Core claim

Developing Digital Twins of build processes enables global and continuous improvement by offering digital shadowing through real-time data acquisition and monitoring, machine-learning models of interrelated build aspects, exploration of what-if scenarios from historical patterns, and prescriptive services that automate failure and performance repairs.

What carries the argument

The CI Build process Digital Twin (CBDT) framework, which supplies real-time build data acquisition, continuous performance monitoring, ML modeling of duration-failure-flakiness relations, what-if scenario exploration, and automated prescriptive repair services.
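The digital shadowing part of that list is the most concrete, so a minimal sketch may help fix ideas. It is not the CBDT implementation: `BuildRecord`, `BuildShadow`, the field names, and the flakiness proxy (the same commit both passing and failing inside the window) are all assumptions made here for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class BuildRecord:
    """One finished CI build; all field names are hypothetical."""
    build_id: str
    commit: str
    duration_s: float
    passed: bool


class BuildShadow:
    """Keeps a sliding window of recent builds and derives monitoring metrics."""

    def __init__(self, window: int = 100) -> None:
        self._recent = deque(maxlen=window)

    def ingest(self, record: BuildRecord) -> None:
        # In a live setting this would be fed by a CI webhook or a poller.
        self._recent.append(record)

    def metrics(self) -> dict:
        builds = list(self._recent)
        if not builds:
            return {"count": 0, "mean_duration_s": 0.0,
                    "failure_rate": 0.0, "flaky_commits": 0}
        outcomes = {}
        for b in builds:
            outcomes.setdefault(b.commit, set()).add(b.passed)
        # Assumed flakiness proxy: one commit shows both a pass and a failure.
        flaky = sum(1 for seen in outcomes.values() if len(seen) == 2)
        return {
            "count": len(builds),
            "mean_duration_s": sum(b.duration_s for b in builds) / len(builds),
            "failure_rate": sum(not b.passed for b in builds) / len(builds),
            "flaky_commits": flaky,
        }


shadow = BuildShadow(window=3)
for rec in [
    BuildRecord("b1", "c1", 600.0, True),
    BuildRecord("b2", "c2", 300.0, False),
    BuildRecord("b3", "c2", 300.0, True),
    BuildRecord("b4", "c3", 900.0, True),
]:
    shadow.ingest(rec)
snapshot = shadow.metrics()
```

With a window of three, the oldest build drops out and `snapshot` summarizes only `b2` to `b4`. The point of the sketch is that duration, failure, and flakiness metrics come from the same record stream, which is what lets a twin model them together.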

If this is right

ML models of build duration, failures, and flakiness can reveal their interdependencies instead of treating each issue alone.
What-if scenarios drawn from past data allow prediction of the effects of configuration or code changes before they are applied.
Prescriptive repair services can automatically adjust builds to reduce duration or eliminate flakiness on an ongoing basis.
Continuous monitoring and shadowing functions keep the twin synchronized with the live build process for repeated improvement cycles.
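
A what-if query over historical data can be sketched in its simplest form as a comparison of past builds that match a scenario against those that do not. The data, the `cache_enabled` flag, and the function names below are hypothetical, and the comparison is purely correlational: it shows what the history contains, not what a change would cause.

```python
from statistics import mean

# Hypothetical historical records: (cache_enabled, duration_s, passed)
HISTORY = [
    (True, 300.0, True),
    (True, 340.0, True),
    (True, 320.0, False),
    (False, 600.0, True),
    (False, 640.0, False),
    (False, 620.0, False),
]


def summarize(rows):
    """Mean duration and failure rate for a group of builds."""
    return {
        "n": len(rows),
        "mean_duration_s": mean(r[1] for r in rows),
        "failure_rate": sum(not r[2] for r in rows) / len(rows),
    }


def what_if(history, cache_enabled):
    """Summarize past builds that match the scenario's cache setting."""
    matching = [r for r in history if r[0] == cache_enabled]
    if not matching:
        raise ValueError("no historical builds match the scenario")
    return summarize(matching)


baseline = what_if(HISTORY, cache_enabled=False)
scenario = what_if(HISTORY, cache_enabled=True)
saved_s = baseline["mean_duration_s"] - scenario["mean_duration_s"]
```

A model-based twin would replace the lookup in `what_if` with a learned predictor, which is where the accuracy concerns raised later on this page come in.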

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same twin structure could be applied to other CI-adjacent processes such as test suite maintenance or deployment pipelines.
Accuracy of the ML component would need validation on build logs from multiple organizations rather than a single codebase.
Integration with existing CI platforms would require standardized data interfaces for the shadowing step to function without custom engineering.
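
One way to set up the multi-organization validation mentioned above is a leave-one-organization-out split, so that every organization is evaluated by a model that never saw its builds. The sketch below shows only the splitting step; the `org` and `duration_s` field names and the sample logs are assumptions for illustration.

```python
from collections import defaultdict


def leave_one_org_out(records):
    """Yield (held_out_org, train, test), holding out each organization once."""
    by_org = defaultdict(list)
    for rec in records:
        by_org[rec["org"]].append(rec)
    for org in sorted(by_org):
        train = [r for other, rows in by_org.items() if other != org for r in rows]
        yield org, train, by_org[org]


# Hypothetical build logs; "org" and "duration_s" are illustrative field names.
logs = [
    {"org": "A", "duration_s": 300.0},
    {"org": "A", "duration_s": 320.0},
    {"org": "B", "duration_s": 900.0},
    {"org": "C", "duration_s": 610.0},
]
folds = list(leave_one_org_out(logs))
```

Each fold can then be used to train on `train` and score on `test`; a large gap between per-organization scores would suggest the model does not transfer beyond the codebase it was fitted on.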

Load-bearing premise

Machine learning models trained on historical build data can accurately capture the interrelated aspects of build duration, failures, and flakiness sufficiently to support reliable what-if scenarios and prescriptive repair services.

What would settle it

An experiment that applies the framework's suggested repairs to live CI pipelines and measures no sustained reduction in average build duration or failure rate compared with untreated control pipelines.
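
The comparison in such an experiment could be analyzed in several ways; a permutation test is one simple option that makes no distributional assumptions. The sketch below is an assumption of this page, not a procedure from the paper, and the per-pipeline durations are invented for illustration.

```python
import random
from statistics import mean


def permutation_p_value(treated, control, n_perm=2000, seed=0):
    """One-sided p-value for 'treated pipelines are faster than control ones'."""
    rng = random.Random(seed)
    observed = mean(control) - mean(treated)
    pooled = list(treated) + list(control)
    k = len(treated)
    hits = 0
    for _ in range(n_perm):
        # Reassign group labels at random and recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[k:]) - mean(pooled[:k])
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)


# Hypothetical mean build durations (seconds), one value per pipeline.
treated = [410.0, 395.0, 430.0, 405.0, 420.0, 400.0]
control = [520.0, 540.0, 505.0, 560.0, 515.0, 530.0]
p = permutation_p_value(treated, control)
```

A large p-value on real pipelines, sustained over time, would be the outcome described above: no detectable reduction relative to the untreated controls. The same function can be applied to failure rates per pipeline.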

Figures

Figures reproduced from arXiv: 2503.19381 by Ali Tizghadam, Francis Bordeleau, Henri Aïdasso.

**Figure 2.** Architecture of the CI Build process Digital Twin.

Original abstract

Despite the indisputable benefits of Continuous Integration (CI) pipelines (or builds), CI still presents significant challenges regarding long durations, failures, and flakiness. Prior studies addressed CI challenges in isolation, yet these issues are interrelated and require a holistic approach for effective optimization. To bridge this gap, this paper proposes a novel idea of developing Digital Twins (DTs) of build processes to enable global and continuous improvement. To support such an idea, we introduce the CI Build process Digital Twin (CBDT) framework as a minimum viable product. This framework offers digital shadowing functionalities, including real-time build data acquisition and continuous monitoring of build process performance metrics. Furthermore, we discuss guidelines and challenges in the practical implementation of CBDTs, including (1) modeling different aspects of the build process using Machine Learning, (2) exploring what-if scenarios based on historical patterns, and (3) implementing prescriptive services such as automated failure and performance repair to continuously improve build processes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a high-level idea paper proposing a digital twin framework for CI builds with no data, models, or validation behind it.

The main takeaway is that this paper offers a conceptual sketch for digital twins of CI processes but supplies no evidence that the approach would work. It names the CBDT framework and argues for treating build duration, failures, and flakiness together rather than in isolation, as prior work has done. That observation is fair and worth noting, but the rest stays at the level of listing challenges and high-level guidelines without any concrete steps or examples.

What the paper does reasonably is frame the interdependencies and sketch components like real-time data shadowing, ML modeling, what-if exploration, and prescriptive repair. It also flags three practical implementation areas without overclaiming results. Those points give a reader a quick sense of where a digital-twin approach might differ from existing CI tuning tools.

The soft spots are central rather than minor. The feasibility claim rests on the idea that historical build data can train models accurate enough for reliable counterfactual queries and automated fixes, yet the manuscript gives no feature suggestions, no loss functions, no toy demonstration, and no discussion of data issues such as sparsity or concept drift. Without even a small worked example, the prescriptive-service promise stays untested. The guidelines for implementation are generic and do not engage with existing ML work on build prediction or with digital-twin literature from other domains.

This paper is mainly for researchers already interested in CI optimization or in applying digital twins inside software engineering. A reader looking for methods, datasets, or empirical checks will find little to use. It does not show enough technical grounding to justify sending it through peer review; an editor would likely ask for at least a prototype or simulation before investing referee time.

Referee Report

2 major / 2 minor

Summary. The paper proposes developing Digital Twins of CI build processes to holistically optimize interrelated challenges of build duration, failures, and flakiness (previously addressed in isolation). It introduces the CI Build process Digital Twin (CBDT) framework as an MVP, covering digital shadowing for real-time data acquisition and monitoring, plus guidelines for ML modeling of build aspects, what-if scenario exploration, and prescriptive repair services for continuous improvement.

Significance. If realized with accurate models, the CBDT framework would advance CI optimization by shifting from isolated fixes to a unified, data-driven approach enabling global and continuous build process improvement. The explicit framing of modeling challenges and prescriptive services is a constructive contribution to the vision of DTs in software engineering.

major comments (2)

[Implementation Guidelines (modeling challenges subsection)] The central claim that the CBDT enables reliable what-if scenarios and prescriptive repair rests on the feasibility of ML models capturing the joint distribution of duration, failures, and flakiness from historical data. However, the Implementation Guidelines section asserts this without any concrete feature engineering, loss function, toy demonstration, accuracy metric, or reference to prior work that has jointly modeled these three metrics, leaving the prescriptive-service promise unsupported.
[CBDT Framework] The CBDT framework description presents digital shadowing and the three services as sufficient for global optimization, yet supplies no validation strategy, data requirements, or counter-factual query example. This makes the 'minimum viable product' claim load-bearing on an untested assumption rather than demonstrated capability.

minor comments (2)

[Abstract] The abstract states that the paper 'discuss[es] guidelines and challenges' but the manuscript would benefit from an explicit subsection separating concrete implementation guidelines from open research challenges.
[Introduction] Terminology such as 'digital shadowing functionalities' and 'prescriptive services' is introduced without a short glossary or reference to standard DT literature definitions, which could improve accessibility for SE readers.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and insightful comments. Our manuscript proposes the CBDT framework as a conceptual MVP to address interrelated CI challenges holistically, outlining components and implementation challenges rather than delivering a fully implemented and validated system. We respond to each major comment below, indicating revisions where appropriate to clarify scope and strengthen the presentation.

Referee: [Implementation Guidelines (modeling challenges subsection)] The central claim that the CBDT enables reliable what-if scenarios and prescriptive repair rests on the feasibility of ML models capturing the joint distribution of duration, failures, and flakiness from historical data. However, the Implementation Guidelines section asserts this without any concrete feature engineering, loss function, toy demonstration, accuracy metric, or reference to prior work that has jointly modeled these three metrics, leaving the prescriptive-service promise unsupported.

Authors: We agree the section provides only high-level guidelines on ML modeling challenges without concrete details such as feature engineering, loss functions, or demonstrations. This aligns with the paper's scope as a vision paper that identifies open challenges rather than claiming existing reliable joint models. The prescriptive services are presented as part of the proposed framework whose feasibility depends on future ML work. We will revise to add references to prior studies on individual metrics (build duration prediction, failure prediction, and flakiness detection) and explicitly note the absence of established joint modeling approaches as a key research gap.

Revision: yes

Referee: [CBDT Framework] The CBDT framework description presents digital shadowing and the three services as sufficient for global optimization, yet supplies no validation strategy, data requirements, or counter-factual query example. This makes the 'minimum viable product' claim load-bearing on an untested assumption rather than demonstrated capability.

Authors: The CBDT is explicitly framed as an MVP framework outlining core functionalities (digital shadowing plus the three services) to enable the vision, consistent with other conceptual frameworks in software engineering. We acknowledge the current version lacks explicit validation strategies or examples. In revision we will add a dedicated paragraph on potential validation approaches (e.g., retrospective analysis on public CI datasets), high-level data requirements for shadowing, and one illustrative counter-factual query example to demonstrate the intended use of the what-if and prescriptive components.

Revision: yes
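
A retrospective analysis of the kind proposed in the rebuttal needs accuracy metrics, which the referee notes are absent. As one possible choice, and not something the paper specifies, duration predictions could be scored with mean absolute error and failure or flakiness probabilities with the Brier score. All values below are invented, and the predictions stand in for the output of some hypothetical joint model.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def brier_score(outcomes, probabilities):
    """Mean squared gap between predicted probability and the 0/1 outcome."""
    return sum((p - o) ** 2 for o, p in zip(outcomes, probabilities)) / len(outcomes)


# Hypothetical held-out builds and the predictions of some joint model.
actual_duration_s = [300.0, 620.0, 450.0, 900.0]
predicted_duration_s = [320.0, 600.0, 500.0, 800.0]
failed = [0, 1, 0, 1]
predicted_failure_prob = [0.1, 0.8, 0.3, 0.6]
flaky = [0, 0, 1, 0]
predicted_flaky_prob = [0.2, 0.1, 0.7, 0.1]

report = {
    "duration_mae_s": mean_absolute_error(actual_duration_s, predicted_duration_s),
    "failure_brier": brier_score(failed, predicted_failure_prob),
    "flaky_brier": brier_score(flaky, predicted_flaky_prob),
}
```

Reporting all three numbers side by side, on builds held out in time order, would at least show whether a joint model is usable for each aspect before any what-if or prescriptive service is built on it.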

Circularity Check

0 steps flagged

No circularity: high-level conceptual framework with no derivations or fitted quantities

The manuscript is a framework proposal that introduces the CBDT concept, lists three implementation challenges (ML modeling, what-if scenarios, prescriptive services), and provides guidelines without any equations, parameter fitting, predictions, or mathematical derivations. No load-bearing steps reduce to self-definition, fitted inputs, or self-citations. The central claim remains an untested feasibility assertion rather than a derivation that collapses by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The paper is a proposal for a new framework; it does not derive results from prior literature but postulates that digital twins can be built and will be useful for CI optimization.

axioms (2)

Domain assumption: Build process issues (duration, failures, flakiness) are interrelated and therefore require a holistic modeling approach rather than isolated fixes.
Stated in the abstract as the motivation for moving beyond prior studies that addressed issues in isolation.
Domain assumption: Machine learning can be used to model different aspects of the build process for what-if scenario exploration and prescriptive services.
Listed as one of the three main implementation guidelines in the abstract.

invented entities (1)

CI Build process Digital Twin (CBDT) framework (no independent evidence)
purpose: To provide digital shadowing, real-time monitoring, ML-based modeling, what-if analysis, and automated repair for CI builds.
Introduced in the abstract as the minimum viable product supporting the digital twin idea.
