Reasoning Beyond Prediction: From Data-Driven to Causal Software Engineering

Luca Giamattei; Roberto Pietrantuono; Stefano Russo

arxiv: 2606.27960 · v1 · pith:RUAXZZNXnew · submitted 2026-06-26 · 💻 cs.SE · cs.AI


Roberto Pietrantuono, Luca Giamattei, Stefano Russo

Pith reviewed 2026-06-29 03:57 UTC · model grok-4.3

classification 💻 cs.SE cs.AI

keywords: causal reasoning, software engineering, deep learning, human-machine cooperation, intelligent support, causal inference


The pith

Software engineering needs machines that amplify causal reasoning rather than only making data-driven predictions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper contends that deep learning tools, while useful for automation and pattern-based predictions, cannot meet the demands of increasingly complex software development involving interdependent tasks across AI products, distributed systems, and cyber-physical environments. It calls for a shift to a human-machine cooperation model in which machines help engineers reason about causes and effects to provide more intelligent support. A sympathetic reader would care because current methods leave a gap in handling the creative and quality-assurance aspects of modern software work, where correlations alone prove insufficient.

Core claim

The authors call for a new paradigm of human-machine cooperation in software engineering where machines actively amplify engineers' reasoning through the lens of causation, rather than limiting themselves to automating routine tasks or predicting from learned patterns.

What carries the argument

The lens of causation, which shifts machine support from pattern prediction to helping engineers analyze cause-effect relationships in interdependent development tasks.
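To make the prediction-versus-causation distinction concrete, here is a minimal sketch on synthetic, noise-free data. Every name and number in it is a hypothetical assumption, not something taken from the paper. Module complexity confounds both the adoption of a QA tool and the defect count, so the naive correlational comparison suggests the tool *increases* defects. A backdoor adjustment that stratifies on the confounder recovers the effect built into the data: one fewer defect.

```python
from collections import defaultdict
from statistics import mean


def make_records():
    """Hypothetical (Z, T, Y) records: complex module, used tool, defect count."""
    records = []
    # Complex modules (Z=1) adopt the tool far more often than simple ones (Z=0).
    for z, t, count in [(1, 1, 40), (1, 0, 10), (0, 1, 10), (0, 0, 40)]:
        # Ground truth by construction: defects = 2 + 6*Z - 1*T (tool removes one defect).
        records += [(z, t, 2 + 6 * z - t)] * count
    return records


def naive_difference(records):
    """Predictive/associational view: E[Y | T=1] - E[Y | T=0]."""
    treated = [y for _, t, y in records if t == 1]
    control = [y for _, t, y in records if t == 0]
    return mean(treated) - mean(control)


def backdoor_adjusted_effect(records):
    """Causal view: sum over z of P(z) * (E[Y | T=1, z] - E[Y | T=0, z])."""
    strata = defaultdict(lambda: {0: [], 1: []})
    for z, t, y in records:
        strata[z][t].append(y)
    n = len(records)
    effect = 0.0
    for groups in strata.values():
        weight = (len(groups[0]) + len(groups[1])) / n  # P(Z = z)
        effect += weight * (mean(groups[1]) - mean(groups[0]))
    return effect


records = make_records()
print(f"naive difference: {naive_difference(records):+.2f}")  # tool looks harmful
print(f"adjusted effect:  {backdoor_adjusted_effect(records):+.2f}")  # built-in effect
```

The adjustment is valid here only because the sketch assumes complexity is the sole confounder. With real project data, that assumption is exactly what a causal model has to make explicit and defend.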

If this is right

Tools could move beyond routine automation to support creative decision-making in complex system design.
Human-machine cooperation would incorporate explicit cause-effect analysis for quality assurance and architecture decisions.
Support systems would address the limitations of correlation-based methods in pervasively distributed and embedded environments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Adapting causal inference methods from statistics could be tested first on requirements traceability or change impact analysis.
Future developer assistants might combine causal graphs with existing code models to flag potential side effects of modifications.
This shift could influence how AI is integrated into cyber-physical system development where physical outcomes depend on causal chains.
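Here is a rough sketch of the side-effect idea. It assumes a causal graph over system components is already available, for example from expert knowledge or causal discovery; the sketch does not attempt to build one. Under that assumption, a change-impact query reduces to collecting the causal descendants of the modified node. The graph, the node names, and the `potential_side_effects` helper are all hypothetical.

```python
from collections import deque

# Hypothetical causal graph over components/metrics of a microservice system.
# An edge A -> B means "a change in A can causally affect B".
CAUSAL_EDGES = {
    "auth_service.timeout": ["gateway.latency", "auth_service.error_rate"],
    "gateway.latency": ["checkout.conversion"],
    "auth_service.error_rate": ["gateway.error_rate"],
    "gateway.error_rate": ["checkout.conversion"],
    "catalog.cache_size": ["catalog.latency"],
}


def potential_side_effects(graph, changed):
    """Return every node reachable from `changed`, i.e. its causal descendants."""
    seen, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


print(potential_side_effects(CAUSAL_EDGES, "auth_service.timeout"))
```

Reachability only flags what *could* be affected. Ranking those candidates by how much they would change requires effect estimates on the edges.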

Load-bearing premise

Current data-driven deep learning methods are insufficient for modern software development demands and causal reasoning approaches can supply the missing intelligent support.

What would settle it

A study that measures whether introducing causal analysis tools improves engineer performance on tasks such as debugging interdependent components or assuring quality in distributed systems, compared against existing deep learning tools.
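One way such a study could be analysed, sketched under assumptions: participants are randomly assigned to either a causal-analysis tool or a deep-learning tool. Under that design, the difference in mean task time estimates the average effect of the tool, and a permutation test gives a p-value without distributional assumptions. The numbers below are placeholders, not data from any study, and the permutation test is only one of several reasonable choices.

```python
import random
from statistics import fmean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in means (a - b)."""
    rng = random.Random(seed)
    observed = fmean(group_a) - fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # random relabelling under the null of no tool effect
        diff = fmean(pooled[:n_a]) - fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical minutes-to-fix for a debugging task (placeholder numbers, not real data).
causal_tool = [31, 27, 35, 29, 24, 33, 28, 30]
dl_tool = [38, 34, 41, 29, 36, 40, 33, 37]
diff, p = permutation_test(causal_tool, dl_tool)
print(f"mean difference: {diff:+.1f} min, permutation p-value: {p:.4f}")
```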

Figures

Figures reproduced from arXiv: 2606.27960 by Luca Giamattei, Roberto Pietrantuono, Stefano Russo.

**Figure 1.** The CSE conceptual framework. Although there are many ways to instantiate a CSE solution for a specific problem, the idea is that causal models should be at the core: they must be used to drive and check the reasoning process to solve the task rigorously and transparently. Rigorously means capable of quantifying the causal effects along the whole reasoning chain and harnessing such estimates to give the …

**Figure 2.** Causal methods in software engineering from the inter…
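The Figure 1 caption ties rigor to quantifying causal effects along the whole reasoning chain. In the simplest setting, a linear structural causal model, the effect transmitted along a chain is the product of its edge coefficients. The sketch below assumes a hypothetical chain in which a configuration change X raises queue length M, which in turn raises response time Y. The coefficients are made up. The sketch checks the product rule against a simulated intervention do(X = x).

```python
import random

# Hypothetical linear structural causal model (assumption, for illustration only):
#   config_change X -> queue_length M -> response_time Y
A_XM = 0.8  # effect of X on M
B_MY = 2.5  # effect of M on Y


def simulate_y(x, rng):
    """Sample Y under the intervention do(X = x): X is set, not observed."""
    m = A_XM * x + rng.gauss(0, 1)
    return B_MY * m + rng.gauss(0, 1)


def interventional_effect(x0, x1, n=20_000, seed=42):
    """Estimate E[Y | do(X=x1)] - E[Y | do(X=x0)] by simulation."""
    # Two generators with the same seed: both worlds share noise draws
    # (common random numbers), so only the intervention differs between them.
    rng0, rng1 = random.Random(seed), random.Random(seed)
    total = sum(simulate_y(x1, rng1) - simulate_y(x0, rng0) for _ in range(n))
    return total / n


# Path-product rule for a linear chain: effect per unit of X is A_XM * B_MY.
print("path product:         ", A_XM * B_MY)
print("simulated intervention:", interventional_effect(0.0, 1.0))
```

Because both simulated worlds share the same noise draws, the estimate matches the path product up to floating-point error. In nonlinear models, or with confounded observational data, no such shortcut exists, and the effect has to be identified and estimated explicitly.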

Original abstract

Software engineering is an intellectually demanding, creative discipline that juggles a web of interdependent tasks to design, build, and assure the quality of increasingly complex systems. As our expectations of software soar, with demands spanning AI-driven products, pervasively distributed and cloud-native architectures, and deeply embedded cyber-physical environments, its complexity steadily increases. In response, a new wave of co-engineering methods and tools, fueled by deep learning, has emerged to augment the process, enhancing automation and decision support. Yet, these advances remain far from delivering the kind of intelligent support that modern software development demands. We call for a new paradigm of human-machine cooperation: one where machines don't just automate routine tasks or predict from learned patterns, but actively amplify engineers' reasoning through the lens of causation. As software becomes smarter, smarter support is needed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Position paper calling for causal over predictive AI in software engineering tools, but with no methods, evidence, or examples to back the shift.


This is a short position paper arguing that software engineering needs tools built around causal reasoning rather than the current wave of deep-learning prediction. It points out the growing complexity of modern SE work and claims that pattern-based automation won't deliver the kind of support engineers actually need.

The piece does a reasonable job laying out why SE tasks are interdependent and creative in ways that might exceed simple learned correlations. Framing the issue as a need for machines to amplify causal reasoning is the clearest new angle, even if the predictive-versus-causal distinction itself is not original to this domain.

The main problem is that the argument stops at the level of advocacy. There are no concrete examples of where current data-driven tools fail in SE, no sketch of a causal alternative, and no comparison or small case that would let a reader judge whether the proposed direction would actually help. The central claim about needing smarter, causation-based support is asserted rather than developed.

The paper is aimed at researchers already working on AI for software engineering who might want to explore causal methods. It offers little for someone looking for usable techniques or evidence. I would not send it for peer review in its current form; it would need technical content or at least worked examples before it merits referee time.

Referee Report

1 major / 0 minor

Summary. The manuscript is a position paper that argues current data-driven deep learning methods for augmenting software engineering tasks fall short of providing the intelligent support needed for increasingly complex systems, and calls for a paradigm shift toward causal reasoning to actively amplify engineers' reasoning in human-machine cooperation.

Significance. If developed with concrete methods and evidence, the advocated shift from prediction to causation could stimulate new research directions in AI-assisted software engineering; however, the manuscript provides no technical constructions, examples, or comparisons, limiting its immediate contribution.

major comments (1)

[Abstract] The central motivation that deep-learning advances 'remain far from delivering the kind of intelligent support that modern software development demands' is stated without any supporting examples, empirical comparisons, cited limitations, or references, even though this motivation is load-bearing for the call to a new causal paradigm.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our position paper. We address the single major comment below.


Referee: [Abstract] The central motivation that deep-learning advances 'remain far from delivering the kind of intelligent support that modern software development demands' is stated without any supporting examples, empirical comparisons, cited limitations, or references, even though this motivation is load-bearing for the call to a new causal paradigm.

Authors: We agree that the abstract's motivational claim would be strengthened by explicit support. Although position papers often open with a synthesized view of field limitations, the referee is correct that this statement is load-bearing. In revision we will augment the abstract with one or two concise examples (e.g., brittleness of DL-based code completion under novel requirements) together with citations to recent surveys documenting generalization and explainability shortfalls in data-driven SE tools. This will better anchor the subsequent call for causal reasoning without altering the position-paper character of the work. revision: yes

Circularity Check

0 steps flagged

No significant circularity


This is a position/advocacy paper whose central claim is a call for a paradigm shift toward causal reasoning in software engineering tools. It contains no equations, derivations, fitted parameters, models, or technical constructions whose correctness could reduce to self-definition, self-citation, or renaming of inputs. The abstract and described content state that data-driven methods fall short and advocate causation without any load-bearing formal step that could be circular by construction. No self-citation chains or ansatzes are invoked to justify a result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a position statement with no technical derivations, data fits, or formal structures; it rests on the premise that causal methods are superior without introducing countable free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5668 in / 957 out tokens · 62044 ms · 2026-06-29T03:57:23.971391+00:00


Reference graph

Works this paper leans on

44 extracted references · 28 canonical work pages

[1]

Sebastian Baltes, Timo Speith, Brenda Chiteri, Seyedmoein Mohs enimoﬁdi, Shalini Chakraborty, and Daniel Buschek
[2]

IEEE Transactions on Software Engineering (2026), 1–18

On the Need to Rethink Trust in AI Assistants for Software Dev elopment: A Critical Review. IEEE Transactions on Software Engineering (2026), 1–18. https://doi.org/10.1109/TSE.2026.3659804

work page doi:10.1109/tse.2026.3659804 2026
[3]

Sándor Battaglini-Fischer, Nishanthi Srinivasan, Bálint László Szar vas, Xiaoyu Chu, and Alexandru Iosup. 2025. FAILS: A Framework for Automated Collection and Analysis of LLM Service Incidents. In Companion of the 16th ACM/SPEC In- ternational Conference on Performance Engineering (ICPE ’25). ACM, 187–194. https://doi.org/10.1145/3680256.3721320

work page doi:10.1145/3680256.3721320 2025
[4]

Zhang, Max Hort, Mark Harman, and Federica Sa rro

Zhenpeng Chen, Jie M. Zhang, Max Hort, Mark Harman, and Federica Sa rro. 2024. Fairness Testing: A Compre- hensive Survey and Analysis of Trends. ACM Trans. Softw. Eng. Methodol. 33, 5, Article 137 (June 2024), 59 pages. https://doi.org/10.1145/3652155

work page doi:10.1145/3652155 2024
[5]

Wuersching, D

Andrew G. Clark, Michael Foster, Neil Walkinshaw, and Robert M. Hierons. 2023. Metamorphic Testing with Causal Graphs. In 2023 IEEE Conference on Software Testing, Veriﬁcation and V alidation (ICST) . 153–164. https://doi.org/10.1109/ICST57152.2023.00023

work page doi:10.1109/icst57152.2023.00023 2023
[6]

Cloud Native Computing Foundation (CNCF) and Linux Foundation Rese arch. 2023. CNCF Annual Survey 2023. Tech- nical Report. Linux Foundation Research. https://www.cncf.io/rep orts/cncf-annual-survey-2023/ Accessed: 2026-03- 16

2023
[7]

Cloud Native Computing Foundation (CNCF) and Linux Foundation Rese arch
[8]

Technical Report

CNCF Annual Survey 2024 . Technical Report. Linux Foundation Research. https://www.cncf.io/wp-content/uploads/2025/04/cncf_annual_survey24_031225a.pdf Accessed: 2026-03-16. Report presenting results of the 2024 CNCF Annual Survey

2024
[9]

Hierons, Donghwan Shin, Neil Walkinshaw, and Christopher Wild

Michael Foster, Robert M. Hierons, Donghwan Shin, Neil Walkinshaw, and Christopher Wild. 2025. Using causal infer- ence to test systems with hidden and interacting variables: an evaluat ive case study. In 29th International Conference on Evaluation and Assessment in Software Engineering (EASE 2025). ACM. https://doi.org/10.1145/3756681.3756967

work page doi:10.1145/3756681.3756967 2025
[10]

Luca Giamattei, Antonio Guerriero, Ivano Malavolta, Cristian Mas cia, Roberto Pietrantuono, and Stefano Russo. 2024. Identifying Performance Issues in Microservice Architectures throu gh Causal Reasoning. In 2024 IEEE/ACM Interna- tional Conference on Automation of Software Test (AST) . 149–153

2024
[11]

Luca Giamattei, Antonio Guerriero, Roberto Pietrantuono, and Stef ano Russo. 2024. Causality-driven Testing of Autonomous Driving Systems. ACM Transactions on Software Engineering and Methodology 33, 3, Article 74 (2024). https://doi.org/10.1145/3635709

work page doi:10.1145/3635709 2024
[12]

Luca Giamattei, Antonio Guerriero, Roberto Pietrantuono, and Ste fano Russo. 2025. Causal reasoning in Software Quality Assurance: A systematic review. Information and Software Technology 178 (2025), 107599. https://doi.org/10.1016/j.infsof.2024.107599

work page doi:10.1016/j.infsof.2024.107599 2025
[13]

Richard Hahn, and Huan Liu

Ruocheng Guo, Lu Cheng, Jundong Li, P. Richard Hahn, and Huan Liu. 2 020. A Survey of Learning Causality with Data: Problems and Methods. Comput. Surveys 53, 4, Article 75 (2020), 37 pages. https://doi.org/10.1145/ 3397269

2020
[14]

Oumayma Hamdi, Ali Ouni, Eman Abdullah AlOmar, Mel Ó Cinnéide, a nd Mohamed Wiem Mkaouer
[15]

In 2021 IEEE/ACM 8th International Conference on Mobile Softw are Engineering and Systems (MobileSoft)

An Empirical Study on the Impact of Refactoring on Quality Me trics in Android Applications. In 2021 IEEE/ACM 8th International Conference on Mobile Softw are Engineering and Systems (MobileSoft) . 28–39. https://doi.org/10.1109/MobileSoft52590.2021.00010

work page doi:10.1109/mobilesoft52590.2021.00010 2021
[16]

Xinyi Hou, Yanjie Zhao, Yue Liu, Zhou Yang, Kailong Wang, Li Li, Xiapu Luo, David Lo, John Grundy, and Haoyu Wang. 2024. Large Language Models for Software Engineering: A Syste matic Literature Review. ACM Trans. Softw. Eng. Methodol. 33, 8, Article 220 (Dec. 2024), 79 pages. https://doi.org/10. 1145/3695988

2024
[17]

Yamin Hu, Wenjian Luo, and Zongyao Hu. 2023. A practical approa ch to explaining defect proneness of code commits by causal discovery. Engineering Applications of Artiﬁcial Intelligence 123 (2023), 106187. https://doi.org/10.1016/j.engappai.2023.106187

work page doi:10.1016/j.engappai.2023.106187 2023
[18]

Kaicheng Huang, Fanyu Wang, Yutan Huang, and Chetan Arora. 2025 . Prompt Engineering for Requirements En- gineering: A Literature Review and Roadmap . In 33rd International Requirements Engineering Conference W orkshops. IEEE, Los Alamitos, CA, USA, 548–557. https://doi.org/10.11 09/REW66121.2025.00081

arXiv 2025
[19]

Eisty, and Tim Menzies

Jeremy Hulse, Nasir U. Eisty, and Tim Menzies. 2025. Shaky stru ctures: The wobbly world of causal graphs in software analytics. Empirical Software Engineering 30, 5 (2025), 142. https://doi.org/10.1007/s10664-025-10 690-6

work page doi:10.1007/s10664-025-10 2025
[20]

Guido W. Imbens. 2020. Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empir- ical Practice in Economics. Journal of Economic Literature 58, 4 (2020), 1129–79. https://doi.org/10.1257/jel.20191 597

work page doi:10.1257/jel.20191 2020
[21]

Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, and Sunghun Kim. 2025 . A Survey on Large Language Models for Code Generation. ACM Trans. Softw. Eng. Methodol. (2025). https://doi.org/10.1145/3747588

work page doi:10.1145/3747588 2025
[22]

Brittany Johnson, Yuriy Brun, and Alexandra Meliou. 2020. Causal testing: understanding defects’ root causes. In 2020 ACM/IEEE 42nd International Conference on Software En gineering (ICSE) (Seoul, South Korea). ACM, 87–99. https://doi.org/10.1145/3377811.3380377 ACM Accepted, Vol. 1, No. 1, Article . Publication date: June 2026 . Advancing SE through the ...

work page doi:10.1145/3377811.3380377 2020
[23]

Palacio, Yixuan Zhang, and Denys Posh yvanyk

Dipin Khati, Yijin Liu, David N. Palacio, Yixuan Zhang, and Denys Posh yvanyk. 2025. Mapping the Trust Terrain: LLMs in Software Engineering - Insights and Perspectives. ACM Trans. Softw. Eng. Methodol. (2025). https://doi.org/10.1145/3771282

work page doi:10.1145/3771282 2025
[24]

Emre Kiciman, Robert Ness, Amit Sharma, and Chenhao Tan. 2025. C ausal Reasoning and Large Language Models: Opening a New Frontier for Causality. Transactions on Machine Learning Research (2025), 2835–8856. https://openreview.net/forum?id=mqoxLkX210

2025
[25]

Gang Li and Honghua Dai. 2004. What will aﬀect software reuse: A causal model analysis. International Journal of Software Engineering and Knowledge Engineering 14, 03 (2004), 351–364. https://doi.org/10.1142/S0218194 00400166X

work page doi:10.1142/s0218194 2004
[26]

Chenxi Liu, Yongqiang Chen, Tongliang Liu, Mingming Gong, James Cheng, Bo Han, and Kun Zhang. 2024. Discovery of the Hidden World with Large Language Models. In38th Annual Conference on Neural Information Processing Systems. https://openreview.net/forum?id=w50ICQC6QJ

2024
[27]

Xiaoyu Liu, Paiheng Xu, Junda Wu, Jiaxin Yuan, Yifan Yang, Yuhang Zho u, Fuxiao Liu, Tianrui Guan, Hao- liang Wang, Tong Yu, Julian McAuley, Wei Ai, and Furong Huang. 2025. Lar ge Language Models and Causal Inference in Collaboration: A Comprehensive Survey. In Findings of the Association for Computational Linguis- tics: NAACL 2025 , Luis Chiruzzo, Alan R...

work page doi:10.18653/v1/2025 2025
[28]

Yacine Majdoub and Eya Ben Charrada. 2024. Debugging with Open- Source Large Language Models: An Evalu- ation. In Proceedings of the 18th ACM/IEEE International Symposium o n Empirical Software Engineering and Mea- surement (Barcelona, Spain) (ESEM ’24). Association for Computing Machinery, New York, NY, USA, 510–516. https://doi.org/10.1145/3674805.3690758

work page doi:10.1145/3674805.3690758 2024
[29]

Silverio Martínez-Fernández, Justus Bogner, Xavier Franch, Mar c Oriol, Julien Siebert, Adam Trendowicz, Anna Maria Vollmer, and Stefan Wagner. 2022. Software Engineering for AI-Base d Systems: A Survey. ACM Trans. Softw. Eng. Methodol. 31, 2, Article 37e (April 2022), 59 pages. https://doi.org/10. 1145/3487043

2022
[30]

Cristian Mascia, Antonio Guerriero, Luca Giamattei, Roberto Piet rantuono, and Stefano Russo. 2025. Mi- croservices Performance Testing with Causality-enhanced Large Lang uage Models. In 2025 IEEE/ACM Second International Conference on AI Foundation Models an d Software Engineering (Forge) . 136–140. https://doi.org/10.1109/Forge66646.2025.00022

work page doi:10.1109/forge66646.2025.00022 2025
[31]

Bhuwan Paudel, Javier Gonzalez-Huerta, and Ehsan Zabarda st. 2026. Temporal Evolution of Architectural Complexity and Technical Debt in Microservices: An Exploratory Case Study. In Product-Focused Software Process Improvement , Giuseppe Scanniello, Valentina Lenarduzzi, Simone Romano, Sira Vegas, and Rita Francese (Eds.). Springer Nature Switzerland, Cham...

2026
[32]

Judea Pearl and Dana Mackenzie. 2018. The Book of Why: The New Science of Cause and Eﬀect (1st ed.). Basic Books, Inc., USA

2018
[33]

Aske Plaat, Annie Wong, Suzan Verberne, Joost Broekens, Niki Va n Stein, and Thomas Bäck. 2025. Multi-Step Reasoning with Large Language Models, a Survey. ACM Comput. Surv. 58, 6, Article 160 (Dec. 2025), 35 pages. https://doi.org/10.1145/3774896

work page doi:10.1145/3774896 2025
[34]

Advait Sarkar. 2024. AI Should Challenge, Not Obey. Commun. ACM 67, 10 (Sept. 2024), 18–21. https://doi.org/10.1145/3649404

work page doi:10.1145/3649404 2024
[35]

Julien Siebert. 2023. Applications of Statistical Causal Infer ence in Software Engineering. Information and Software Technology 159, C (2023), 16 pages. https://doi.org/10.1016/j.infsof.2 023.107198

work page doi:10.1016/j.infsof.2 2023
[36]

Alex Singla, Alexander Sukharevsky, Lareina Yee, and Michael C hui. 2024. The State of AI in Early 2024: Gen AI Adoption Spikes and Starts to Generate Value. McKinsey & Company. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai-2024 Accessed: 2026-03-16

2024
[37]

Should I Give Up Now?

Jiessie Tie, Bingsheng Yao, Tianshi Li, Hongbo Fang, Syed Ishtiaque A hmed, Dakuo Wang, and Shurui Zhou. 2026. "Should I Give Up Now?" Investigating LLM Pitfalls in Software Engineering. ACM Trans. Softw. Eng. Methodol.(March 2026). https://doi.org/10.1145/3801972

work page doi:10.1145/3801972 2026
[38]

Guangya Wan, Yunsheng Lu, Yuqi Wu, Mengxuan Hu, and Sheng Li. 2025. Large language models for causal discovery: current landscape and future directions. In 34th International Joint Conference on Artiﬁcial Intellig ence (Montreal, Canada) (IJCAI ’25). Article 1186, 9 pages. https://doi.org/10.24963/ijcai.2025 /1186

work page doi:10.24963/ijcai.2025 2025
[39]

Junjie Wang, Yuchao Huang, Chunyang Chen, Zhe Liu, Song Wang, and Qing W ang. 2024. Software Testing With Large Language Models: Survey, Landscape, and Vision. IEEE Transactions on Software Engineering 50, 4 (2024), 911–

2024
[40]

https://doi.org/10.1109/TSE.2024.3368208

work page doi:10.1109/tse.2024.3368208 2024
[41]

Lei Wang, Shanshan Huang, Shu Wang, Jun Liao, Tingpeng Li, and Li Liu. 2024. A survey of causal dis- covery based on functional causal model. Engineering Applications of Artiﬁcial Intelligence 133 (2024), 108258. https://doi.org/10.1016/j.engappai.2024.108258

work page doi:10.1016/j.engappai.2024.108258 2024
[42]

Simin Wang, Liguo Huang, Amiao Gao, Jidong Ge, Tengfei Zhang, Haitao Feng, Ishna Satyarth, Ming Li, He Zhang, and Vincent Ng. 2023. Machine/Deep Learning for Software Engineering: A Sys tematic Literature Review. IEEE ACM Accepted, Vol. 1, No. 1, Article . Publication date: June 2026 . 14 Pietrantuono et al. Transactions on Software Engineering 49, 3 (202...

work page doi:10.1109/tse.202 2023
[43]

Li Wu, Johan Tordsson, Erik Elmroth, and Odej Kao. 2021. Cau sal Inference Techniques for Microservice Performance Diagnosis: Evaluation and Guiding Recommendations. In IEEE International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS). IEEE, 21–30. https://doi.org/10.1109/ACSOS52086.2021. 00029

work page doi:10.1109/acsos52086.2021 2021
[44]

Quanjun Zhang, Chunrong Fang, Yang Xie, Yuxiang Ma, Weisong Sun, Yun Y ang, and Zhenyu Chen. 2026. A System- atic Literature Review on Large Language Models for Automated Pr ogram Repair. ACM Trans. Softw. Eng. Methodol. (March 2026). https://doi.org/10.1145/3799693 Received June 29, 2026 ACM Accepted, Vol. 1, No. 1, Article . Publication date: June 2026

work page doi:10.1145/3799693 2026

[1] [1]

Sebastian Baltes, Timo Speith, Brenda Chiteri, Seyedmoein Mohs enimoﬁdi, Shalini Chakraborty, and Daniel Buschek

[2] [2]

IEEE Transactions on Software Engineering (2026), 1–18

On the Need to Rethink Trust in AI Assistants for Software Dev elopment: A Critical Review. IEEE Transactions on Software Engineering (2026), 1–18. https://doi.org/10.1109/TSE.2026.3659804

work page doi:10.1109/tse.2026.3659804 2026

[3] [3]

Sándor Battaglini-Fischer, Nishanthi Srinivasan, Bálint László Szar vas, Xiaoyu Chu, and Alexandru Iosup. 2025. FAILS: A Framework for Automated Collection and Analysis of LLM Service Incidents. In Companion of the 16th ACM/SPEC In- ternational Conference on Performance Engineering (ICPE ’25). ACM, 187–194. https://doi.org/10.1145/3680256.3721320

work page doi:10.1145/3680256.3721320 2025

[4] [4]

Zhang, Max Hort, Mark Harman, and Federica Sa rro

Zhenpeng Chen, Jie M. Zhang, Max Hort, Mark Harman, and Federica Sa rro. 2024. Fairness Testing: A Compre- hensive Survey and Analysis of Trends. ACM Trans. Softw. Eng. Methodol. 33, 5, Article 137 (June 2024), 59 pages. https://doi.org/10.1145/3652155

work page doi:10.1145/3652155 2024

[5] [5]

Wuersching, D

Andrew G. Clark, Michael Foster, Neil Walkinshaw, and Robert M. Hierons. 2023. Metamorphic Testing with Causal Graphs. In 2023 IEEE Conference on Software Testing, Veriﬁcation and V alidation (ICST) . 153–164. https://doi.org/10.1109/ICST57152.2023.00023

work page doi:10.1109/icst57152.2023.00023 2023

[6] [6]

Cloud Native Computing Foundation (CNCF) and Linux Foundation Rese arch. 2023. CNCF Annual Survey 2023. Tech- nical Report. Linux Foundation Research. https://www.cncf.io/rep orts/cncf-annual-survey-2023/ Accessed: 2026-03- 16

2023

[7] [7]

Cloud Native Computing Foundation (CNCF) and Linux Foundation Rese arch

[8] [8]

Technical Report

CNCF Annual Survey 2024 . Technical Report. Linux Foundation Research. https://www.cncf.io/wp-content/uploads/2025/04/cncf_annual_survey24_031225a.pdf Accessed: 2026-03-16. Report presenting results of the 2024 CNCF Annual Survey

2024

[9] [9]

Hierons, Donghwan Shin, Neil Walkinshaw, and Christopher Wild

Michael Foster, Robert M. Hierons, Donghwan Shin, Neil Walkinshaw, and Christopher Wild. 2025. Using causal infer- ence to test systems with hidden and interacting variables: an evaluat ive case study. In 29th International Conference on Evaluation and Assessment in Software Engineering (EASE 2025). ACM. https://doi.org/10.1145/3756681.3756967

work page doi:10.1145/3756681.3756967 2025

[10] [10]

Luca Giamattei, Antonio Guerriero, Ivano Malavolta, Cristian Mas cia, Roberto Pietrantuono, and Stefano Russo. 2024. Identifying Performance Issues in Microservice Architectures throu gh Causal Reasoning. In 2024 IEEE/ACM Interna- tional Conference on Automation of Software Test (AST) . 149–153

2024

[11] [11]

Luca Giamattei, Antonio Guerriero, Roberto Pietrantuono, and Stef ano Russo. 2024. Causality-driven Testing of Autonomous Driving Systems. ACM Transactions on Software Engineering and Methodology 33, 3, Article 74 (2024). https://doi.org/10.1145/3635709

work page doi:10.1145/3635709 2024

[12] [12]

Luca Giamattei, Antonio Guerriero, Roberto Pietrantuono, and Ste fano Russo. 2025. Causal reasoning in Software Quality Assurance: A systematic review. Information and Software Technology 178 (2025), 107599. https://doi.org/10.1016/j.infsof.2024.107599

work page doi:10.1016/j.infsof.2024.107599 2025

[13] [13]

Richard Hahn, and Huan Liu

Ruocheng Guo, Lu Cheng, Jundong Li, P. Richard Hahn, and Huan Liu. 2 020. A Survey of Learning Causality with Data: Problems and Methods. Comput. Surveys 53, 4, Article 75 (2020), 37 pages. https://doi.org/10.1145/ 3397269

2020

[14] [14]

Oumayma Hamdi, Ali Ouni, Eman Abdullah AlOmar, Mel Ó Cinnéide, a nd Mohamed Wiem Mkaouer

[15] [15]

In 2021 IEEE/ACM 8th International Conference on Mobile Softw are Engineering and Systems (MobileSoft)

An Empirical Study on the Impact of Refactoring on Quality Me trics in Android Applications. In 2021 IEEE/ACM 8th International Conference on Mobile Softw are Engineering and Systems (MobileSoft) . 28–39. https://doi.org/10.1109/MobileSoft52590.2021.00010

work page doi:10.1109/mobilesoft52590.2021.00010 2021

[16] [16]

Xinyi Hou, Yanjie Zhao, Yue Liu, Zhou Yang, Kailong Wang, Li Li, Xiapu Luo, David Lo, John Grundy, and Haoyu Wang. 2024. Large Language Models for Software Engineering: A Syste matic Literature Review. ACM Trans. Softw. Eng. Methodol. 33, 8, Article 220 (Dec. 2024), 79 pages. https://doi.org/10. 1145/3695988

2024

[17] [17]

Yamin Hu, Wenjian Luo, and Zongyao Hu. 2023. A practical approa ch to explaining defect proneness of code commits by causal discovery. Engineering Applications of Artiﬁcial Intelligence 123 (2023), 106187. https://doi.org/10.1016/j.engappai.2023.106187

work page doi:10.1016/j.engappai.2023.106187 2023

[18] [18]

Kaicheng Huang, Fanyu Wang, Yutan Huang, and Chetan Arora. 2025 . Prompt Engineering for Requirements En- gineering: A Literature Review and Roadmap . In 33rd International Requirements Engineering Conference W orkshops. IEEE, Los Alamitos, CA, USA, 548–557. https://doi.org/10.11 09/REW66121.2025.00081

arXiv 2025

[19] [19]

Eisty, and Tim Menzies

Jeremy Hulse, Nasir U. Eisty, and Tim Menzies. 2025. Shaky stru ctures: The wobbly world of causal graphs in software analytics. Empirical Software Engineering 30, 5 (2025), 142. https://doi.org/10.1007/s10664-025-10 690-6

work page doi:10.1007/s10664-025-10 2025

[20] [20]

Guido W. Imbens. 2020. Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empir- ical Practice in Economics. Journal of Economic Literature 58, 4 (2020), 1129–79. https://doi.org/10.1257/jel.20191 597

work page doi:10.1257/jel.20191 2020

[21] [21]

Juyong Jiang, Fan Wang, Jiasi Shen, Sungju Kim, and Sunghun Kim. 2025 . A Survey on Large Language Models for Code Generation. ACM Trans. Softw. Eng. Methodol. (2025). https://doi.org/10.1145/3747588

work page doi:10.1145/3747588 2025

[22] [22]

Brittany Johnson, Yuriy Brun, and Alexandra Meliou. 2020. Causal testing: understanding defects’ root causes. In 2020 ACM/IEEE 42nd International Conference on Software En gineering (ICSE) (Seoul, South Korea). ACM, 87–99. https://doi.org/10.1145/3377811.3380377 ACM Accepted, Vol. 1, No. 1, Article . Publication date: June 2026 . Advancing SE through the ...

work page doi:10.1145/3377811.3380377 2020

[23] [23]

Palacio, Yixuan Zhang, and Denys Posh yvanyk

Dipin Khati, Yijin Liu, David N. Palacio, Yixuan Zhang, and Denys Posh yvanyk. 2025. Mapping the Trust Terrain: LLMs in Software Engineering - Insights and Perspectives. ACM Trans. Softw. Eng. Methodol. (2025). https://doi.org/10.1145/3771282

work page doi:10.1145/3771282 2025

[24] [24]

Emre Kiciman, Robert Ness, Amit Sharma, and Chenhao Tan. 2025. C ausal Reasoning and Large Language Models: Opening a New Frontier for Causality. Transactions on Machine Learning Research (2025), 2835–8856. https://openreview.net/forum?id=mqoxLkX210

2025

[25] [25]

Gang Li and Honghua Dai. 2004. What will aﬀect software reuse: A causal model analysis. International Journal of Software Engineering and Knowledge Engineering 14, 03 (2004), 351–364. https://doi.org/10.1142/S0218194 00400166X

work page doi:10.1142/s0218194 2004

[26] [26]

Chenxi Liu, Yongqiang Chen, Tongliang Liu, Mingming Gong, James Cheng, Bo Han, and Kun Zhang. 2024. Discovery of the Hidden World with Large Language Models. In38th Annual Conference on Neural Information Processing Systems. https://openreview.net/forum?id=w50ICQC6QJ

2024

[27] [27]

Xiaoyu Liu, Paiheng Xu, Junda Wu, Jiaxin Yuan, Yifan Yang, Yuhang Zho u, Fuxiao Liu, Tianrui Guan, Hao- liang Wang, Tong Yu, Julian McAuley, Wei Ai, and Furong Huang. 2025. Lar ge Language Models and Causal Inference in Collaboration: A Comprehensive Survey. In Findings of the Association for Computational Linguis- tics: NAACL 2025 , Luis Chiruzzo, Alan R...

work page doi:10.18653/v1/2025 2025

[28] [28]

Yacine Majdoub and Eya Ben Charrada. 2024. Debugging with Open- Source Large Language Models: An Evalu- ation. In Proceedings of the 18th ACM/IEEE International Symposium o n Empirical Software Engineering and Mea- surement (Barcelona, Spain) (ESEM ’24). Association for Computing Machinery, New York, NY, USA, 510–516. https://doi.org/10.1145/3674805.3690758

work page doi:10.1145/3674805.3690758 2024

[29] Silverio Martínez-Fernández, Justus Bogner, Xavier Franch, Marc Oriol, Julien Siebert, Adam Trendowicz, Anna Maria Vollmer, and Stefan Wagner. 2022. Software Engineering for AI-Based Systems: A Survey. ACM Trans. Softw. Eng. Methodol. 31, 2, Article 37e (April 2022), 59 pages. https://doi.org/10.1145/3487043

[30] Cristian Mascia, Antonio Guerriero, Luca Giamattei, Roberto Pietrantuono, and Stefano Russo. 2025. Microservices Performance Testing with Causality-enhanced Large Language Models. In 2025 IEEE/ACM Second International Conference on AI Foundation Models and Software Engineering (Forge). 136–140. https://doi.org/10.1109/Forge66646.2025.00022

[31] Bhuwan Paudel, Javier Gonzalez-Huerta, and Ehsan Zabardast. 2026. Temporal Evolution of Architectural Complexity and Technical Debt in Microservices: An Exploratory Case Study. In Product-Focused Software Process Improvement, Giuseppe Scanniello, Valentina Lenarduzzi, Simone Romano, Sira Vegas, and Rita Francese (Eds.). Springer Nature Switzerland, Cham...

[32] Judea Pearl and Dana Mackenzie. 2018. The Book of Why: The New Science of Cause and Effect (1st ed.). Basic Books, Inc., USA.

[33] Aske Plaat, Annie Wong, Suzan Verberne, Joost Broekens, Niki Van Stein, and Thomas Bäck. 2025. Multi-Step Reasoning with Large Language Models, a Survey. ACM Comput. Surv. 58, 6, Article 160 (Dec. 2025), 35 pages. https://doi.org/10.1145/3774896

[34] Advait Sarkar. 2024. AI Should Challenge, Not Obey. Commun. ACM 67, 10 (Sept. 2024), 18–21. https://doi.org/10.1145/3649404

[35] Julien Siebert. 2023. Applications of Statistical Causal Inference in Software Engineering. Information and Software Technology 159, C (2023), 16 pages. https://doi.org/10.1016/j.infsof.2023.107198

[36] Alex Singla, Alexander Sukharevsky, Lareina Yee, and Michael Chui. 2024. The State of AI in Early 2024: Gen AI Adoption Spikes and Starts to Generate Value. McKinsey & Company. https://www.mckinsey.com/capabilities/quantumblack/our-insights/the-state-of-ai-2024 Accessed: 2026-03-16.

[37] Jiessie Tie, Bingsheng Yao, Tianshi Li, Hongbo Fang, Syed Ishtiaque Ahmed, Dakuo Wang, and Shurui Zhou. 2026. "Should I Give Up Now?" Investigating LLM Pitfalls in Software Engineering. ACM Trans. Softw. Eng. Methodol. (March 2026). https://doi.org/10.1145/3801972

[38] Guangya Wan, Yunsheng Lu, Yuqi Wu, Mengxuan Hu, and Sheng Li. 2025. Large language models for causal discovery: current landscape and future directions. In 34th International Joint Conference on Artificial Intelligence (Montreal, Canada) (IJCAI ’25). Article 1186, 9 pages. https://doi.org/10.24963/ijcai.2025/1186

[39] Junjie Wang, Yuchao Huang, Chunyang Chen, Zhe Liu, Song Wang, and Qing Wang. 2024. Software Testing With Large Language Models: Survey, Landscape, and Vision. IEEE Transactions on Software Engineering 50, 4 (2024), 911–. https://doi.org/10.1109/TSE.2024.3368208

[41] Lei Wang, Shanshan Huang, Shu Wang, Jun Liao, Tingpeng Li, and Li Liu. 2024. A survey of causal discovery based on functional causal model. Engineering Applications of Artificial Intelligence 133 (2024), 108258. https://doi.org/10.1016/j.engappai.2024.108258

[42] Simin Wang, Liguo Huang, Amiao Gao, Jidong Ge, Tengfei Zhang, Haitao Feng, Ishna Satyarth, Ming Li, He Zhang, and Vincent Ng. 2023. Machine/Deep Learning for Software Engineering: A Systematic Literature Review. IEEE Transactions on Software Engineering 49, 3 (202...

ACM Accepted, Vol. 1, No. 1, Article . Publication date: June 2026.

[43] Li Wu, Johan Tordsson, Erik Elmroth, and Odej Kao. 2021. Causal Inference Techniques for Microservice Performance Diagnosis: Evaluation and Guiding Recommendations. In IEEE International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS). IEEE, 21–30. https://doi.org/10.1109/ACSOS52086.2021.00029

[44] Quanjun Zhang, Chunrong Fang, Yang Xie, Yuxiang Ma, Weisong Sun, Yun Yang, and Zhenyu Chen. 2026. A Systematic Literature Review on Large Language Models for Automated Program Repair. ACM Trans. Softw. Eng. Methodol. (March 2026). https://doi.org/10.1145/3799693

Received June 29, 2026