Reasoning Beyond Prediction: From Data-Driven to Causal Software Engineering
Pith reviewed 2026-06-29 03:57 UTC · model grok-4.3
The pith
Software engineering needs machines that amplify causal reasoning rather than only making data-driven predictions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors call for a new paradigm of human-machine cooperation in software engineering where machines actively amplify engineers' reasoning through the lens of causation, rather than limiting themselves to automating routine tasks or predicting from learned patterns.
What carries the argument
The lens of causation, which shifts machine support from pattern prediction to helping engineers analyze cause-effect relationships in interdependent development tasks.
If this is right
- Tools could move beyond routine automation to support creative decision-making in complex system design.
- Human-machine cooperation would incorporate explicit cause-effect analysis for quality assurance and architecture decisions.
- Support systems would address the limitations of correlation-based methods in pervasively distributed and embedded environments.
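The gap between correlation and causation named in the last point can be made concrete with a minimal sketch. It is not taken from the paper: the counts, the scenario (code review versus defects, with module complexity as a confounder), and the function names `naive_effect` and `adjusted_effect` are all assumptions chosen for illustration. The adjustment shown is plain stratification over one known confounder, one standard technique among many.

```python
from collections import namedtuple

# Hypothetical counts, invented for illustration only (not from the paper).
# Each row: is the module complex, was it code-reviewed, how many modules,
# and how many of those were later found defective.
Stratum = namedtuple("Stratum", "complex_module reviewed modules defective")

DATA = [
    Stratum(True, True, 80, 32),    # complex modules are reviewed more often
    Stratum(True, False, 20, 12),
    Stratum(False, True, 20, 2),
    Stratum(False, False, 80, 16),
]


def defect_rate(rows):
    """Fraction of defective modules across the given rows."""
    modules = sum(r.modules for r in rows)
    return sum(r.defective for r in rows) / modules


def naive_effect(data):
    """Correlational view: defect rate of reviewed minus unreviewed modules."""
    reviewed = [r for r in data if r.reviewed]
    unreviewed = [r for r in data if not r.reviewed]
    return defect_rate(reviewed) - defect_rate(unreviewed)


def adjusted_effect(data):
    """Causal view under the assumption that complexity is the only confounder:
    compare within each complexity level, then weight by that level's share."""
    total = sum(r.modules for r in data)
    effect = 0.0
    for level in (True, False):
        stratum = [r for r in data if r.complex_module == level]
        weight = sum(r.modules for r in stratum) / total
        reviewed = [r for r in stratum if r.reviewed]
        unreviewed = [r for r in stratum if not r.reviewed]
        effect += weight * (defect_rate(reviewed) - defect_rate(unreviewed))
    return effect


if __name__ == "__main__":
    print(f"naive difference:    {naive_effect(DATA):+.2f}")
    print(f"adjusted difference: {adjusted_effect(DATA):+.2f}")
```

With these invented counts the naive comparison suggests that review raises the defect rate by about 0.06, while the stratified comparison shows a reduction of about 0.15. A purely pattern-based tool would report the first number; the second depends on the engineer supplying the causal assumption that complexity drives both review and defects.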
Where Pith is reading between the lines
- Adapting causal inference methods from statistics could be tested first on requirements traceability or change impact analysis.
- Future developer assistants might combine causal graphs with existing code models to flag potential side effects of modifications.
- This shift could influence how AI is integrated into cyber-physical system development where physical outcomes depend on causal chains.
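As a rough illustration of the side-effect flagging idea above, the following sketch walks a hand-written causal graph to list everything downstream of a modified element. The graph, its node names, and the helper `downstream_effects` are hypothetical; the paper does not describe such a tool, and a real assistant would have to learn or elicit the graph rather than hard-code it.

```python
from collections import deque

# Hypothetical causal graph: an edge A -> B means
# "a change to A can cause a change in B". Node names are invented.
CAUSAL_GRAPH = {
    "retry_policy": ["request_latency", "queue_depth"],
    "request_latency": ["timeout_rate"],
    "queue_depth": ["memory_usage"],
    "timeout_rate": ["user_visible_errors"],
    "memory_usage": [],
    "user_visible_errors": [],
    "log_format": [],
}


def downstream_effects(graph, changed):
    """Return every node reachable from `changed` by following cause -> effect
    edges (breadth-first), i.e. the candidates for side effects."""
    seen = set()
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        for effect in graph.get(node, []):
            if effect not in seen:
                seen.add(effect)
                queue.append(effect)
    return seen


if __name__ == "__main__":
    for change in ("retry_policy", "log_format"):
        affected = sorted(downstream_effects(CAUSAL_GRAPH, change))
        print(f"{change}: {affected or 'no downstream effects in this graph'}")
```

In this toy graph a change to `retry_policy` flags five downstream elements, while a change to `log_format` flags none. The result is only as trustworthy as the graph itself, so the engineer still has to judge whether the assumed edges hold.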
Load-bearing premise
Current data-driven deep learning methods are insufficient for modern software development demands and causal reasoning approaches can supply the missing intelligent support.
What would settle it
A study that measures whether introducing causal analysis tools improves engineer performance on tasks such as debugging interdependent components or assuring quality in distributed systems, compared against existing deep learning tools.
Original abstract
Software engineering is an intellectually demanding, creative discipline that juggles a web of interdependent tasks to design, build, and assure the quality of increasingly complex systems. As our expectations from software soar - with demands spanning AI-driven products, pervasively distributed and cloud-native architectures, and deeply embedded cyber-physical environments - its complexity steadily increases. In response, a new wave of co-engineering methods and tools, fueled by deep learning, has emerged to augment the process, enhancing automation and decision support. Yet, these advances remain far from delivering the kind of intelligent support that modern software development demands. We call for a new paradigm of human-machine cooperation: one where machines don't just automate routine tasks or predict from learned patterns, but actively amplify engineers' reasoning through the lens of causation. As software becomes smarter, a smarter support is needed.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript is a position paper that argues current data-driven deep learning methods for augmenting software engineering tasks fall short of providing the intelligent support needed for increasingly complex systems, and calls for a paradigm shift toward causal reasoning to actively amplify engineers' reasoning in human-machine cooperation.
Significance. If developed with concrete methods and evidence, the advocated shift from prediction to causation could stimulate new research directions in AI-assisted software engineering; however, the manuscript provides no technical constructions, examples, or comparisons, limiting its immediate contribution.
major comments (1)
- [Abstract] The central motivation, that deep-learning advances 'remain far from delivering the kind of intelligent support that modern software development demands', is stated without any supporting examples, empirical comparisons, cited limitations, or references, even though this claim is load-bearing for the call to a new causal paradigm.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our position paper. We address the single major comment below.
- Referee: [Abstract] The central motivation that deep-learning advances 'remain far from delivering the kind of intelligent support that modern software development demands' is stated without any supporting examples, empirical comparisons, cited limitations, or references, which is load-bearing for the call to a new causal paradigm.
Authors: We agree that the abstract's motivational claim would be strengthened by explicit support. Although position papers often open with a synthesized view of field limitations, the referee is correct that this statement is load-bearing. In revision we will augment the abstract with one or two concise examples (e.g., brittleness of DL-based code completion under novel requirements) together with citations to recent surveys documenting generalization and explainability shortfalls in data-driven SE tools. This will better anchor the subsequent call for causal reasoning without altering the position-paper character of the work.
Revision: yes
Circularity Check
No significant circularity
This is a position/advocacy paper whose central claim is a call for a paradigm shift toward causal reasoning in software engineering tools. It contains no equations, derivations, fitted parameters, models, or technical constructions whose correctness could reduce to self-definition, self-citation, or renaming of inputs. The abstract and described content state that data-driven methods fall short and advocate causation without any load-bearing formal step that could be circular by construction. No self-citation chains or ansatzes are invoked to justify a result.