
arxiv: 2606.27960 · v1 · pith:RUAXZZNXnew · submitted 2026-06-26 · 💻 cs.SE · cs.AI

Reasoning Beyond Prediction: From Data-Driven to Causal Software Engineering

Pith reviewed 2026-06-29 03:57 UTC · model grok-4.3

classification 💻 cs.SE cs.AI
keywords causal reasoning · software engineering · deep learning · human-machine cooperation · intelligent support · causal inference

The pith

Software engineering needs machines that amplify causal reasoning rather than only making data-driven predictions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper contends that deep learning tools, while useful for automation and pattern-based predictions, cannot meet the demands of increasingly complex software development involving interdependent tasks across AI products, distributed systems, and cyber-physical environments. It calls for a shift to a human-machine cooperation model in which machines help engineers reason about causes and effects to provide more intelligent support. A sympathetic reader would care because current methods leave a gap in handling the creative and quality-assurance aspects of modern software work, where correlations alone prove insufficient.

Core claim

The authors call for a new paradigm of human-machine cooperation in software engineering where machines actively amplify engineers' reasoning through the lens of causation, rather than limiting themselves to automating routine tasks or predicting from learned patterns.

What carries the argument

The lens of causation, which shifts machine support from pattern prediction to helping engineers analyze cause-effect relationships in interdependent development tasks.
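
To make the contrast between pattern prediction and cause-effect analysis concrete, here is a minimal sketch that is not taken from the paper. It uses hand-made, hypothetical counts in which module complexity influences both whether a change was reviewed and whether it introduced a defect. The names `Stratum`, `DATA`, `naive_difference`, and `adjusted_difference` are illustrative assumptions, and the adjustment is only valid under the assumption that complexity is the sole confounder.

```python
from collections import namedtuple

# Hypothetical record: n changes in a complexity stratum, of which `defects` were faulty.
Stratum = namedtuple("Stratum", "complexity reviewed n defects")

# Hand-made counts for illustration only (not data from the paper).
DATA = [
    Stratum("low", True, 20, 2),
    Stratum("low", False, 80, 16),
    Stratum("high", True, 80, 40),
    Stratum("high", False, 20, 12),
]


def defect_rate(rows):
    """Fraction of changes in `rows` that introduced a defect."""
    total = sum(r.n for r in rows)
    return sum(r.defects for r in rows) / total


def naive_difference(data):
    """Correlation-style comparison: reviewed vs. unreviewed, ignoring complexity."""
    treated = [r for r in data if r.reviewed]
    control = [r for r in data if not r.reviewed]
    return defect_rate(treated) - defect_rate(control)


def adjusted_difference(data):
    """Stratified (backdoor-style) estimate, assuming complexity is the only confounder."""
    total = sum(r.n for r in data)
    effect = 0.0
    for level in sorted({r.complexity for r in data}):
        stratum = [r for r in data if r.complexity == level]
        weight = sum(r.n for r in stratum) / total
        treated = [r for r in stratum if r.reviewed]
        control = [r for r in stratum if not r.reviewed]
        effect += weight * (defect_rate(treated) - defect_rate(control))
    return effect


if __name__ == "__main__":
    print(f"naive difference:    {naive_difference(DATA):+.2f}")
    print(f"adjusted difference: {adjusted_difference(DATA):+.2f}")
```

With these counts the naive comparison suggests that review raises the defect rate, while the stratified estimate shows a reduction within each complexity level. A purely correlational tool would draw the wrong conclusion from the same data.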

If this is right

  • Tools could move beyond routine automation to support creative decision-making in complex system design.
  • Human-machine cooperation would incorporate explicit cause-effect analysis for quality assurance and architecture decisions.
  • Support systems would address the limitations of correlation-based methods in pervasively distributed and embedded environments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Adapting causal inference methods from statistics could be tested first on requirements traceability or change impact analysis.
  • Future developer assistants might combine causal graphs with existing code models to flag potential side effects of modifications.
  • This shift could influence how AI is integrated into cyber-physical system development where physical outcomes depend on causal chains.
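
As a rough illustration of the side-effect flagging idea in the list above, the sketch below walks a hand-written causal graph to list everything downstream of a modified variable. It is an assumption-laden toy: the graph, its node names, and the function `downstream_effects` are hypothetical and do not come from the paper. A real tool would also need to learn or validate the graph and estimate effect sizes.

```python
from collections import deque

# Hypothetical causal graph: each key is a cause, each list holds its direct effects.
CAUSAL_GRAPH = {
    "connection_pool_size": ["db_latency"],
    "db_latency": ["checkout_latency", "search_latency"],
    "cache_ttl": ["search_latency"],
    "checkout_latency": ["order_timeouts"],
    "search_latency": [],
    "order_timeouts": [],
}


def downstream_effects(graph, changed):
    """Return every node reachable from `changed` along cause-to-effect edges."""
    seen = set()
    queue = deque(graph.get(changed, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, []))
    return seen


if __name__ == "__main__":
    for variable in ("connection_pool_size", "cache_ttl"):
        affected = sorted(downstream_effects(CAUSAL_GRAPH, variable))
        print(f"changing {variable} may affect: {affected}")
```

A breadth-first traversal is enough here because the question is only which variables could be affected. Quantifying how much each one changes would require causal effect estimation on top of the graph.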

Load-bearing premise

Current data-driven deep learning methods are insufficient for modern software development demands and causal reasoning approaches can supply the missing intelligent support.

What would settle it

A study that measures whether introducing causal analysis tools improves engineer performance on tasks such as debugging interdependent components or assuring quality in distributed systems, compared against existing deep learning tools.

Figures

Figures reproduced from arXiv: 2606.27960 by Luca Giamattei, Roberto Pietrantuono, Stefano Russo.

Figure 1: The CSE conceptual framework. Although there are many ways to instantiate a CSE solution for a specific problem, the idea is that causal models should be at the core: they must be used to drive and check the reasoning process to solve the task rigorously and transparently. Rigorously means capable of quantifying the causal effects along the whole reasoning chain and harnessing such estimates to give the …

Figure 2: Causal methods in software engineering from the inter …

Original abstract

Software engineering is an intellectually demanding, creative discipline that juggles a web of interdependent tasks to design, build, and assure the quality of increasingly complex systems. As our expectations from software soar - with demands spanning AI-driven products, pervasively distributed and cloud-native architectures, and deeply embedded cyber-physical environments - its complexity steadily increases. In response, a new wave of co-engineering methods and tools, fueled by deep learning, has emerged to augment the process, enhancing automation and decision support. Yet, these advances remain far from delivering the kind of intelligent support that modern software development demands. We call for a new paradigm of human-machine cooperation: one where machines don't just automate routine tasks or predict from learned patterns, but actively amplify engineers' reasoning through the lens of causation. As software becomes smarter, a smarter support is needed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript is a position paper that argues current data-driven deep learning methods for augmenting software engineering tasks fall short of providing the intelligent support needed for increasingly complex systems, and calls for a paradigm shift toward causal reasoning to actively amplify engineers' reasoning in human-machine cooperation.

Significance. If developed with concrete methods and evidence, the advocated shift from prediction to causation could stimulate new research directions in AI-assisted software engineering; however, the manuscript provides no technical constructions, examples, or comparisons, limiting its immediate contribution.

major comments (1)
  1. [Abstract] Abstract: the central motivation that deep-learning advances 'remain far from delivering the kind of intelligent support that modern software development demands' is stated without any supporting examples, empirical comparisons, cited limitations, or references, which is load-bearing for the call to a new causal paradigm.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our position paper. We address the single major comment below.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central motivation that deep-learning advances 'remain far from delivering the kind of intelligent support that modern software development demands' is stated without any supporting examples, empirical comparisons, cited limitations, or references, which is load-bearing for the call to a new causal paradigm.

    Authors: We agree that the abstract's motivational claim would be strengthened by explicit support. Although position papers often open with a synthesized view of field limitations, the referee is correct that this statement is load-bearing. In revision we will augment the abstract with one or two concise examples (e.g., brittleness of DL-based code completion under novel requirements) together with citations to recent surveys documenting generalization and explainability shortfalls in data-driven SE tools. This will better anchor the subsequent call for causal reasoning without altering the position-paper character of the work. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

This is a position/advocacy paper whose central claim is a call for a paradigm shift toward causal reasoning in software engineering tools. It contains no equations, derivations, fitted parameters, models, or technical constructions whose correctness could reduce to self-definition, self-citation, or renaming of inputs. The abstract and described content state that data-driven methods fall short and advocate causation without any load-bearing formal step that could be circular by construction. No self-citation chains or ansatzes are invoked to justify a result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is a position statement with no technical derivations, data fits, or formal structures; it rests on the premise that causal methods are superior without introducing countable free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5668 in / 957 out tokens · 62044 ms · 2026-06-29T03:57:23.971391+00:00 · methodology

