LLM-Driven Cost-Effective Requirements Change Impact Analysis
Pith reviewed 2026-05-18 01:54 UTC · model grok-4.3
The pith
ProReFiCIA uses LLMs with tailored prompts to identify requirements impacted by changes at 85.7% recall while limiting engineer review to 3% of the full set.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ProReFiCIA is an LLM-driven approach for automatically identifying the impacted requirements when changes occur. Using the best combination of an LLM and a prompt variant, ProReFiCIA achieves a recall of 85.7% on an unseen industrial dataset, demonstrating its effectiveness in identifying impacted requirements. Further, the cost of applying ProReFiCIA remains small, as the engineer only needs to review the predicted impacted requirements, which represent 3.0% of the entire set of requirements. Lastly, incorporating domain knowledge into the model via retrieval-augmented generation (RAG) increases recall to 95.7% while raising the cost only slightly, to 3.6%.
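The two headline metrics can be made concrete. The sketch below assumes recall is the share of ground-truth impacted requirements that ProReFiCIA flags, and review cost is the number of flagged requirements divided by the total number of requirements, as the claim describes. The function name and the toy data are hypothetical, and the paper may aggregate these figures across multiple changes differently.

```python
def recall_and_review_cost(predicted, impacted, all_requirements):
    """Recall over the ground-truth impacted set, and review cost as the
    fraction of all requirements an engineer must inspect."""
    predicted, impacted = set(predicted), set(impacted)
    if not impacted or not all_requirements:
        raise ValueError("need a non-empty ground truth and requirement set")
    recall = len(predicted & impacted) / len(impacted)
    review_cost = len(predicted) / len(all_requirements)
    return recall, review_cost


# Hypothetical toy example (not the paper's data): 100 requirements,
# 4 truly impacted by a change, 3 flagged by the model (2 of them correct).
all_reqs = [f"R{i}" for i in range(1, 101)]
truth = {"R3", "R17", "R42", "R88"}
flagged = {"R3", "R17", "R50"}
print(recall_and_review_cost(flagged, truth, all_reqs))  # (0.5, 0.03)
```

Read this way, 85.7% recall at 3.0% cost means most truly impacted requirements appear in a review list that covers only about one in thirty requirements.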
What carries the argument
ProReFiCIA, an LLM-based system that processes requirement changes through selected models and prompt variants, optionally augmented by retrieval-augmented generation to inject domain knowledge.
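The mechanism can be pictured as a loop over candidate requirements. This is a minimal sketch, not the authors' implementation: it assumes a pairwise yes/no prompt for each (changed requirement, candidate) pair, an `ask_llm` callable standing in for whichever model is used, and an optional `retrieve` callable standing in for the RAG step. The prompt wording, the function names, and the keyword-based `fake_llm` stub are all hypothetical.

```python
from typing import Callable, Dict, List, Optional


def build_prompt(change: str, candidate: str, context: Optional[str] = None) -> str:
    """Assemble a hypothetical yes/no prompt for one candidate requirement."""
    parts = []
    if context:
        # Optional retrieved domain knowledge (the RAG variant).
        parts.append(f"Domain knowledge:\n{context}\n")
    parts.append(f"Changed requirement:\n{change}\n")
    parts.append(f"Candidate requirement:\n{candidate}\n")
    parts.append("Is the candidate requirement impacted by the change? Answer YES or NO.")
    return "\n".join(parts)


def find_impacted(
    change: str,
    candidates: Dict[str, str],
    ask_llm: Callable[[str], str],
    retrieve: Optional[Callable[[str, str], str]] = None,
) -> List[str]:
    """Return the IDs of candidates the model labels as impacted."""
    impacted = []
    for req_id, text in candidates.items():
        context = retrieve(change, text) if retrieve else None
        answer = ask_llm(build_prompt(change, text, context))
        if answer.strip().upper().startswith("YES"):
            impacted.append(req_id)
    return impacted


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call: answers YES when the candidate mentions 'brake'."""
    candidate = prompt.split("Candidate requirement:\n", 1)[1]
    return "YES" if "brake" in candidate.lower() else "NO"


reqs = {
    "R1": "The system shall apply the emergency brake within 2 s.",
    "R2": "The display shall show the current speed.",
    "R3": "Brake status shall be logged every second.",
}
change = "The emergency brake response time changes from 2 s to 1.5 s."
print(find_impacted(change, reqs, fake_llm))  # ['R1', 'R3']
```

Only the flagged IDs reach the engineer, which is where the small review cost comes from. Batching several candidates per prompt would reduce the number of model calls; which design ProReFiCIA actually uses is not stated in this summary.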
If this is right
- Requirements engineers can shift from scanning entire documents to reviewing only a few percent of them when a change arises.
- Fewer impacted requirements are likely to be overlooked, reducing the chance of downstream defects or rework.
- The method stays affordable even when domain knowledge is added through retrieval, keeping the review effort under 4% of the requirements.
- Prompt and model selection can be tuned per project to balance recall against the volume of items needing human check.
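One way to make the last point operational is to pick, on held-out validation data, the model and prompt combination with the highest recall whose review cost stays within a budget the project sets. The selection rule, the budget, the configuration names, and the numbers below are hypothetical illustrations, not results from the paper.

```python
def pick_configuration(results, max_review_cost):
    """results maps (model, prompt_variant) -> (recall, review_cost).

    Return the configuration with the highest recall whose review cost is
    within the budget; ties are broken in favour of the lower review cost.
    Returns None if no configuration fits the budget.
    """
    feasible = [(cfg, r, c) for cfg, (r, c) in results.items() if c <= max_review_cost]
    if not feasible:
        return None
    best = max(feasible, key=lambda item: (item[1], -item[2]))
    return best[0]


# Hypothetical validation results (illustrative numbers only, not from the paper).
results = {
    ("model-a", "zero-shot"): (0.70, 0.020),
    ("model-a", "chain-of-thought"): (0.82, 0.045),
    ("model-b", "few-shot"): (0.80, 0.030),
}
print(pick_configuration(results, max_review_cost=0.04))  # ('model-b', 'few-shot')
```

Tuning on a separate validation set, rather than on the test set, keeps the reported recall an honest estimate for unseen data.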
Where Pith is reading between the lines
- The same prompting and retrieval pattern could be adapted to trace changes across related artifacts such as test cases or design documents.
- Integration into commercial requirements tools might allow live impact alerts as engineers edit specifications.
- Open questions remain about how performance holds when the requirements language or domain differs sharply from the tested industrial set.
Load-bearing premise
That the chosen industrial dataset represents typical real-world requirements, and that the ground-truth labels for impacted requirements were assigned without bias or missing cases.
What would settle it
Running ProReFiCIA on a fresh industrial requirements set from another company or domain whose impacted requirements have been independently and completely labeled by multiple experts.
Original abstract
Requirements are inherently subject to changes throughout the software development lifecycle. Within the limited budget available to requirements engineers, manually identifying the impact of such changes on other requirements is both error-prone and effort-intensive. This can lead to overlooked impacted requirements, which, if not properly managed, can cause serious issues in downstream tasks. Inspired by the growing potential of large language models (LLMs) across diverse domains, we propose ProReFiCIA, an LLM-driven approach for automatically identifying the impacted requirements when changes occur. We conduct an extensive evaluation of ProReFiCIA using several LLMs and prompt variants tailored to this task. Using the best combination of an LLM and a prompt variant, ProReFiCIA achieves a recall of 85.7% on an unseen industrial dataset, demonstrating its effectiveness in identifying impacted requirements. Further, the cost of applying ProReFiCIA remains small, as the engineer only needs to review the predicted impacted requirements, which represent 3.0% of the entire set of requirements. Lastly, incorporating domain knowledge into the model via retrieval-augmented generation (RAG) increases recall to 95.7% while raising the cost only slightly, to 3.6%.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes ProReFiCIA, an LLM-driven approach for automatically identifying requirements impacted by changes in the software development lifecycle. It evaluates multiple LLMs and prompt variants on an unseen industrial dataset, reporting that the best combination achieves 85.7% recall while requiring engineers to review only 3.0% of requirements; adding RAG for domain knowledge raises recall to 95.7% at 3.6% review cost.
Significance. If the evaluation protocol and ground-truth construction can be shown to be robust, the work offers a potentially practical, low-effort automation aid for a well-known pain point in requirements engineering. The focus on cost (review burden) rather than raw accuracy and the use of real industrial data are strengths that could translate to adoption if reproducibility and generalizability are addressed.
major comments (2)
- [Evaluation] Evaluation section: the central claim of 85.7% recall (and 95.7% with RAG) on the industrial dataset rests entirely on the correctness and completeness of the provided ground-truth impacted requirements. The manuscript supplies no description of how these labels were produced (single annotator vs. multiple, author-provided vs. independent, any inter-annotator agreement metric, or external validation), so it is impossible to determine whether the reported figures reflect true effectiveness or labeling artifacts.
- [Experimental setup] Experimental setup / results: the paper compares only among LLM-prompt combinations and does not report any non-LLM baseline (e.g., simple text-similarity, dependency-graph, or rule-based change-impact methods common in the RE literature). Without such controls it is difficult to isolate the contribution of the LLM component to the observed recall.
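For the agreement metric the first comment asks about, Cohen's kappa is a standard choice when two annotators label the same requirements as impacted or not. The sketch below assumes binary labels aligned by requirement; the manuscript does not say which metric, if any, was computed, and the labels shown are hypothetical.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators assigning binary labels
    (1 = impacted, 0 = not impacted) to the same requirements."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("annotations must be non-empty and aligned")
    n = len(labels_a)
    # Observed agreement: fraction of requirements given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal rate of "impacted".
    p_a = sum(labels_a) / n
    p_b = sum(labels_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1:
        # Both annotators used the same single label everywhere.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two annotators for ten candidate requirements.
a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
b = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(round(cohens_kappa(a, b), 3))  # 0.737
```

Because impacted requirements are likely a small minority (the predictions cover roughly 3% of the set), kappa can be low even when raw agreement is high, so reporting both figures is usually more informative.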
minor comments (2)
- [Abstract] Abstract: the acronym 'ProReFiCIA' is introduced without expansion; a parenthetical definition on first use would improve readability.
- The paper would benefit from an appendix or supplementary material containing the exact prompt templates and any RAG retrieval details to support reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate the revisions we will make to improve the paper.
Point-by-point responses
- Referee: [Evaluation] Evaluation section: the central claim of 85.7% recall (and 95.7% with RAG) on the industrial dataset rests entirely on the correctness and completeness of the provided ground-truth impacted requirements. The manuscript supplies no description of how these labels were produced (single annotator vs. multiple, author-provided vs. independent, any inter-annotator agreement metric, or external validation), so it is impossible to determine whether the reported figures reflect true effectiveness or labeling artifacts.
Authors: We agree that the absence of a description of the ground-truth construction process is a significant omission that affects the interpretability of our results. In the revised manuscript, we will add a dedicated subsection under Evaluation that details the labeling procedure. This will specify that the ground-truth impacted requirements were identified by requirements engineers at the industrial partner using their established change-impact analysis process, with review by at least two experts for each change to mitigate individual bias. We will also report any available quality controls, such as cross-checks against historical project data. revision: yes
- Referee: [Experimental setup] Experimental setup / results: the paper compares only among LLM-prompt combinations and does not report any non-LLM baseline (e.g., simple text-similarity, dependency-graph, or rule-based change-impact methods common in the RE literature). Without such controls it is difficult to isolate the contribution of the LLM component to the observed recall.
Authors: We acknowledge that the lack of traditional baselines makes it harder to quantify the specific advantage of the LLM component. Although our study focused on optimizing LLM-prompt combinations for this task, we agree a baseline comparison would strengthen the claims. In the revised version, we will add results from a simple TF-IDF cosine similarity baseline applied to the same industrial dataset, allowing direct comparison of recall and review cost against the LLM-based approach. revision: yes
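The promised TF-IDF cosine similarity baseline could look roughly like the sketch below, assuming each requirement is compared with the text of the changed requirement and flagged when its similarity reaches a threshold. The tokenizer, the smoothed IDF variant (log((1 + n) / (1 + df)) + 1, the scikit-learn default), the function names, and the threshold are assumptions, not details from the manuscript.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts) using raw term counts and smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def tfidf_baseline(change, requirements, threshold):
    """Flag requirement IDs whose similarity to the change text is >= threshold."""
    ids = list(requirements)
    vectors = tfidf_vectors([change] + [requirements[i] for i in ids])
    query = vectors[0]
    return [i for i, vec in zip(ids, vectors[1:]) if cosine(query, vec) >= threshold]


# Hypothetical usage on toy requirements.
requirements = {
    "R1": "The system shall apply the emergency brake within 2 s.",
    "R2": "The display shall show the current speed.",
}
print(tfidf_baseline("Emergency brake response time changes to 1.5 s.", requirements, 0.2))
```

Sweeping the threshold traces the same recall versus review-cost trade-off that the LLM configurations are judged on, which makes the two approaches directly comparable.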
Circularity Check
No circularity: empirical evaluation on external data
full rationale
The paper proposes ProReFiCIA as an LLM-driven method for requirements change impact analysis and reports performance via direct empirical testing on an unseen industrial dataset. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The 85.7% recall and related metrics are obtained from external evaluation rather than reducing to self-definitional inputs or prior author work by construction. The derivation chain is therefore self-contained.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: LLMs can reason about semantic relationships between requirements based on their textual descriptions.
Paper authors: Romina Etezadi, Sallam Abualhaija, Chetan Arora, and ... · Publication date: November 2025