TraceLLM: Leveraging Large Language Models with Prompt Engineering for Enhanced Requirements Traceability
Pith reviewed 2026-05-25 06:56 UTC · model grok-4.3
The pith
Systematic prompt engineering with LLMs produces state-of-the-art requirements traceability links on four benchmark datasets.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
TraceLLM is a systematic framework that combines rigorous dataset splitting, iterative prompt refinement, enrichment with contextual roles and domain knowledge, and evaluation across zero- and few-shot settings. When paired with label-aware diversity-based demonstration selection, the framework produces state-of-the-art F2 scores on eight LLMs across four benchmark datasets, outperforming traditional IR baselines, fine-tuned models, and prior LLM-based methods. The results indicate that traceability performance depends on both model capacity and the quality of prompt engineering, and that the achieved scores support semi-automated workflows in which humans review candidate links.
What carries the argument
The TraceLLM framework carries the argument: iterative prompt refinement, combined with label-aware diversity-based demonstration selection, steers LLMs toward accurate trace-link generation.
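The abstract does not spell out how label-aware, diversity-based selection is implemented, so the following is only a minimal sketch of one plausible reading: group the candidate demonstrations by label, then pick mutually dissimilar examples within each label. The function names, the bag-of-words stand-in for an embedding, and the greedy farthest-point rule are all assumptions made for illustration, not the authors' method.

```python
from collections import defaultdict
import math


def bag_of_words(text):
    """Rough stand-in for a sentence embedding: lowercase token counts."""
    counts = defaultdict(int)
    for token in text.lower().split():
        counts[token] += 1
    return dict(counts)


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def select_demonstrations(pool, per_label):
    """Pick up to `per_label` mutually dissimilar examples for each label.

    `pool` is a list of (text, label) pairs taken from the training split.
    Within each label the selection is greedy farthest-point: start from the
    first example, then repeatedly add the candidate whose minimum distance
    to the already chosen examples is largest.
    """
    by_label = defaultdict(list)
    for text, label in pool:
        by_label[label].append(text)

    selected = []
    for label in sorted(by_label):  # label-aware: every label is represented
        texts = by_label[label]
        vectors = [bag_of_words(t) for t in texts]
        chosen = [0]  # deterministic start: first example of this label
        while len(chosen) < min(per_label, len(texts)):
            best_idx, best_dist = None, -1.0
            for i in range(len(texts)):
                if i in chosen:
                    continue
                dist = min(cosine_distance(vectors[i], vectors[j]) for j in chosen)
                if dist > best_dist:
                    best_idx, best_dist = i, dist
            chosen.append(best_idx)
        selected.extend((texts[i], label) for i in chosen)
    return selected
```

In practice the token counts would likely be replaced by real sentence embeddings; the point of the sketch is only that balancing labels and spreading examples apart are two separate steps.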
If this is right
- Traceability performance is shown to depend on prompt engineering quality in addition to model capacity.
- Label-aware diversity sampling emerges as an effective strategy for choosing demonstrations.
- The method supports semi-automated workflows in which analysts review and validate candidate links.
- Performance gains hold across zero-shot and few-shot regimes and across diverse artifact types.
- The approach generalizes within the tested aerospace and healthcare domains and artifact categories.
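The zero-shot and few-shot regimes named in the list above differ only in whether labeled demonstrations are placed in the prompt. The sketch below shows how a single builder could cover both regimes and the role and domain-knowledge enrichment. The wording of the prompt, the function name `build_prompt`, and its parameters are hypothetical; the paper's actual refined prompts are not reproduced in this review.

```python
def build_prompt(source, target, role=None, domain_notes=None, demonstrations=()):
    """Assemble a trace-link classification prompt.

    `demonstrations` is a sequence of (artifact_a, artifact_b, answer) triples.
    An empty sequence gives a zero-shot prompt; a non-empty one gives few-shot.
    """
    parts = []
    if role:
        # Contextual role enrichment (hypothetical wording).
        parts.append(f"You are {role}.")
    if domain_notes:
        # Domain knowledge enrichment (hypothetical wording).
        parts.append(f"Domain background: {domain_notes}")
    parts.append(
        "Decide whether the two artifacts below are connected by a trace link. "
        "Reply with 'yes' or 'no'."
    )
    for demo_a, demo_b, demo_answer in demonstrations:
        parts.append(f"Artifact A: {demo_a}\nArtifact B: {demo_b}\nAnswer: {demo_answer}")
    # The query pair always comes last, with the answer left open.
    parts.append(f"Artifact A: {source}\nArtifact B: {target}\nAnswer:")
    return "\n\n".join(parts)
```

Keeping the query format identical to the demonstration format means the zero-shot prompt is simply the few-shot prompt with the examples removed, which makes the two regimes directly comparable.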
Where Pith is reading between the lines
- The same prompt patterns could be tested on other software engineering tasks such as defect localization or change impact analysis.
- Releasing the refined prompts would let practitioners replicate or extend the results on their own datasets.
- Smaller or open-source LLMs might reach usable accuracy with the same prompt discipline, lowering compute requirements.
- Combining the prompt framework with lightweight fine-tuning on project-specific data could further raise precision.
Load-bearing premise
The prompt refinement and demonstration selection strategies developed on these four datasets will transfer to new domains and artifact types without substantial re-tuning.
What would settle it
Apply the published TraceLLM prompts without modification to a fresh dataset from an untested domain such as automotive control software and measure whether the resulting F2 scores fall below the reported state-of-the-art values.
Original abstract
Requirements traceability, the process of establishing and maintaining relationships between requirements and various software development artifacts, is paramount for ensuring system integrity and fulfilling requirements throughout the Software Development Life Cycle (SDLC). Traditional methods, including manual and information retrieval models, are labor-intensive, error-prone, and limited by low precision. Recently, Large Language Models (LLMs) have demonstrated potential for supporting software engineering tasks through advanced language comprehension. However, a substantial gap exists in the systematic design and evaluation of prompts tailored to extract accurate trace links. This paper introduces TraceLLM, a systematic framework for enhancing requirements traceability through prompt engineering and demonstration selection. Our approach incorporates rigorous dataset splitting, iterative prompt refinement, enrichment with contextual roles and domain knowledge, and evaluation across zero- and few-shot settings. We assess prompt generalization and robustness using eight state-of-the-art LLMs on four benchmark datasets representing diverse domains (aerospace, healthcare) and artifact types (requirements, design elements, test cases, regulations). TraceLLM achieves state-of-the-art F2 scores, outperforming traditional IR baselines, fine-tuned models, and prior LLM-based methods. We also explore the impact of demonstration selection strategies, identifying label-aware, diversity-based sampling as particularly effective. Overall, our findings highlight that traceability performance depends not only on model capacity but also critically on the quality of prompt engineering. In addition, the achieved performance suggests that TraceLLM can support semi-automated traceability workflows in which candidate links are reviewed and validated by human analysts.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces TraceLLM, a framework that applies prompt engineering (including contextual roles, domain knowledge enrichment, iterative refinement, and label-aware diversity-based demonstration selection) to LLMs for requirements traceability. It evaluates the approach in zero- and few-shot settings across eight LLMs and four benchmark datasets spanning the aerospace and healthcare domains with varying artifact types, claiming state-of-the-art F2 scores that outperform traditional IR baselines, fine-tuned models, and prior LLM-based methods.
Significance. If the reported performance reflects genuine generalization without evaluation bias, the work would be significant for software engineering by showing that systematic prompt design can deliver high traceability performance without model fine-tuning or large labeled datasets, supporting semi-automated workflows. The identification of effective demonstration selection strategies provides actionable insight into prompt robustness.
major comments (2)
- [Abstract/Methods] Abstract and Methods sections: The description of 'rigorous dataset splitting' followed by 'iterative prompt refinement' does not explicitly confirm that all refinement steps (including any performance-based adjustments) were restricted to a held-out validation partition with prompts frozen before test-set evaluation. This detail is load-bearing for the central SOTA F2 claim, as access to test labels or examples during refinement would introduce optimistic bias and invalidate comparisons to baselines.
- [Evaluation] Evaluation section: The abstract reports SOTA F2 scores but provides no numerical values, confidence intervals, statistical significance tests (e.g., McNemar or paired t-tests), or per-dataset split ratios. Without these, the outperformance claims over IR baselines and fine-tuned models cannot be assessed for robustness or practical effect size.
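The protocol the first major comment asks the authors to confirm can be sketched in a few lines: split once, choose the prompt on validation only, and leave the test partition untouched until the prompt is frozen. The 60/20/20 ratios, the seed, and both function names are assumptions for illustration; the review does not state the paper's actual split ratios.

```python
import random


def split_pairs(pairs, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle once with a fixed seed and cut into train/validation/test.

    The default ratios are hypothetical placeholders.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_val = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test


def refine_then_freeze(candidate_prompts, score_on_validation):
    """Return the prompt with the best validation score.

    The test partition is deliberately not an argument here, so refinement
    cannot see it. The returned prompt is the one to evaluate once on test.
    """
    return max(candidate_prompts, key=score_on_validation)
```

Making the test partition unreachable from the refinement function is a simple structural way to rule out the optimistic bias the referee describes.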
minor comments (2)
- [Abstract] Abstract: The choice of F2 (beta=2) over F1 is not motivated; a brief justification for emphasizing recall in traceability would clarify the metric selection.
- [Methods] The paper mentions 'parameter-free' aspects of the approach in places but does not clarify whether demonstration selection involves any tunable hyperparameters that could affect reproducibility.
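For readers unfamiliar with the metric raised in the first minor comment, the standard F-beta definition is F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), where P is precision and R is recall. With beta = 2, recall is weighted more heavily than precision, which fits a workflow where a missed link costs more than a false candidate that a human reviewer can discard. The short sketch below uses illustrative numbers, not results from the paper.

```python
def f_beta(precision, recall, beta=1.0):
    """Standard F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Illustrative values only: a recall-heavy classifier.
precision, recall = 0.5, 0.8
print(round(f_beta(precision, recall, beta=1), 4))  # 0.6154
print(round(f_beta(precision, recall, beta=2), 4))  # 0.7143
```

The same precision and recall pair scores noticeably higher under F2 than under F1, so comparisons across papers are only meaningful when both report the same beta.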
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. The comments highlight critical aspects of methodological transparency that we will address to strengthen the paper.
Point-by-point responses
- Referee: [Abstract/Methods] Abstract and Methods sections: The description of 'rigorous dataset splitting' followed by 'iterative prompt refinement' does not explicitly confirm that all refinement steps (including any performance-based adjustments) were restricted to a held-out validation partition with prompts frozen before test-set evaluation. This detail is load-bearing for the central SOTA F2 claim, as access to test labels or examples during refinement would introduce optimistic bias and invalidate comparisons to baselines.
  Authors: We confirm that all iterative prompt refinement, including any performance-based adjustments, was performed exclusively on a held-out validation partition. Prompts were frozen prior to any test-set evaluation, ensuring no access to test labels or examples. We will revise the Methods section to explicitly document the split ratios, the validation-only refinement protocol, and confirmation that test data remained untouched until final evaluation.
  Revision: yes
- Referee: [Evaluation] Evaluation section: The abstract reports SOTA F2 scores but provides no numerical values, confidence intervals, statistical significance tests (e.g., McNemar or paired t-tests), or per-dataset split ratios. Without these, the outperformance claims over IR baselines and fine-tuned models cannot be assessed for robustness or practical effect size.
  Authors: The Evaluation section already reports the full numerical F2 scores per dataset, confidence intervals, McNemar's test results for statistical significance, and the exact train/validation/test split ratios. To make these immediately visible without requiring readers to reach the body, we will revise the abstract to include representative F2 values, note the use of statistical tests, and reference the split ratios.
  Revision: yes
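McNemar's test, mentioned in the exchange above, compares two classifiers on the same test pairs by looking only at the pairs where they disagree. The sketch below computes the exact two-sided variant from the binomial distribution. Which variant the paper uses (exact or chi-square approximation) is not stated in this review, so treat the choice here as an assumption; the counts in the example are invented.

```python
from math import comb


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b: test pairs method A classified correctly and method B did not.
    c: test pairs method B classified correctly and method A did not.
    Under the null hypothesis the discordant pairs split 50/50, so the
    p-value is a two-sided binomial tail with n = b + c and p = 0.5.
    """
    n = b + c
    if n == 0:
        return 1.0  # no disagreements, no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Invented counts: the methods disagree on 10 pairs, A wins 9 of them.
print(mcnemar_exact(9, 1))  # 0.021484375
```

Because only discordant pairs enter the statistic, two methods with very different F2 scores can still be statistically indistinguishable on a small test set, which is why the referee asks for the test alongside the raw scores.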
Circularity Check
No circularity: empirical evaluation with no fitted predictions or self-referential derivations
Full rationale
The paper presents an empirical framework for prompt engineering in requirements traceability, reporting F2 scores from evaluations across eight LLMs and four datasets. No equations, parameters fitted to subsets then renamed as predictions, or self-citation chains supporting uniqueness theorems appear in the provided text. The central claims rest on direct experimental comparisons rather than any derivation that reduces to author-defined inputs by construction. Iterative prompt refinement is described as part of the method but does not trigger any of the enumerated circularity patterns (self-definitional, fitted-input-called-prediction, etc.). This is a standard empirical SE paper whose results are externally falsifiable via replication on the same benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Large language models can reliably follow enriched prompts that include domain roles and contextual knowledge for traceability tasks.
Forward citations
Cited by 1 Pith paper
- Neuro-Symbolic Agents for Hallucination-Free Requirements Reuse: A neuro-symbolic agent system for requirements reuse achieves 100% coverage and 0.2% constraint violations by construction through symbolic enforcement of an OOMRAM lattice.