TraceLLM: Leveraging Large Language Models with Prompt Engineering for Enhanced Requirements Traceability

Irfan Ahmad; Jameleddine Hassine; Nouf Alturayeif

arxiv: 2602.01253 · v1 · pith:XDMA35LSnew · submitted 2026-02-01 · 💻 cs.SE


Pith reviewed 2026-05-25 06:56 UTC · model grok-4.3

classification 💻 cs.SE

keywords: requirements traceability, prompt engineering, large language models, demonstration selection, software engineering, trace links, few-shot prompting, benchmark evaluation

The pith

Systematic prompt engineering with LLMs produces state-of-the-art requirements traceability links on four benchmark datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TraceLLM as a framework that applies iterative prompt refinement, contextual role enrichment, domain knowledge injection, and label-aware diversity sampling to guide large language models in recovering trace links between requirements and other artifacts. It evaluates the approach across zero-shot and few-shot regimes using eight different LLMs on four datasets drawn from aerospace and healthcare domains that include requirements, design elements, test cases, and regulations. The central finding is that these prompt choices raise F2 scores above those of information-retrieval baselines, fine-tuned models, and earlier LLM methods. A sympathetic reader would care because the work shows that traceability performance depends on prompt quality as well as on model capacity, opening a path to less manual and more reliable link maintenance across the software lifecycle.

Core claim

TraceLLM is a systematic framework that combines rigorous dataset splitting, iterative prompt refinement, enrichment with contextual roles and domain knowledge, and evaluation across zero- and few-shot settings. When paired with label-aware diversity-based demonstration selection, the framework produces state-of-the-art F2 scores on eight LLMs across four benchmark datasets, outperforming traditional IR baselines, fine-tuned models, and prior LLM-based methods. The results indicate that traceability performance depends on both model capacity and the quality of prompt engineering, and that the achieved scores support semi-automated workflows in which humans review candidate links.
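
To make the ingredients named above concrete, here is a minimal sketch of how a role, a domain note, and optional demonstrations might be assembled into a single prompt. The `Demo` type, the `build_prompt` function, and the field wording ("Source", "Target", "Linked") are illustrative assumptions; the paper's actual prompts are not reproduced on this page.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Demo:
    """One labeled demonstration pair (hypothetical structure)."""
    source: str
    target: str
    linked: bool


def build_prompt(role: str, domain_note: str, source: str, target: str,
                 demos: Sequence[Demo] = ()) -> str:
    """Assemble a zero-shot prompt (no demos) or a few-shot prompt (with demos)."""
    # Role and domain knowledge come first so they frame everything that follows.
    parts = [role, f"Domain context: {domain_note}"]
    for i, d in enumerate(demos, start=1):
        label = "yes" if d.linked else "no"
        parts.append(
            f"Example {i}\nSource: {d.source}\nTarget: {d.target}\nLinked: {label}"
        )
    # The query pair ends with an open label for the model to complete.
    parts.append(f"Source: {source}\nTarget: {target}\nLinked:")
    return "\n\n".join(parts)
```

Calling `build_prompt` with an empty `demos` sequence gives the zero-shot variant; passing demonstrations gives the few-shot variant, so both regimes share one template and differ only in the examples supplied.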

What carries the argument

The TraceLLM framework carries the argument: iterative prompt refinement, combined with label-aware diversity-based demonstration selection, steers LLMs toward accurate trace-link generation.

If this is right

Traceability performance is shown to depend on prompt engineering quality in addition to model capacity.
Label-aware diversity sampling emerges as an effective strategy for choosing demonstrations.
The method supports semi-automated workflows in which analysts review and validate candidate links.
Performance gains hold across zero-shot and few-shot regimes and across diverse artifact types.
The approach generalizes within the tested aerospace and healthcare domains and artifact categories.
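
One plausible reading of "label-aware diversity sampling" can be sketched as follows: group candidate demonstrations by label, then pick mutually dissimilar examples within each group. This is an assumption about what the term means, not the paper's procedure. The Jaccard word-overlap distance stands in for whatever similarity measure the authors use, and `select_demonstrations` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Example = Tuple[str, int]  # (artifact-pair text, label: 1 = linked, 0 = not linked)


def jaccard_distance(a: str, b: str) -> float:
    """1 - Jaccard similarity over lowercase word sets (stand-in distance)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    if not union:
        return 0.0
    return 1.0 - len(ta & tb) / len(union)


def select_demonstrations(pool: Sequence[Example], per_label: int) -> List[Example]:
    """Pick up to `per_label` mutually dissimilar examples from each label group."""
    if per_label < 1:
        raise ValueError("per_label must be at least 1")

    # Label-aware step: keep each class in its own bucket so both are represented.
    by_label: Dict[int, List[Example]] = defaultdict(list)
    for ex in pool:
        by_label[ex[1]].append(ex)

    chosen: List[Example] = []
    for label in sorted(by_label):
        group = by_label[label]
        picked = [0]  # deterministic seed: first example of the group
        while len(picked) < min(per_label, len(group)):
            remaining = [i for i in range(len(group)) if i not in picked]
            # Diversity step (greedy farthest-point): take the candidate whose
            # nearest already-picked neighbour is as far away as possible.
            best = max(
                remaining,
                key=lambda i: min(
                    jaccard_distance(group[i][0], group[j][0]) for j in picked
                ),
            )
            picked.append(best)
        chosen.extend(group[i] for i in picked)
    return chosen
```

The design choice worth noting is the order of operations: stratifying by label first guarantees that both linked and unlinked pairs appear among the demonstrations, and the diversity step then avoids spending the few available slots on near-duplicates.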

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same prompt patterns could be tested on other software engineering tasks such as defect localization or change impact analysis.
Releasing the refined prompts would let practitioners replicate or extend the results on their own datasets.
Smaller or open-source LLMs might reach usable accuracy with the same prompt discipline, lowering compute requirements.
Combining the prompt framework with lightweight fine-tuning on project-specific data could further raise precision.

Load-bearing premise

The prompt refinement and demonstration selection strategies developed on these four datasets will transfer to new domains and artifact types without substantial re-tuning.

What would settle it

Apply the published TraceLLM prompts without modification to a fresh dataset from an untested domain such as automotive control software and measure whether the resulting F2 scores fall below the reported state-of-the-art values.

Original abstract

Requirements traceability, the process of establishing and maintaining relationships between requirements and various software development artifacts, is paramount for ensuring system integrity and fulfilling requirements throughout the Software Development Life Cycle (SDLC). Traditional methods, including manual and information retrieval models, are labor-intensive, error-prone, and limited by low precision. Recently, Large Language Models (LLMs) have demonstrated potential for supporting software engineering tasks through advanced language comprehension. However, a substantial gap exists in the systematic design and evaluation of prompts tailored to extract accurate trace links. This paper introduces TraceLLM, a systematic framework for enhancing requirements traceability through prompt engineering and demonstration selection. Our approach incorporates rigorous dataset splitting, iterative prompt refinement, enrichment with contextual roles and domain knowledge, and evaluation across zero- and few-shot settings. We assess prompt generalization and robustness using eight state-of-the-art LLMs on four benchmark datasets representing diverse domains (aerospace, healthcare) and artifact types (requirements, design elements, test cases, regulations). TraceLLM achieves state-of-the-art F2 scores, outperforming traditional IR baselines, fine-tuned models, and prior LLM-based methods. We also explore the impact of demonstration selection strategies, identifying label-aware, diversity-based sampling as particularly effective. Overall, our findings highlight that traceability performance depends not only on model capacity but also critically on the quality of prompt engineering. In addition, the achieved performance suggests that TraceLLM can support semi-automated traceability workflows in which candidate links are reviewed and validated by human analysts.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

TraceLLM gives a usable structured prompt framework for traceability across multiple LLMs and domains, but the SOTA F2 claims rest on an evaluation process whose separation from test data is not clearly confirmed.

The paper's main contribution is a framework called TraceLLM that combines role enrichment, iterative prompt refinement, and label-aware diversity-based demonstration selection for requirements traceability. The authors apply it across eight LLMs and four benchmark datasets from aerospace and healthcare, covering requirements, design elements, test cases, and regulations. The evaluation includes both zero-shot and few-shot settings and compares against traditional IR methods, fine-tuned models, and earlier LLM approaches.

What stands out is the explicit testing of different demonstration selection strategies, with the label-aware approach showing better results. That part gives concrete guidance on what works in practice for this task. The paper also notes that prompt quality matters alongside raw model size, which aligns with other recent observations in LLM applications to SE tasks. The scale of the LLM comparison is larger than most prior traceability studies I have seen.

The soft spot is the iterative refinement process. The abstract states that rigorous dataset splitting was done, but it does not explicitly say that refinement rounds and selection steps were confined to a validation partition with prompts frozen before final test evaluation. If any tuning had access to test labels, the reported F2 scores would be inflated and the outperformance claims would not hold as a fair comparison. The abstract also gives no details on statistical significance or exact split ratios. This leaves the central performance result harder to trust without further checks.

The work is aimed at software engineering researchers and practitioners who need traceability support in regulated domains. A reader looking for prompt patterns to adapt could extract some value from the selection strategy results. It is worth sending for peer review so the methods section can be examined for leakage risks and the numbers can be verified.

Referee Report

2 major / 2 minor

Summary. The paper introduces TraceLLM, a framework that applies prompt engineering (including contextual roles, domain knowledge enrichment, iterative refinement, and label-aware diversity-based demonstration selection) to LLMs for requirements traceability. It evaluates the approach in zero- and few-shot settings across eight LLMs and four benchmark datasets spanning the aerospace and healthcare domains with varying artifact types, claiming state-of-the-art F2 scores that outperform traditional IR baselines, fine-tuned models, and prior LLM-based methods.

Significance. If the reported performance reflects genuine generalization without evaluation bias, the work would be significant for software engineering by showing that systematic prompt design can deliver high traceability performance without model fine-tuning or large labeled datasets, supporting semi-automated workflows. The identification of effective demonstration selection strategies provides actionable insight into prompt robustness.

major comments (2)

[Abstract/Methods] Abstract and Methods sections: The description of 'rigorous dataset splitting' followed by 'iterative prompt refinement' does not explicitly confirm that all refinement steps (including any performance-based adjustments) were restricted to a held-out validation partition with prompts frozen before test-set evaluation. This detail is load-bearing for the central SOTA F2 claim, as access to test labels or examples during refinement would introduce optimistic bias and invalidate comparisons to baselines.
[Evaluation] Evaluation section: The abstract reports SOTA F2 scores but provides no numerical values, confidence intervals, statistical significance tests (e.g., McNemar or paired t-tests), or per-dataset split ratios. Without these, the outperformance claims over IR baselines and fine-tuned models cannot be assessed for robustness or practical effect size.
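
As an illustration of the kind of significance check the second comment asks for, the sketch below computes a two-sided exact McNemar p-value for two methods scored on the same test pairs. The function name and the input layout (parallel lists of gold labels and predictions) are assumptions made for this example; nothing here reflects how the paper ran its tests.

```python
from math import comb
from typing import Sequence


def mcnemar_exact(y_true: Sequence[int], pred_a: Sequence[int],
                  pred_b: Sequence[int]) -> float:
    """Two-sided exact McNemar p-value for two classifiers on the same items."""
    if not (len(y_true) == len(pred_a) == len(pred_b)):
        raise ValueError("all sequences must have the same length")

    # Only discordant items matter: exactly one of the two methods is correct.
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    n = b + c
    if n == 0:
        return 1.0  # the methods never differ in correctness

    # Under the null hypothesis, b follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

The test is paired by construction, which suits a comparison where every method labels the same candidate links; it says nothing about effect size, so it complements per-dataset F2 values and confidence intervals instead of replacing them.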

minor comments (2)

[Abstract] Abstract: The choice of F2 (beta=2) over F1 is not motivated; a brief justification for emphasizing recall in traceability would clarify the metric selection.
[Methods] The paper mentions 'parameter-free' aspects of the approach in places but does not clarify whether demonstration selection involves any tunable hyperparameters that could affect reproducibility.
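
For readers unfamiliar with the metric raised in the first minor comment, this small sketch shows the standard F-beta definition, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), with beta = 2 as the default. The function name is illustrative. The motivation usually given for F2 in traceability (a missed link tends to cost more than a false candidate that an analyst can reject) is the common argument in the field, not something this page confirms the paper states.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision < 0 or recall < 0:
        raise ValueError("precision and recall must be non-negative")
    b2 = beta * beta
    denom = b2 * precision + recall
    if denom == 0:
        return 0.0  # convention when both precision and recall are zero
    return (1 + b2) * precision * recall / denom
```

With precision 0.5 and recall 0.9 the F2 score is about 0.776, while swapping the two values gives about 0.549, which shows how strongly beta = 2 rewards recall.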

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. The comments highlight critical aspects of methodological transparency that we will address to strengthen the paper.

Referee: [Abstract/Methods] Abstract and Methods sections: The description of 'rigorous dataset splitting' followed by 'iterative prompt refinement' does not explicitly confirm that all refinement steps (including any performance-based adjustments) were restricted to a held-out validation partition with prompts frozen before test-set evaluation. This detail is load-bearing for the central SOTA F2 claim, as access to test labels or examples during refinement would introduce optimistic bias and invalidate comparisons to baselines.

Authors: We confirm that all iterative prompt refinement, including any performance-based adjustments, was performed exclusively on a held-out validation partition. Prompts were frozen prior to any test-set evaluation, ensuring no access to test labels or examples. We will revise the Methods section to explicitly document the split ratios, the validation-only refinement protocol, and confirmation that test data remained untouched until final evaluation. revision: yes
Referee: [Evaluation] Evaluation section: The abstract reports SOTA F2 scores but provides no numerical values, confidence intervals, statistical significance tests (e.g., McNemar or paired t-tests), or per-dataset split ratios. Without these, the outperformance claims over IR baselines and fine-tuned models cannot be assessed for robustness or practical effect size.

Authors: The Evaluation section already reports the full numerical F2 scores per dataset, confidence intervals, McNemar's test results for statistical significance, and the exact train/validation/test split ratios. To make these immediately visible without requiring readers to reach the body, we will revise the abstract to include representative F2 values, note the use of statistical tests, and reference the split ratios. revision: yes
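
The validation-only protocol described in the first response can be sketched as below: split once, choose among candidate prompts using validation data only, then score the frozen prompt on the test partition a single time. The 60/20/20 split, the function names, and the `Predictor` callable (a stand-in for an LLM call) are all assumptions for illustration; the paper's actual ratios are not given on this page.

```python
import random
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, int]                 # (artifact-pair text, gold label)
Predictor = Callable[[str, str], int]  # (prompt, pair text) -> predicted label


def f2(gold: Sequence[int], pred: Sequence[int]) -> float:
    """F2 from counts: 5*TP / (5*TP + 4*FN + FP)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    denom = 5 * tp + 4 * fn + fp
    return 5 * tp / denom if denom else 0.0


def split(data: Sequence[Pair], seed: int, val_frac: float = 0.2,
          test_frac: float = 0.2) -> Tuple[List[Pair], List[Pair], List[Pair]]:
    """Shuffle once with a fixed seed and cut into train / validation / test."""
    items = list(data)
    random.Random(seed).shuffle(items)
    n_test = int(len(items) * test_frac)
    n_val = int(len(items) * val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


def select_then_test(prompts: Sequence[str], predict: Predictor,
                     val: Sequence[Pair], test: Sequence[Pair]) -> Tuple[str, float]:
    """Freeze the best prompt on validation data, then score it once on test."""
    def score(prompt: str, part: Sequence[Pair]) -> float:
        gold = [y for _, y in part]
        pred = [predict(prompt, x) for x, _ in part]
        return f2(gold, pred)

    # Refinement and selection see the validation partition only.
    frozen = max(prompts, key=lambda p: score(p, val))
    # The frozen prompt touches the test partition exactly once.
    return frozen, score(frozen, test)
```

Keeping selection and final scoring in one function with separate `val` and `test` arguments makes the leakage question auditable: the test partition appears in exactly one expression.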

Circularity Check

0 steps flagged

No circularity: empirical evaluation with no fitted predictions or self-referential derivations

The paper presents an empirical framework for prompt engineering in requirements traceability, reporting F2 scores from evaluations across eight LLMs and four datasets. No equations, parameters fitted to subsets then renamed as predictions, or self-citation chains supporting uniqueness theorems appear in the provided text. The central claims rest on direct experimental comparisons rather than any derivation that reduces to author-defined inputs by construction. Iterative prompt refinement is described as part of the method but does not trigger any of the enumerated circularity patterns (self-definitional, fitted-input-called-prediction, etc.). This is a standard empirical SE paper whose results are externally falsifiable via replication on the same benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard assumptions about LLM prompt sensitivity and the representativeness of the four benchmark datasets; no free parameters, invented entities, or ad-hoc axioms are visible from the abstract.

axioms (1)

domain assumption: Large language models can reliably follow enriched prompts that include domain roles and contextual knowledge for traceability tasks.
Invoked implicitly in the description of prompt enrichment and evaluation across zero- and few-shot settings.

pith-pipeline@v0.9.0 · 5810 in / 1247 out tokens · 30051 ms · 2026-05-25T06:56:14.674351+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Neuro-Symbolic Agents for Hallucination-Free Requirements Reuse
cs.SE 2026-05 unverdicted novelty 6.0

A neuro-symbolic agent system for requirements reuse achieves 100% coverage and 0.2% constraint violations by construction through symbolic enforcement of an OOMRAM lattice.

[60]

Cross-Domain Requirements Linking via Adversarial-based Domain Adaptation

Chang Z, Li M, Wang Q, Li S, Wang J. Cross-Domain Requirements Linking via Adversarial-based Domain Adaptation. In: 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE). IEEE; 2023. p. 1596–1608

work page 2023
[61]

A machine learn- ing approach for tracing regulatory codes to product specific requirements

Cleland-Huang J, Czauderna A, Gibiec M, Emenecker J. A machine learn- ing approach for tracing regulatory codes to product specific requirements. In: Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering-Volume 1; 2010. p. 155–164

work page 2010
[62]

Large language models for software engineering: A systematic literature review

Hou X, Zhao Y, Liu Y, Yang Z, Wang K, Li L, et al. Large language models for software engineering: A systematic literature review. ACM Transactions on Software Engineering and Methodology. 2023

work page 2023
[63]

Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

Reimers N, Gurevych I. Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks. In: Inui K, Jiang J, Ng V, Wan X, editors. Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP- IJCNLP). Hong Kong, China: Association for Computational L...

work page 2019
[64]

Guidelines for empirical studies in software engineering involving large language models

Baltes S, Angermeir F, Arora C, Bar´ on MM, Chen C, B¨ ohme L, et al. Guidelines for empirical studies in software engineering involving large language models. 48 arXiv preprint arXiv:250815503. 2025

work page 2025
[65]

Accessed: 2026-01-14

OpenRouter.: OpenRouter: Unified API for Large Language Models. Accessed: 2026-01-14. https://openrouter.ai

work page 2026
[66]

What Is Wrong with My Model? Identifying Systematic Problems with Semantic Data Slicing

Yang C, Hong Y, Lewis G, Wu T, K¨ astner C. What Is Wrong with My Model? Identifying Systematic Problems with Semantic Data Slicing. In: Proceed- ings of the 39th IEEE/ACM International Conference on Automated Software Engineering; 2024. p. 306–318

work page 2024
[67]

BERT: Pre-training of Deep Bidi- rectional Transformers for Language Understanding

Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of Deep Bidi- rectional Transformers for Language Understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Association for Computational Linguistics; 2019. p....

work page 2019
[68]

Goal-Centric Traceability for Man- aging Non-Functional Requirements

Cleland-Huang J, Settimi R, BenKhadra O. Goal-Centric Traceability for Man- aging Non-Functional Requirements. Proceedings of the IEEE International Conference on Requirements Engineering. 2007;p. 57–66. https://doi.org/10. 1109/RE.2007.61

work page 2007
[69]

Adams Re-Trace: Traceability Recovery in Software Artifacts

De Lucia A, Oliveto R, Sgueglia P. Adams Re-Trace: Traceability Recovery in Software Artifacts. IEEE Transactions on Software Engineering. 2008;34(5):668–

work page 2008
[70]

https://doi.org/10.1109/TSE.2008.43

work page doi:10.1109/tse.2008.43 2008
[71]

Toward Reference Models for Requirements Traceability

Ramesh B, Jarke M. Toward Reference Models for Requirements Traceability. IEEE Transactions on Software Engineering. 2001;27(1):58–93. https://doi.org/ 10.1109/32.895989

work page doi:10.1109/32.895989 2001
[72]

Model Traceability

Aizenbud-Reshef N, Nolan BT, Rubin J, Shaham-Gafni Y. Model Traceability. IBM Systems Journal. 2006;45(3):515–526. https://doi.org/10.1147/sj.453.0515

work page doi:10.1147/sj.453.0515 2006
[73]

Software and systems traceability

Cleland-Huang J, Gotel O, Zisman A, et al. Software and systems traceability. vol. 2. Springer; 2012. 49

work page 2012
