A systematic review of 50 studies identifies 69 LLM-assisted tasks in empirical software engineering, concentrated in data processing and analysis with gaps in human-centered integration and reproducibility reporting.
Large language models (LLMs) for requirements engineering (RE): A systematic literature review
11 Pith papers cite this work. Polarity classification is still indexing.
roles: background 1
polarities: background 1
representative citing papers
R2Code improves requirement-to-code traceability with a bidirectional alignment network, self-reflective consistency verification, and dynamic context-adaptive retrieval, yielding 7.4% average F1 gain and up to 41.7% lower token use on five datasets.
BT-APE automates prompt engineering for requirements classification using backtracking search and dynamic examples, matching PE2 accuracy while using 72% fewer tokens and 66% less time than that baseline.
A systematic mapping study of 45 LLM-based RE papers identifies and characterizes 62 public datasets, revealing imbalances in open-science practices, elicitation support, and socio-technical diversity.
A clustering-based pipeline generates individual and integration-level test specifications from thousands of automotive requirements by grouping embeddings, summarizing clusters, and applying LLM calls with bounded context and standards grounding.
LLMs can detect usability content in user reviews with F-scores comparable to humans, though performance depends strongly on prompt design.
An LLM pipeline with generation-critic feedback reaches 61% accuracy on low-level goal extraction from requirements documents and outperforms standalone few-shot prompting, yet remains best suited as an accelerator for manual work.
An agentic LLM pipeline extracts and translates unstructured requirements into syntactically and semantically aligned formal properties, achieving 77.8% accuracy across three scenarios.
Design-OS is a specification-driven five-stage framework for engineering system design that maintains traceability from intent to implementation and supports human-AI collaboration, demonstrated on rotary inverted pendulum control cases.
LoRA fine-tuning enables open-source LLMs such as Ministral-8B to generate requirement-based test cases at a level comparable to pre-tuned proprietary GPT-4.1 models.
ProReFiCIA uses LLMs with tailored prompts to identify impacted requirements, achieving 85.7% recall on unseen industrial data while requiring review of only 3% of requirements, rising to 95.7% recall with RAG at 3.6% review cost.