hub Canonical reference

Large Language Models for Software Engineering: Survey and Open Problems

· 2023 · arXiv 9343.2023

Canonical reference. 100% of citing Pith papers cite this work as background.

12 Pith papers citing it

Background 100% of classified citations

citation-role summary

background 6

citation-polarity summary

background 6

representative citing papers

CASCADE: Detecting Inconsistencies between Code and Documentation with Automatic Test Generation

cs.SE · 2026-04-21 · unverdicted · novelty 7.0

CASCADE finds code-documentation mismatches by running LLM-generated tests from docs and confirming failure only when documentation-derived code succeeds on the same test.

Measuring LLM Trust Allocation Across Conflicting Software Artifacts

cs.SE · 2026-04-03 · unverdicted · novelty 7.0

TRACE reveals that LLMs detect documentation bugs and contradictions better than subtle implementation drift, with asymmetric sensitivity and poor confidence calibration across seven models on 22k traces.

Task Abstention for Large Language Models in Code Generation

cs.SE · 2026-05-16 · unverdicted · novelty 6.0

A distribution-free abstention rule grounded in multiple hypothesis testing uses execution consistency to let code LLMs avoid hallucination-prone tasks with theoretical guarantees.

Characterizing Datasets for LLM-based Requirements Engineering: A Systematic Mapping Study

cs.SE · 2025-10-21 · unverdicted · novelty 6.0

A systematic mapping study of 45 LLM-based RE papers identifies and characterizes 62 public datasets, revealing imbalances in open-science practices, elicitation support, and socio-technical diversity.

AgentReputation: A Decentralized Agentic AI Reputation Framework

cs.AI · 2026-04-30 · unverdicted · novelty 5.0

AgentReputation proposes separating AI agent task execution, reputation management, and secure record-keeping into distinct layers, with context-specific reputation cards and a risk-based policy engine to handle verification in decentralized settings.

Towards Automated Pentesting with Large Language Models

cs.CR · 2026-04-13 · unverdicted · novelty 5.0

RedShell fine-tunes LLMs on enhanced malicious PowerShell data to produce syntactically valid offensive code for pentesting, reporting over 90% validity, strong semantic match to references, and better edit-distance similarity than prior methods plus functional execution success.

Improving MPI Error Detection and Repair with Large Language Models and Bug References

cs.SE · 2026-04-02 · unverdicted · novelty 5.0

Augmenting LLMs with bug references, few-shot learning, chain-of-thought, and RAG improves MPI error detection accuracy from 44% to 77% and generalizes across models.

An Empirical Study on Influence-Based Pretraining Data Selection for Code Large Language Models

cs.SE · 2026-04-09 · unverdicted · novelty 4.0

Data-influence-score filtering using validation-set loss on downstream coding tasks improves Code-LLM performance, with the most beneficial training data varying significantly across different programming tasks.

Search-Based Software Engineering and AI Foundation Models: Current Landscape and Future Roadmap

cs.SE · 2025-05-26 · unverdicted · novelty 4.0

A research roadmap analyzing the current state of search-based software engineering with foundation models, outlining challenges and directions across three integration aspects.

RedShell: A Generative AI-Based Approach to Ethical Hacking

cs.CR · 2026-04-13 · unverdicted · novelty 3.0

RedShell fine-tunes LLMs on a custom dataset of public code samples to generate syntactically valid PowerShell scripts with semantic similarity to references, reporting under 10% parse errors and over 50%/40% mean similarity on Edit Distance and METEOR.

Building an Internal Coding Agent at Zup: Lessons and Open Questions

cs.SE · 2026-04-10 · unverdicted · novelty 3.0

Engineering choices for tools, safety guardrails, and human oversight determine whether an internal coding agent delivers value in practice more than the underlying model quality.

Accelerating Quantum Eigensolver Algorithms With Machine Learning

quant-ph · 2024-09-20 · unverdicted · novelty 3.0

XGBoost models trained on ≤16-qubit data predict eigensolver hyperparameters and reduce error by 0.12% on 28-qubit systems.

citing papers explorer

Showing 12 of 12 citing papers.

CASCADE: Detecting Inconsistencies between Code and Documentation with Automatic Test Generation · cs.SE · 2026-04-21 · unverdicted · none · ref 23
Measuring LLM Trust Allocation Across Conflicting Software Artifacts · cs.SE · 2026-04-03 · unverdicted · none · ref 8
Task Abstention for Large Language Models in Code Generation · cs.SE · 2026-05-16 · unverdicted · none · ref 10
Characterizing Datasets for LLM-based Requirements Engineering: A Systematic Mapping Study · cs.SE · 2025-10-21 · unverdicted · none · ref 2
AgentReputation: A Decentralized Agentic AI Reputation Framework · cs.AI · 2026-04-30 · unverdicted · none · ref 7
Towards Automated Pentesting with Large Language Models · cs.CR · 2026-04-13 · unverdicted · none · ref 17
Improving MPI Error Detection and Repair with Large Language Models and Bug References · cs.SE · 2026-04-02 · unverdicted · none · ref 15
An Empirical Study on Influence-Based Pretraining Data Selection for Code Large Language Models · cs.SE · 2026-04-09 · unverdicted · none · ref 10
Search-Based Software Engineering and AI Foundation Models: Current Landscape and Future Roadmap · cs.SE · 2025-05-26 · unverdicted · none · ref 52
RedShell: A Generative AI-Based Approach to Ethical Hacking · cs.CR · 2026-04-13 · unverdicted · none · ref 9
Building an Internal Coding Agent at Zup: Lessons and Open Questions · cs.SE · 2026-04-10 · unverdicted · none · ref 2
Accelerating Quantum Eigensolver Algorithms With Machine Learning · quant-ph · 2024-09-20 · unverdicted · none · ref 24
