Canonical reference

Large language model for vulnerability detection: Emerging results and future directions

· 2024 · arXiv 9476.363976

Canonical reference. 100% of citing Pith papers cite this work as background.

15 Pith papers citing it

citation-role summary

background 5

citation-polarity summary

background 5

representative citing papers

An Empirical Study of Security Calibration in Large Language Models for Code

cs.SE · 2026-06-30 · unverdicted · novelty 7.0

Empirical evaluation of three LLMs finds prevalent overconfidence in insecure code generation, with security calibration outperforming functional calibration but both degrading in repository-level settings.

What Breaks When LLMs Code? Characterizing Operational Safety Failures of Agentic Code Assistants

cs.SE · 2026-05-29 · unverdicted · novelty 7.0

An empirical study of 547 confirmed safety incidents from GitHub and literature derives a 33-type taxonomy showing constraint violations, destructive actions, and deception dominate in everyday coding-agent use.

Measuring and Exploiting Contextual Bias in LLM-Assisted Security Code Review

cs.SE · 2026-03-19 · accept · novelty 7.0

LLM-based security code review is vulnerable to framing bias, with a novel iterative refinement attack achieving 100% success in reintroducing vulnerabilities across real projects.

A Methodological Analysis of Empirical Studies in Quantum Software Testing

quant-ph · 2026-01-13 · accept · novelty 7.0

A systematic analysis of 59 quantum software testing empirical studies reveals highly diverse designs, inconsistent reporting, and open methodological challenges, leading to recommendations for future work.

Library Hallucinations in LLM-Generated Code: A Risk Analysis Grounded in Developer Queries

cs.SE · 2025-09-26 · unverdicted · novelty 7.0

A study of seven LLMs finds that realistic prompt variations such as one-character misspellings trigger library hallucinations in up to 26% of cases, fabricated names in up to 99%, and time-based prompts in up to 85%, and introduces LibHalluBench for evaluation.

Direction for Detection: A Survey of Automated Vulnerability Detection and all of its Pain Points

cs.SE · 2024-12-15 · conditional · novelty 7.0

ML4AVD research remains locked into binary function-level classification of C/C++ vulnerabilities because twelve pain points in the pipeline reinforce each other through feedback loops.

Vulnerability Detection with Interprocedural Context in Multiple Languages: Assessing Effectiveness and Cost of Modern LLMs

cs.SE · 2026-04-09 · unverdicted · novelty 6.0

Adding interprocedural context from callers or callees enables LLMs to detect vulnerabilities more effectively, with Gemini 3 Flash achieving F1 scores of at least 0.978 for C at low cost and Claude Haiku 4.5 excelling at explanations.

Who's Who? LLM-assisted Software Traceability with Architecture Entity Recognition

cs.SE · 2025-11-04 · unverdicted · novelty 6.0

LLM approaches ExArch and ArTEMiS reach F1 scores of 0.86 and 0.81 for architecture entity recognition and traceability, matching or approaching baselines that require manual models.

A Study of LLMs' Preferences for Libraries and Programming Languages

cs.SE · 2025-03-21 · unverdicted · novelty 6.0

Empirical study of eight LLMs finds overuse of popular libraries like NumPy in up to 45% of unnecessary cases and strong default preference for Python even when suboptimal.

Evaluating LLMs for Real-World Web Vulnerability Detection

cs.CR · 2026-06-19 · unverdicted · novelty 5.0

Frontier LLMs detect up to 63% of web vulnerabilities in WordPress plugins with scoped prompts outperforming open-ended ones, but all show low consistency across runs and miss some baseline issues.

"Like Taking the Path of Least Resistance": Exploring the Impact of LLM Interaction on the Creative Process of Programming

cs.HC · 2026-05-13 · conditional · novelty 5.0

LLM assistance shortens idea-generation periods and reduces creative moments during programming tasks while yielding solutions with comparable idea counts and greater functional correctness.

Revisiting Sentiment Analysis for Software Engineering in the Era of Large Language Models

cs.SE · 2023-10-17 · unverdicted · novelty 5.0

bLLMs achieve state-of-the-art results on limited and imbalanced SE sentiment datasets even in zero-shot settings, but fine-tuned sLLMs outperform when ample balanced training data is available.

Search-Based Software Engineering and AI Foundation Models: Current Landscape and Future Roadmap

cs.SE · 2025-05-26 · unverdicted · novelty 4.0 · 2 refs

A research roadmap analyzing the current state of search-based software engineering with foundation models, outlining challenges and directions across three integration aspects.

Revisiting Vul-RAG: Reproducibility and Replicability of RAG-based Vulnerability Detection with Open-Weight Models

cs.SE · 2026-06-03 · unverdicted · novelty 3.0

Reproducibility study of Vul-RAG confirms original findings in a fully local open-weights setting but identifies a persistent performance plateau at approximately 0.30 pairwise accuracy across diverse recent open-weight LLMs.

Nix: A Solution With Problems

cs.SE · 2026-04-13 · unverdicted · novelty 2.0

A literature review of Nix's functional package management solutions to software deployment problems alongside the new and unsolved issues it introduces.

citing papers explorer

Showing 15 of 15 citing papers.

An Empirical Study of Security Calibration in Large Language Models for Code
cs.SE · 2026-06-30 · unverdicted · none · ref 45

What Breaks When LLMs Code? Characterizing Operational Safety Failures of Agentic Code Assistants
cs.SE · 2026-05-29 · unverdicted · none · ref 75

Measuring and Exploiting Contextual Bias in LLM-Assisted Security Code Review
cs.SE · 2026-03-19 · accept · none · ref 90

A Methodological Analysis of Empirical Studies in Quantum Software Testing
quant-ph · 2026-01-13 · accept · none · ref 103

Library Hallucinations in LLM-Generated Code: A Risk Analysis Grounded in Developer Queries
cs.SE · 2025-09-26 · unverdicted · none · ref 51

Direction for Detection: A Survey of Automated Vulnerability Detection and all of its Pain Points
cs.SE · 2024-12-15 · conditional · none · ref 9

Vulnerability Detection with Interprocedural Context in Multiple Languages: Assessing Effectiveness and Cost of Modern LLMs
cs.SE · 2026-04-09 · unverdicted · none · ref 27

Who's Who? LLM-assisted Software Traceability with Architecture Entity Recognition
cs.SE · 2025-11-04 · unverdicted · none · ref 54

A Study of LLMs' Preferences for Libraries and Programming Languages
cs.SE · 2025-03-21 · unverdicted · none · ref 66

Evaluating LLMs for Real-World Web Vulnerability Detection
cs.CR · 2026-06-19 · unverdicted · none · ref 33

"Like Taking the Path of Least Resistance": Exploring the Impact of LLM Interaction on the Creative Process of Programming
cs.HC · 2026-05-13 · conditional · none · ref 4

Revisiting Sentiment Analysis for Software Engineering in the Era of Large Language Models
cs.SE · 2023-10-17 · unverdicted · none · ref 77

Search-Based Software Engineering and AI Foundation Models: Current Landscape and Future Roadmap
cs.SE · 2025-05-26 · unverdicted · none · ref 95 · 2 links

Revisiting Vul-RAG: Reproducibility and Replicability of RAG-based Vulnerability Detection with Open-Weight Models
cs.SE · 2026-06-03 · unverdicted · none · ref 17

Nix: A Solution With Problems
cs.SE · 2026-04-13 · unverdicted · none · ref 61
