Controlled corpus testing shows that fixed allclose oracles in LLM kernel benchmarks certify transcription-buggy kernels as correct while seeded fuzzing with fp64 references does not.
Automating code review activities by large-scale pre-training
8 Pith papers cite this work.
citation-role summary: background 1
citation-polarity summary: background 1
representative citing papers
EvoGraph turns linear AI-assisted programming into a manipulable graph of branching histories, reducing cognitive load and enabling better iteration according to a user study with 20 developers.
LLM-based security code review is vulnerable to framing bias, with a novel iterative refinement attack achieving 100% success in reintroducing vulnerabilities across real projects.
Within-reviewer analysis of 11,429 reviews shows AI code approval rising from 30.1% to 36.8% with experience, with reduced inline comments and increased latency, consistent with habituation.
A multi-agent LLM framework with Behavioral Specification Graphs preserves business logic in legacy modernization, achieving non-zero mean BER on all tested scenarios where baseline LLM approaches scored zero.
SAILOR combines static analysis and LLM-orchestrated synthesis to automatically generate symbolic execution harnesses, discovering 379 previously unknown memory-safety vulnerabilities across 10 large open-source C/C++ projects where the strongest baseline found only 12.
Boundary shape sampling for tensor kernel testing achieves 78% recall on seeded bugs with 0% false positives on correct kernels, while adversarial value sampling reaches 99% recall at the cost of 94% false positives.
Empirical study finds Git references enable over 86% success in mapping NVD records to vulnerability-fixing commits while non-Git references succeed under 14%, yielding an automated pipeline and external mining that together cover only 11.3% of records at 87% precision.