First empirical study of correctness bugs in torch.compile characterizes their patterns and proposes AlignGuard, which found 23 confirmed new bugs via LLM-guided test mutation.
Taxonomy of real faults in deep learning systems,
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
DDOR is a delta-debugging framework that localizes minimal refusal-triggering fragments for explainable overrefusal testing and targeted prompt repair in black-box LLMs.
A Lean library called Palamedes uses synthesis rules from generator semantics and catamorphism-anamorphism rewriting to automatically produce correct constrained random generators.
CodeCureAgent achieves 96.8% plausible fixes and 86.3% correct fixes for 1,000 SonarQube warnings across 106 Java projects using an agentic LLM framework.
Proposes data-aware static analysis combining data/control flow and API contracts to detect semantic faults in ML code early, shown on sample real-world notebooks.
dille detects silent semantic faults in random forest ML pipelines with 91% precision via data-informed static analysis on Kaggle notebooks, finding 12-18% of scripts affected.
SelfHeal uses two ReAct agents and empirical fix patterns to repair bugs in LLM agents, outperforming baselines on a new 37-instance benchmark.
citing papers explorer
-
Demystifying the Silence of Correctness Bugs in PyTorch Compiler
First empirical study of correctness bugs in torch.compile characterizes their patterns and proposes AlignGuard, which found 23 confirmed new bugs via LLM-guided test mutation.
-
CodeCureAgent: Automatic Classification and Repair of Static Analysis Warnings
CodeCureAgent achieves 96.8% plausible fixes and 86.3% correct fixes for 1,000 SonarQube warnings across 106 Java projects using an agentic LLM framework.