LLM-based Java program repair models lose over 50% of their bug-fixing success rate when presented with equivalent but syntactically varied buggy code.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.SE 3years
2026 3roles
background 2polarities
background 2representative citing papers
LLMs show up to 60.58% social bias in generated code; a new Fairness Monitor Agent cuts bias by 65.1% and raises functional correctness from 75.80% to 83.97%.
A large-scale study finds that many LLM code translation failures are false negatives due to improper evaluation configurations rather than incorrect translations.
citing papers explorer
-
HEJ-Robust: A Robustness Benchmark for LLM-Based Automated Program Repair
LLM-based Java program repair models lose over 50% of their bug-fixing success rate when presented with equivalent but syntactically varied buggy code.
-
Social Bias in LLM-Generated Code: Benchmark and Mitigation
LLMs show up to 60.58% social bias in generated code; a new Fairness Monitor Agent cuts bias by 65.1% and raises functional correctness from 75.80% to 83.97%.
-
Beyond Translation Accuracy: Addressing False Failures in LLM-Based Code Translation
A large-scale study finds that many LLM code translation failures are false negatives due to improper evaluation configurations rather than incorrect translations.