arXiv preprint arXiv:2510.07189 · 2025

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1

citation-polarity summary: background 1

representative citing papers

HEJ-Robust: A Robustness Benchmark for LLM-Based Automated Program Repair

cs.SE · 2026-05-04 · accept · novelty 7.0 · 2 refs

LLM-based Java program repair models lose over 50% of their bug-fixing success rate when presented with equivalent but syntactically varied buggy code.

Social Bias in LLM-Generated Code: Benchmark and Mitigation

cs.SE · 2026-05-01 · unverdicted · novelty 7.0

LLMs show up to 60.58% social bias in generated code; a new Fairness Monitor Agent cuts bias by 65.1% and raises functional correctness from 75.80% to 83.97%.

Enhancing Reliability in LLM-Based Secure Code Generation

cs.CR · 2026-05-22 · conditional · novelty 6.0

MA-CoT prompting reduces security findings in LLM-generated code by 57.6% on a 200-task dataset and 94.5% on LLMSecEval across C, Java, and Python, outperforming vanilla, zero-shot, and standard CoT strategies.

Beyond Translation Accuracy: Addressing False Failures in LLM-Based Code Translation

cs.SE · 2026-05-04 · unverdicted · novelty 5.0 · 2 refs

A large-scale study finds that many LLM code translation failures are false negatives due to improper evaluation configurations rather than incorrect translations.

citing papers explorer

Showing 4 of 4 citing papers.

HEJ-Robust: A Robustness Benchmark for LLM-Based Automated Program Repair
cs.SE · 2026-05-04 · accept · none · ref 19 · 2 links

Social Bias in LLM-Generated Code: Benchmark and Mitigation
cs.SE · 2026-05-01 · unverdicted · none · ref 170

Enhancing Reliability in LLM-Based Secure Code Generation
cs.CR · 2026-05-22 · conditional · none · ref 32

Beyond Translation Accuracy: Addressing False Failures in LLM-Based Code Translation
cs.SE · 2026-05-04 · unverdicted · none · ref 13 · 2 links
