Bissyandé, Yang Liu, and Haoye Tian

· 2025 · arXiv 2506.23749

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 3

representative citing papers

Can Language Models Go Beyond Coding? Assessing the Capability of Language Models to Build Real-World Systems

cs.SE · 2025-11-02 · unverdicted · novelty 7.0

Build-bench is the first architecture-aware benchmark that evaluates LLMs on repairing cross-ISA build failures via iterative tool-augmented reasoning, with the best model reaching 63.19% success.

BLAgent: Agentic RAG for File-Level Bug Localization

cs.SE · 2026-05-18 · unverdicted · novelty 6.0

BLAgent achieves over 78% Top-1 accuracy on SWE-bench Lite for file-level bug localization using agentic RAG, at 18x lower cost than baselines, and boosts end-to-end APR success by over 20%.

On the Effectiveness of Context Compression for Repository-Level Tasks: An Empirical Investigation

cs.SE · 2026-04-15 · unverdicted · novelty 6.0

Continuous latent-vector compression improves BLEU scores on repository-level code tasks by up to 28.3% at 4x compression while cutting inference latency.

Verify Before You Fix: Agentic Execution Grounding for Trustworthy Cross-Language Code Analysis

cs.SE · 2026-04-12 · unverdicted · novelty 6.0

A framework combining universal AST normalization, hybrid graph-LLM embeddings, and strict execution-grounded validation achieves 89-92% intra-language accuracy and 74-80% cross-language F1 while resolving 70% of vulnerabilities at 12% failure rate.

PAFT: Preservation Aware Fine-Tuning for Minimal-Edit Program Repair

cs.SE · 2026-04-03 · unverdicted · novelty 6.0

PAFT improves LLM-based program repair pass rates by up to 65.6% while cutting average edit distance by up to 32.6% through explicit preservation signals and curriculum training.

How Robustly do LLMs Understand Execution Semantics?

cs.SE · 2026-02-24 · unverdicted · novelty 6.0

Frontier LLMs like GPT-5.2 show large accuracy drops on perturbed program-output prediction tasks while open-source reasoning models remain more stable, exposing limits in code semantics understanding.

A-ProS: Towards Reliable Autonomous Programming Through Multi-Model Feedback

cs.SE · 2026-05-18 · unverdicted · novelty 5.0

A-ProS uses a hybrid multi-model feedback framework with stateful refinement to improve success rates on competitive programming problems, achieving over 2x gains compared to baseline agent loops.

Sustainable Code Generation Using Large Language Models: A Systematic Literature Review

cs.SE · 2026-03-01 · unverdicted · novelty 3.0

A systematic review finds research on the sustainability of LLM-generated code to be limited, fragmented, and without accepted frameworks for measurement or benchmarking.

Fairness in Multi-Agent Systems for Software Engineering: An SDLC-Oriented Rapid Review

cs.SE · 2026-04-10 · unverdicted · novelty 2.0

A rapid review of fairness in LLM-enabled multi-agent systems for the software development lifecycle concludes that the field lacks standardized evaluations, broad coverage, and effective governance, leaving it unprepared for deployable fair systems.

HELO-APR: Enhancing Low-Resource Program Repair through Cross-Lingual Knowledge Transfer

cs.SE · 2026-04-18

citing papers explorer

Can Language Models Go Beyond Coding? Assessing the Capability of Language Models to Build Real-World Systems

cs.SE · 2025-11-02 · unverdicted · none · ref 61

BLAgent: Agentic RAG for File-Level Bug Localization

cs.SE · 2026-05-18 · unverdicted · none · ref 59

On the Effectiveness of Context Compression for Repository-Level Tasks: An Empirical Investigation

cs.SE · 2026-04-15 · unverdicted · none · ref 39

Verify Before You Fix: Agentic Execution Grounding for Trustworthy Cross-Language Code Analysis

cs.SE · 2026-04-12 · unverdicted · none · ref 62

PAFT: Preservation Aware Fine-Tuning for Minimal-Edit Program Repair

cs.SE · 2026-04-03 · unverdicted · none · ref 33

How Robustly do LLMs Understand Execution Semantics?

cs.SE · 2026-02-24 · unverdicted · none · ref 49

A-ProS: Towards Reliable Autonomous Programming Through Multi-Model Feedback

cs.SE · 2026-05-18 · unverdicted · none · ref 18

Sustainable Code Generation Using Large Language Models: A Systematic Literature Review

cs.SE · 2026-03-01 · unverdicted · none · ref 67

Fairness in Multi-Agent Systems for Software Engineering: An SDLC-Oriented Rapid Review

cs.SE · 2026-04-10 · unverdicted · none · ref 62

HELO-APR: Enhancing Low-Resource Program Repair through Cross-Lingual Knowledge Transfer

cs.SE · 2026-04-18 · unreviewed · ref 27
