Proceedings of the 28th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering

Xie, Mulong; Feng, Sidong; Xing, Zhenchang; Chen, Jieshan; Chen, Chunyang · 2020 · arXiv 8089.341794

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: method (1)

citation-polarity summary: use method (1)

representative citing papers

PBT-Bench: Benchmarking AI Agents on Property-Based Testing

cs.SE · 2026-05-13 · unverdicted · novelty 7.0 · 3 refs

PBT-Bench is a new benchmark of 100 property-based testing problems across 40 Python libraries; it measures LLM bug recall rates of 42.1-83.4% under guided prompting versus 31.4-76.7% in the baseline setting.

UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization

cs.CV · 2026-06-30 · unverdicted · novelty 6.0

UniCoder applies symbolic attribute alignment via an auxiliary LLM and reference-guided optimization in RL to achieve state-of-the-art visual-to-code generation on ChartMimic, UniSVG, Design2Code, and ScreenBench.

CAPED: Context-Aware Privacy Exposure Defense for Mobile GUI Agents

cs.CR · 2026-06-10 · unverdicted · novelty 6.0

CAPED reduces incidental visual privacy leakage in mobile GUI agents from 0.766 to 0.268 on seeded AndroidWorld tasks by selectively exposing only task-relevant screen content.

Similar Pattern Annotation via Retrieval Knowledge for LLM-Based Test Code Fault Localization

cs.SE · 2026-05-08 · unverdicted · novelty 6.0

SPARK improves LLM-based test code fault localization by retrieving similar past faults and selectively annotating suspicious lines in new failing tests.

All Green, Still Broken: Real-Flow Verification Lessons from an LLM-Integrated, Multi-Market Web Application

cs.SE · 2026-06-21 · unverdicted · novelty 5.0

An analysis of 252 bug fixes in an LLM-powered multi-market web app found that 44% of the bugs escaped through four seams invisible to component unit tests, motivating a four-seam verification framework.

LLM vs. Human Unit Tests: Fault Detection on Real Python Bugs

cs.SE · 2026-06-07 · unverdicted · novelty 5.0

LLM-generated unit tests with retrieval-augmented context detect faults in 69% of real Python bugs versus 17.2% for general-purpose human-written tests, with similar coverage levels.

MultiMend: Multilingual Program Repair with Context Augmentation and Multi-Hunk Patch Generation

cs.SE · 2025-01-27 · unverdicted · novelty 4.0

MultiMend augments buggy function context via retrieval and generates multi-hunk patches, fixing 2,227 of 5,501 bugs across six benchmarks in four languages.

citing papers explorer

Showing 6 of 6 citing papers after filters.

PBT-Bench: Benchmarking AI Agents on Property-Based Testing · cs.SE · 2026-05-13 · unverdicted · none · ref 16 · 3 links
UniCoder: Unified Visual-to-Code Generation via Symbolic Rewards and Reference-Guided Code Optimization · cs.CV · 2026-06-30 · unverdicted · none · ref 84
CAPED: Context-Aware Privacy Exposure Defense for Mobile GUI Agents · cs.CR · 2026-06-10 · unverdicted · none · ref 31
Similar Pattern Annotation via Retrieval Knowledge for LLM-Based Test Code Fault Localization · cs.SE · 2026-05-08 · unverdicted · none · ref 82
All Green, Still Broken: Real-Flow Verification Lessons from an LLM-Integrated, Multi-Market Web Application · cs.SE · 2026-06-21 · unverdicted · none · ref 11
LLM vs. Human Unit Tests: Fault Detection on Real Python Bugs · cs.SE · 2026-06-07 · unverdicted · none · ref 14