super hub Canonical reference

In 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023, Melbourne, Australia, May 14-20

Canonical reference. 76% of citing Pith papers cite this work as background.

115 Pith papers citing it

citation-role summary

background 32 · baseline 2 · method 2 · extension 1 · other 1

representative citing papers

RepairAgent: An Autonomous, LLM-Based Agent for Program Repair

cs.SE · 2024-03-25 · conditional · novelty 8.0 · 2 refs

RepairAgent autonomously repairs 164 bugs on Defects4J including 39 not fixed by prior techniques by treating an LLM as an agent that invokes tools via a finite state machine and dynamic prompts.

Knowledge Over Parameters: Evolving Smart Contract Vulnerability Detection

cs.CR · 2026-07-02 · unverdicted · novelty 7.0

EvoVuln evolves executable detection policies for five smart-contract vulnerability types using cold-start synthetic testing followed by few-shot refinement on five vulnerable and five safe contracts, reaching 71% macro F1 and enabling a small model to beat a large zero-shot model by 19 points at un

Quantum Mutant Equivalence via Transpilation

cs.SE · 2026-06-25 · unverdicted · novelty 7.0

TBE identifies 32.1% of 92,011 equivalent surviving quantum mutants (29,536) via OpenQASM comparison after transpilation, reporting 100% precision and 82% accuracy on 348,299 mutants.

The Correctness Illusion in LLM-Generated GPU Kernels

cs.SE · 2026-06-18 · accept · novelty 7.0

Controlled corpus testing shows that fixed allclose oracles in LLM kernel benchmarks certify transcription-buggy kernels as correct while seeded fuzzing with fp64 references does not.

Code Generation by Differential Test Time Scaling

cs.SE · 2026-05-19 · unverdicted · novelty 7.0

DiffCodeGen clusters code candidates by behavioral similarity from fuzzing-synthesized inputs and selects the largest cluster's medoid, matching or exceeding prior test-time scaling methods with far less token and time cost.

PBT-Bench: Benchmarking AI Agents on Property-Based Testing

cs.SE · 2026-05-13 · unverdicted · novelty 7.0 · 6 refs

PBT-Bench is a new benchmark with 100 property-based testing problems across 40 Python libraries that measures LLM bug recall rates of 42.1-83.4% under guided prompting versus 31.4-76.7% in baseline.

SmellBench: Evaluating LLM Agents on Architectural Code Smell Repair

cs.SE · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

SmellBench is the first benchmark showing LLM agents resolve 47.7% of architectural code smells while accurately spotting false positives, but aggressive repairs often introduce new smells and degrade overall quality.
