pith. sign in

hub

Ernst, Reid Holmes, and Gordon Fraser

16 Pith papers cite this work. Polarity classification is still indexing.

16 Pith papers citing it

hub tools

citation-role summary

background 4

citation-polarity summary

fields

cs.SE 16

years

2026 14 2025 2

roles

background 4

polarities

background 4

clear filters

representative citing papers

PBT-Bench: Benchmarking AI Agents on Property-Based Testing

cs.SE · 2026-05-13 · unverdicted · novelty 7.0 · 3 refs

PBT-Bench is a new benchmark with 100 property-based testing problems across 40 Python libraries that measures LLM bug recall rates of 42.1-83.4% under guided prompting versus 31.4-76.7% in baseline.

Reproduction Test Generation for Java SWE Issues

cs.SE · 2026-05-05 · unverdicted · novelty 6.0

Introduces the first benchmark for Java reproduction test generation from repository issues and adapts a prior Python tool to produce high performance on it.

citing papers explorer

Showing 14 of 14 citing papers after filters.