arxiv: 2605.15229 · 2 revisions
PBT-Bench: Benchmarking AI Agents on Property-Based Testing