Can large language models write good property-based tests?

URL https://arxiv · 2023 · arXiv 2307.04346

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

PBT-Bench: Benchmarking AI Agents on Property-Based Testing

cs.SE · 2026-05-13 · unverdicted · novelty 7.0 · 2 refs

PBT-Bench is a new benchmark of 100 property-based testing problems with 365 injected semantic bugs across 40 Python libraries that measures LLMs on deriving invariants and precise input-generation strategies.

From Exploration to Specification: LLM-Based Property Generation for Mobile App Testing

cs.SE · 2026-04-15 · unverdicted · novelty 7.0

PropGen automates property generation for Android app testing via LLM synthesis from guided exploration and feedback refinement, yielding 912 valid properties and 25 previously unknown bugs across 12 apps.

VeriContest: A Competitive-Programming Benchmark for Verifiable Code Generation

cs.SE · 2026-05-08 · unverdicted · novelty 6.0

VeriContest supplies 946 problems with specs, code, proofs, and tests to benchmark verifiable code generation in Rust/Verus, showing models reach 92% on code but only 5% end-to-end on full verifiable synthesis.

Generalizing Test Cases for Comprehensive Test Scenario Coverage

cs.SE · 2026-04-23 · unverdicted · novelty 6.0

TestGeneralizer generalizes an initial test into a set of executable tests covering more diverse scenarios, delivering +31.66% mutation-based and +23.08% LLM-assessed scenario coverage gains over ChatTester on 12 open-source Java projects.

Enhancing Program Repair with Specification Guidance and Intermediate Behavioral Signals

cs.SE · 2026-04-13 · unverdicted · novelty 6.0

SpecTune improves LLM-based automated program repair by deriving localized postconditions at execution checkpoints and using alpha and beta signals to produce precise fault-localization and patch-generation guidance.

Decision-Oriented Programming with Aporia

cs.HC · 2026-04-06 · conditional · novelty 6.0

Aporia makes design decisions explicit and interactive in AI-assisted programming, leading to higher engagement and 5x fewer mental model disagreements with code in a 14-person user study compared to a baseline agent.

Effective LLM Code Refinement via Property-Oriented and Structurally Minimal Feedback

cs.SE · 2025-06-23 · unverdicted · novelty 6.0

PGS generates property-oriented, structurally minimal feedback from high-level program properties to refine LLM code, yielding up to 13.4% pass@1 gains and 1.4-1.6x higher bug-fix rates than prior TDD and debugging baselines.

citing papers explorer

Showing 7 of 7 citing papers.

PBT-Bench: Benchmarking AI Agents on Property-Based Testing · cs.SE · 2026-05-13 · unverdicted · none · ref 15 · 2 links
From Exploration to Specification: LLM-Based Property Generation for Mobile App Testing · cs.SE · 2026-04-15 · unverdicted · none · ref 60
VeriContest: A Competitive-Programming Benchmark for Verifiable Code Generation · cs.SE · 2026-05-08 · unverdicted · none · ref 45
Generalizing Test Cases for Comprehensive Test Scenario Coverage · cs.SE · 2026-04-23 · unverdicted · none · ref 40
Enhancing Program Repair with Specification Guidance and Intermediate Behavioral Signals · cs.SE · 2026-04-13 · unverdicted · none · ref 43
Decision-Oriented Programming with Aporia · cs.HC · 2026-04-06 · conditional · none · ref 55
Effective LLM Code Refinement via Property-Oriented and Structurally Minimal Feedback · cs.SE · 2025-06-23 · unverdicted · none · ref 18
