T2J-Bench shows top coding agents achieve only 26.7-28.9% pass rate on codebase conversion under a three-stage observational equivalence check, with agents overestimating success by 66.6-97.8 points.
URL https://aclanthology.org/2025
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.SE 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Converted, Not Equivalent: Benchmarking Codebase Conversion via Observational Equivalence
T2J-Bench shows top coding agents achieve only 26.7-28.9% pass rate on codebase conversion under a three-stage observational equivalence check, with agents overestimating success by 66.6-97.8 points.