We repeat evaluations eight times and report the average accuracy for both COPT and baselines on the AIME 2024 and AIME 2025 benchmarks.

For ZebraArena, we set the maximum generation length to 32,768 tokens for the small split, 65,536 tokens for the medium split, and 98,304 tokens for the large split · 2024

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

CopT: Contrastive On-Policy Thinking with Continuous Spaces for General and Agentic Reasoning

cs.CL · 2026-05-19 · unverdicted · novelty 7.0

CopT reverses CoT by eliciting a draft answer first, then using continuous-embedding contrastive verification and on-policy thinking to reflect and correct, yielding up to 23% higher accuracy and 57% fewer tokens without training.


CopT: Contrastive On-Policy Thinking with Continuous Spaces for General and Agentic Reasoning cs.CL · 2026-05-19 · unverdicted · none · ref 47
