CLORE augments correct on-policy rollouts by deleting repetitive and irrelevant segments, then optimizes with auxiliary DPO to improve the accuracy-efficiency trade-off on math benchmarks.
The art of efficient reasoning: Data, reward, and optimization
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Representative citing papers
HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models but with much smaller size.
HMPO is a single-stage RL framework for CoT compression that reports 19-46% token reduction with negligible accuracy loss on models from 9B to 122B parameters across math, code, science, and instruction tasks.
citing papers explorer
- HMPO: Hybrid Median-length Policy Optimization for Chain-of-Thought Compression