The art of efficient reasoning: Data, reward, and optimization
3 Pith papers cite this work.
Citing papers by year: 2026 (3). Verdicts: unverdicted (3).
Citing papers
- CLORE: Content-Level Optimization for Reasoning Efficiency
  CLORE augments correct on-policy rollouts by deleting repetitive and irrelevant segments, then optimizes with an auxiliary DPO loss to improve the accuracy-efficiency trade-off on math benchmarks.
- HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering
  HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models while being much smaller.
- HMPO: Hybrid Median-length Policy Optimization for Chain-of-Thought Compression
  HMPO is a single-stage RL framework for CoT compression that reports a 19-46% token reduction with negligible accuracy loss on models from 9B to 122B parameters across math, code, science, and instruction tasks.
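CLORE's summary pairs content-level editing of correct rollouts with an auxiliary DPO loss. As a rough illustration only, the sketch below shows the standard DPO loss for one preference pair plus a toy editing step. It assumes, hypothetically, that the edited rollout is the preferred ("chosen") response and the unedited rollout the rejected one; `drop_repeated_segments` only removes exact repeats and does not model detection of irrelevant segments. All function names and the default `beta=0.1` are illustrative, not taken from the paper.

```python
import math


def drop_repeated_segments(segments: list[str]) -> list[str]:
    """Hypothetical editing step: keep the first copy of each exactly repeated
    segment, preserving order. Irrelevant-segment detection is not modeled."""
    seen: set[str] = set()
    kept: list[str] = []
    for seg in segments:
        key = seg.strip()
        if key in seen:
            continue  # skip a verbatim repeat of an earlier segment
        seen.add(key)
        kept.append(seg)
    return kept


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair. Each *_logp is the summed token
    log-probability of a full response under the policy or frozen reference."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


# Hypothetical usage: the edited rollout is "chosen", the original "rejected".
rollout = ["Let x = 3.", "So 2x = 6.", "So 2x = 6.", "Answer: 6."]
edited = drop_repeated_segments(rollout)
print(edited)  # ['Let x = 3.', 'So 2x = 6.', 'Answer: 6.']
print(dpo_loss(-12.0, -20.0, -14.0, -19.0))
```

The loss equals log 2 when the policy and reference rank the pair identically, and falls below log 2 once the policy favors the edited rollout over the original more strongly than the reference does.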
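HypEHR's summary mentions Lorentzian geometry. For background only, here is a minimal sketch of distance in the Lorentz (hyperboloid) model of hyperbolic space, assuming curvature -1. The helper names are hypothetical and nothing here is taken from HypEHR's implementation; it only shows the standard formula d(x, y) = arccosh(-<x, y>_L).

```python
import math


def lift_to_hyperboloid(v: list[float]) -> list[float]:
    """Map a Euclidean vector v in R^n onto the curvature -1 hyperboloid
    {x in R^(n+1): -x0^2 + x1^2 + ... + xn^2 = -1, x0 > 0} by solving for x0."""
    x0 = math.sqrt(1.0 + sum(c * c for c in v))
    return [x0, *v]


def lorentz_inner(x: list[float], y: list[float]) -> float:
    """Minkowski inner product <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def lorentz_distance(x: list[float], y: list[float]) -> float:
    """Geodesic distance arccosh(-<x, y>_L) between hyperboloid points.
    The max() guards against rounding pushing the argument just below 1."""
    return math.acosh(max(1.0, -lorentz_inner(x, y)))


origin = lift_to_hyperboloid([0.0, 0.0])        # [1.0, 0.0, 0.0]
p = lift_to_hyperboloid([math.sinh(1.0), 0.0])  # distance 1 from origin
q = lift_to_hyperboloid([math.sinh(2.0), 0.0])  # distance 2, same geodesic
print(lorentz_distance(origin, p), lorentz_distance(origin, q))
```

Distances in this model grow much faster toward the boundary than in Euclidean space, which is the general motivation for embedding tree-like hierarchies hyperbolically.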