The art of efficient reasoning: Data, reward, and optimization

The art of efficient reasoning: Data, reward, and optimization · 2025 · arXiv 2602.20945

3 Pith papers cite this work. Polarity classification is still being indexed.


representative citing papers

CLORE: Content-Level Optimization for Reasoning Efficiency

cs.AI · 2026-05-21 · unverdicted · novelty 6.0

CLORE augments correct on-policy rollouts by deleting repetitive and irrelevant segments, then optimizes with auxiliary DPO to improve the accuracy-efficiency trade-off on math benchmarks.

HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering

cs.AI · 2026-04-22 · unverdicted · novelty 6.0

HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models while being much smaller.

HMPO: Hybrid Median-length Policy Optimization for Chain-of-Thought Compression

cs.LG · 2026-06-01 · unverdicted · novelty 5.0

HMPO is a single-stage RL framework for CoT compression that reports 19-46% token reduction with negligible accuracy loss on models from 9B to 122B parameters across math, code, science, and instruction tasks.

citing papers explorer

Showing 3 of 3 citing papers.

CLORE: Content-Level Optimization for Reasoning Efficiency · cs.AI · 2026-05-21 · unverdicted · none · ref 49
HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering · cs.AI · 2026-04-22 · unverdicted · none · ref 54
HMPO: Hybrid Median-length Policy Optimization for Chain-of-Thought Compression · cs.LG · 2026-06-01 · unverdicted · none · ref 5
