Unlocking Efficient Long-to-Short LLM Reasoning with Model Merging (arXiv preprint arXiv:2503.20641)
3 Pith papers cite this work. Polarity classification is still being indexed.
Citation summary
- years: 2026 (3)
- verdicts: UNVERDICTED (3)
- roles: method (1), background (1)
- polarities: still indexing
Citing papers explorer

- Scalable Token-Level Hallucination Detection in Large Language Models
  TokenHD uses a scalable data synthesis engine and importance-weighted training to create token-level hallucination detectors that work on free-form text, scale from 0.6B to 8B parameters, and outperform larger reasoning models.
- Schedule-and-Calibrate: Utility-Guided Multi-Task Reinforcement Learning for Code LLMs
  ASTOR improves a single code LLM across four tasks by 9.0-9.5% over the best specialist and by 7.5-12.8% over prior multi-task RL baselines, via utility-driven data scheduling and adaptive KL regularization.
- Shorter, but Still Trustworthy? An Empirical Study of Chain-of-Thought Compression
  CoT compression frequently introduces trustworthiness regressions with method-specific degradation profiles; a proposed normalized efficiency score and an alignment-aware DPO variant reduce reasoning length by 19.3% with a smaller trustworthiness loss.
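To make the "importance-weighted training" idea in the TokenHD summary concrete, here is a generic sketch of an importance-weighted, per-token binary cross-entropy loss. This is not the paper's implementation; the function name, the weighting scheme, and the example numbers are all hypothetical illustrations of the general technique (each token gets a hallucination probability, and the loss up-weights tokens deemed more important).

```python
import math

def importance_weighted_token_loss(probs, labels, weights):
    """Importance-weighted binary cross-entropy over tokens.

    probs   -- predicted per-token hallucination probability in (0, 1)
    labels  -- 1 if the token is considered hallucinated, else 0
    weights -- per-token importance weights (hypothetical; e.g. produced
               by a data-synthesis pipeline)
    """
    eps = 1e-9  # guard against log(0)
    total = 0.0
    for p, y, w in zip(probs, labels, weights):
        total += -w * (y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))
    return total / sum(weights)  # weight-normalized mean loss

# Toy example: three tokens, the second flagged as hallucinated
# and given twice the importance of the others.
loss = importance_weighted_token_loss(
    probs=[0.1, 0.8, 0.2],
    labels=[0, 1, 0],
    weights=[1.0, 2.0, 1.0],
)
```

Up-weighting a token makes its misclassification cost dominate the average, which is one simple way a detector can be steered toward the tokens that matter most.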