Language models are hidden reasoners: Unlocking latent reasoning capabilities via self-rewarding
2 Pith papers cite this work. Polarity classification is still being indexed.
2 representative citing papers
- LASAR: Latent Adaptive Semantic Aligned Reasoning for Generative Recommendation
  LASAR uses two-stage supervised training plus reinforcement learning to ground semantic IDs, align latent reasoning trajectories to CoT hidden states via KL divergence, and adaptively choose reasoning depth, halving average steps while improving quality on three datasets.
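The KL-divergence alignment of latent reasoning trajectories to CoT hidden states mentioned in the LASAR summary can be sketched roughly as follows. This is a minimal illustration, not LASAR's actual implementation: the function name, tensor shapes, and temperature parameter are assumptions for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def kl_align_loss(latent_states, cot_states, tau=1.0):
    """Toy alignment loss: KL(teacher || student) between temperature-softened
    distributions over the hidden dimension, averaged over reasoning steps.

    latent_states: (steps, hidden) student latent-reasoning hidden states
    cot_states:    (steps, hidden) teacher CoT hidden states
    tau:           softening temperature (illustrative hyperparameter)
    """
    p = softmax(cot_states / tau)     # teacher distribution per step
    q = softmax(latent_states / tau)  # student distribution per step
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))
```

Driving the student's latent states toward the teacher's CoT states by minimizing such a loss is the general idea; the loss is zero exactly when the softened distributions match at every step.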
- Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models
  The paper unifies perspectives on Long CoT in reasoning LLMs by introducing a taxonomy, detailing characteristics of deep reasoning and reflection, and discussing emergence phenomena and future directions.