LRS trains a latent reward model on final-answer correctness to steer SAE states during inference, improving reasoning performance and implicitly encouraging better cognitive behaviors.
Latent thinking optimiza- tion: Your latent reasoning language model secretly en- codes reward signals in its latent thoughts.arXiv preprint arXiv:2509.26314,
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
LLM hidden states contain a sparse reward subsystem consisting of value neurons that predict state value and dopamine neurons that encode step-level temporal difference errors.
STARS trains looped language models with Jacobian spectral radius regularization and random loop sampling to drive latent states toward asymptotically stable fixed points, yielding reliable test-time scaling on arithmetic and mathematical reasoning tasks.
citing papers explorer
-
Latent Reward Steering: An Adaptive Inference-Time Framework that Implicitly Promotes Cognitive Behaviors in Reasoning LLMs
LRS trains a latent reward model on final-answer correctness to steer SAE states during inference, improving reasoning performance and implicitly encouraging better cognitive behaviors.
-
Sparse Reward Subsystem in Large Language Models
LLM hidden states contain a sparse reward subsystem consisting of value neurons that predict state value and dopamine neurons that encode step-level temporal difference errors.
-
Stabilizing Recurrent Dynamics for Test-Time Scalable Latent Reasoning in Looped Language Models
STARS trains looped language models with Jacobian spectral radius regularization and random loop sampling to drive latent states toward asymptotically stable fixed points, yielding reliable test-time scaling on arithmetic and mathematical reasoning tasks.