LaSeR: Reinforcement Learning with Last-Token Self-Rewarding

URL https://arxiv · 2025 · arXiv 2510.14943

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Sparse Reward Subsystem in Large Language Models

cs.CL · 2026-02-01 · unverdicted · novelty 6.0

LLM hidden states contain a sparse reward subsystem consisting of value neurons that predict state value and dopamine neurons that encode step-level temporal difference errors.

OGER: A Robust Offline-Guided Exploration Reward for Hybrid Reinforcement Learning

cs.AI · 2026-04-20 · unverdicted · novelty 5.0

OGER adds an auxiliary exploration reward built from offline trajectories and model entropy to hybrid RL training, yielding gains on math reasoning benchmarks and out-of-domain generalization.

citing papers explorer

Showing 2 of 2 citing papers.

Sparse Reward Subsystem in Large Language Models

cs.CL · 2026-02-01 · unverdicted · none · ref 23

LLM hidden states contain a sparse reward subsystem consisting of value neurons that predict state value and dopamine neurons that encode step-level temporal difference errors.

OGER: A Robust Offline-Guided Exploration Reward for Hybrid Reinforcement Learning

cs.AI · 2026-04-20 · unverdicted · none · ref 32

OGER adds an auxiliary exploration reward built from offline trajectories and model entropy to hybrid RL training, yielding gains on math reasoning benchmarks and out-of-domain generalization.
