Toward agents that reason about their computation. arXiv preprint arXiv:2510.22833

Adrian Orenstein, Jessica Chen, Gwyneth Anne Delos Santos, Bayley Sapara, Michael Bowling · 2025 · arXiv 2510.22833

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Do Not Imitate, Reinforce: Iterative Classification via Belief Refinement

cs.LG · 2026-04-23 · unverdicted · novelty 6.0

RIC replaces single-pass label imitation with RL-driven iterative belief refinement, recovering cross-entropy optima while enabling adaptive halting via a value function.

Revisiting Adam for Streaming Reinforcement Learning

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

C51 matches StreamQ in streaming RL on 55 Atari games while a new Adaptive Q(λ) algorithm based on bounded derivatives and variance-adjusted updates reaches nearly double the human baseline.

citing papers explorer

Showing 2 of 2 citing papers.

Do Not Imitate, Reinforce: Iterative Classification via Belief Refinement · cs.LG · 2026-04-23 · unverdicted · none · ref 11
Revisiting Adam for Streaming Reinforcement Learning · cs.LG · 2026-05-07 · unverdicted · none · ref 25
