1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities

Kevin Wang, Ishaan Javali, Michał Bortkiewicz, Tomasz Trzciński, Benjamin Eysenbach · 2026 · arXiv 2503.14858

7 Pith papers cite this work.

representative citing papers

GIANTS: Generative Insight Anticipation from Scientific Literature

cs.CL · 2026-04-10 · unverdicted · novelty 8.0

GIANTS-4B, trained with RL on a new 17k-example benchmark of parent-to-child paper insights, achieves a 34% relative improvement over gemini-3-pro in LM-judge similarity and is rated higher-impact by a citation predictor.

Reference-Augmented Learning for Precise Tracking Policy of Tendon-Driven Continuum Robots

cs.RO · 2026-04-28 · unverdicted · novelty 7.0

Reference-augmented learning with RNN surrogate and stochastic perturbations cuts average position error by 50.9% for 6-DOF tracking on a three-section TDCR compared to non-augmented baselines.

Abstraction for Offline Goal-Conditioned Reinforcement Learning

cs.LG · 2026-05-21 · unverdicted · novelty 5.0

Introduces relativised options and hierarchical abstraction to reuse experience across similar contexts in offline GCRL, with two algorithms demonstrating performance gains.

Scalable On-Policy Reinforcement Learning via Adaptive Batch Scaling

stat.ML · 2026-05-20 · unverdicted · novelty 5.0

Adaptive Batch Scaling dynamically increases the batch size in on-policy RL as policy volatility drops, as measured by a new Behavioral Divergence metric, and shows that larger networks combined with larger batches perform better on the Arcade Learning Environment (ALE) with PQN.

Adaptive Outer-Loop Control of Quadrotors via Reinforcement Learning

cs.RO · 2026-05-15 · unverdicted · novelty 5.0

An RL-based outer-loop quadrotor controller augmented with an online Residual Dynamics Predictor for disturbance estimation and a data-efficient sim-to-real calibration bridge.

The Serial Scaling Hypothesis

cs.LG · 2025-07-16 · unverdicted · novelty 5.0

The serial scaling hypothesis formalizes inherently serial problems in complexity theory and demonstrates that diffusion models cannot solve them.

Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning

cs.LG · 2026-05-03

citing papers explorer

GIANTS: Generative Insight Anticipation from Scientific Literature cs.CL · 2026-04-10 · unverdicted · none · ref 27
Reference-Augmented Learning for Precise Tracking Policy of Tendon-Driven Continuum Robots cs.RO · 2026-04-28 · unverdicted · none · ref 14
Abstraction for Offline Goal-Conditioned Reinforcement Learning cs.LG · 2026-05-21 · unverdicted · none · ref 55
Scalable On-Policy Reinforcement Learning via Adaptive Batch Scaling stat.ML · 2026-05-20 · unverdicted · none · ref 17
Adaptive Outer-Loop Control of Quadrotors via Reinforcement Learning cs.RO · 2026-05-15 · unverdicted · none · ref 3
The Serial Scaling Hypothesis cs.LG · 2025-07-16 · unverdicted · none · ref 53
Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning cs.LG · 2026-05-03 · unreviewed · ref 72