1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities, February 2026

10 Scalable Reinforcement Learning via Adaptive Batch Scaling Wang, K · 2026 · arXiv 2503.14858

7 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

GIANTS: Generative Insight Anticipation from Scientific Literature

cs.CL · 2026-04-10 · unverdicted · novelty 8.0

GIANTS-4B, trained with RL on a new 17k-example benchmark of parent-to-child paper insights, achieves 34% relative improvement over gemini-3-pro in LM-judge similarity and is rated higher-impact by a citation predictor.

Scalable Reinforcement Learning via Adaptive Batch Scaling

stat.ML · 2026-05-20 · unverdicted · novelty 7.0 · 2 refs

ABS uses Behavioral Divergence to adaptively scale batch sizes in RL according to policy volatility, enabling effective large-batch large-network training on ALE benchmarks.

Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

FAN simplifies expressive flow policies and distributional critics in offline RL via single-iteration behavior regularization and single-sample noise conditioning to claim SOTA performance with lower training and inference time.

Abstraction for Offline Goal-Conditioned Reinforcement Learning

cs.LG · 2026-05-21 · unverdicted · novelty 5.0

Introduces relativised options and hierarchical abstraction to reuse experience across similar contexts in offline GCRL, with two algorithms demonstrating performance gains.

Adaptive Outer-Loop Control of Quadrotors via Reinforcement Learning

cs.RO · 2026-05-15 · unverdicted · novelty 5.0

An RL-based outer-loop quadrotor controller augmented with an online Residual Dynamics Predictor for disturbance estimation and a data-efficient sim-to-real calibration bridge.

The Serial Scaling Hypothesis

cs.LG · 2025-07-16 · unverdicted · novelty 5.0

The serial scaling hypothesis formalizes inherently serial problems in complexity theory and demonstrates that diffusion models cannot solve them.

Reference-Augmented Learning for Precise Tracking Policy of Tendon-Driven Continuum Robots

cs.RO · 2026-04-28 · unverdicted · novelty 4.0

A reference-augmented offline learning framework for 6-DOF tracking control of tendon-driven continuum robots achieves 50.9% lower average position error than non-augmented baselines.

citing papers explorer


GIANTS: Generative Insight Anticipation from Scientific Literature · cs.CL · 2026-04-10 · unverdicted · none · ref 27
Scalable Reinforcement Learning via Adaptive Batch Scaling · stat.ML · 2026-05-20 · unverdicted · none · ref 17 · 2 links
Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning · cs.LG · 2026-05-03 · unverdicted · none · ref 72
Abstraction for Offline Goal-Conditioned Reinforcement Learning · cs.LG · 2026-05-21 · unverdicted · none · ref 55
Adaptive Outer-Loop Control of Quadrotors via Reinforcement Learning · cs.RO · 2026-05-15 · unverdicted · none · ref 3
The Serial Scaling Hypothesis · cs.LG · 2025-07-16 · unverdicted · none · ref 53
Reference-Augmented Learning for Precise Tracking Policy of Tendon-Driven Continuum Robots · cs.RO · 2026-04-28 · unverdicted · none · ref 14
