GIANTS-4B, trained with RL on a new 17k-example benchmark of parent-to-child paper insights, achieves 34% relative improvement over gemini-3-pro in LM-judge similarity and is rated higher-impact by a citation predictor.
1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities, February 2026
7 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 7representative citing papers
ABS uses Behavioral Divergence to adaptively scale batch sizes in RL according to policy volatility, enabling effective large-batch large-network training on ALE benchmarks.
FAN simplifies expressive flow policies and distributional critics in offline RL via single-iteration behavior regularization and single-sample noise conditioning to claim SOTA performance with lower training and inference time.
Introduces relativised options and hierarchical abstraction to reuse experience across similar contexts in offline GCRL, with two algorithms demonstrating performance gains.
An RL-based outer-loop quadrotor controller augmented with an online Residual Dynamics Predictor for disturbance estimation and a data-efficient sim-to-real calibration bridge.
The serial scaling hypothesis formalizes inherently serial problems in complexity theory and demonstrates that diffusion models cannot solve them.
A reference-augmented offline learning framework for 6-DOF tracking control of tendon-driven continuum robots achieves 50.9% lower average position error than non-augmented baselines.
citing papers explorer
-
GIANTS: Generative Insight Anticipation from Scientific Literature
GIANTS-4B, trained with RL on a new 17k-example benchmark of parent-to-child paper insights, achieves 34% relative improvement over gemini-3-pro in LM-judge similarity and is rated higher-impact by a citation predictor.
-
Scalable Reinforcement Learning via Adaptive Batch Scaling
ABS uses Behavioral Divergence to adaptively scale batch sizes in RL according to policy volatility, enabling effective large-batch large-network training on ALE benchmarks.
-
Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning
FAN simplifies expressive flow policies and distributional critics in offline RL via single-iteration behavior regularization and single-sample noise conditioning to claim SOTA performance with lower training and inference time.
-
Abstraction for Offline Goal-Conditioned Reinforcement Learning
Introduces relativised options and hierarchical abstraction to reuse experience across similar contexts in offline GCRL, with two algorithms demonstrating performance gains.
-
Adaptive Outer-Loop Control of Quadrotors via Reinforcement Learning
An RL-based outer-loop quadrotor controller augmented with an online Residual Dynamics Predictor for disturbance estimation and a data-efficient sim-to-real calibration bridge.
-
The Serial Scaling Hypothesis
The serial scaling hypothesis formalizes inherently serial problems in complexity theory and demonstrates that diffusion models cannot solve them.
-
Reference-Augmented Learning for Precise Tracking Policy of Tendon-Driven Continuum Robots
A reference-augmented offline learning framework for 6-DOF tracking control of tendon-driven continuum robots achieves 50.9% lower average position error than non-augmented baselines.