1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
7 Pith papers cite this work.
citing papers explorer
- GIANTS: Generative Insight Anticipation from Scientific Literature
  GIANTS-4B, trained with RL on a new 17k-example benchmark of parent-to-child paper insights, achieves 34% relative improvement over gemini-3-pro in LM-judge similarity and is rated higher-impact by a citation predictor.
- Reference-Augmented Learning for Precise Tracking Policy of Tendon-Driven Continuum Robots
  Reference-augmented learning with RNN surrogate and stochastic perturbations cuts average position error by 50.9% for 6-DOF tracking on a three-section TDCR compared to non-augmented baselines.
- Abstraction for Offline Goal-Conditioned Reinforcement Learning
  Introduces relativised options and hierarchical abstraction to reuse experience across similar contexts in offline GCRL, with two algorithms demonstrating performance gains.
- Scalable On-Policy Reinforcement Learning via Adaptive Batch Scaling
  Adaptive Batch Scaling dynamically increases batch size in on-policy RL as policy volatility drops, measured by a new Behavioral Divergence metric, and shows larger networks plus larger batches outperform on ALE with PQN.
- Adaptive Outer-Loop Control of Quadrotors via Reinforcement Learning
  An RL-based outer-loop quadrotor controller augmented with an online Residual Dynamics Predictor for disturbance estimation and a data-efficient sim-to-real calibration bridge.
- The Serial Scaling Hypothesis
  The serial scaling hypothesis formalizes inherently serial problems in complexity theory and demonstrates that diffusion models cannot solve them.
- Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning