Advances in neural information processing systems , volume=

When to trust your model: Model-based policy optimization , author=

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

LeapTS: Rethinking Time Series Forecasting as Adaptive Multi-Horizon Scheduling

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

LeapTS reformulates forecasting as adaptive multi-horizon scheduling via hierarchical control and NCDEs, delivering at least 7.4% better performance and 2.6-5.3x faster inference than Transformer baselines while adapting to non-stationary dynamics.

On Training in Imagination

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

The work derives the optimal ratio of dynamics-to-reward samples that minimizes a bound on return error and characterizes the tradeoff between noisy but cheap rewards versus accurate but expensive ones in imagination-based policy optimization.

Offline Reinforcement Learning for Rotation Profile Control in Tokamaks

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Offline RL policies trained solely on DIII-D historical data were deployed on the tokamak and produced promising real-world control of the plasma rotation profile.

QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.

citing papers explorer

Showing 4 of 4 citing papers.

LeapTS: Rethinking Time Series Forecasting as Adaptive Multi-Horizon Scheduling cs.LG · 2026-05-11 · unverdicted · none · ref 25
LeapTS reformulates forecasting as adaptive multi-horizon scheduling via hierarchical control and NCDEs, delivering at least 7.4% better performance and 2.6-5.3x faster inference than Transformer baselines while adapting to non-stationary dynamics.
On Training in Imagination cs.LG · 2026-05-07 · unverdicted · none · ref 5
The work derives the optimal ratio of dynamics-to-reward samples that minimizes a bound on return error and characterizes the tradeoff between noisy but cheap rewards versus accurate but expensive ones in imagination-based policy optimization.
Offline Reinforcement Learning for Rotation Profile Control in Tokamaks cs.LG · 2026-05-07 · unverdicted · none · ref 13
Offline RL policies trained solely on DIII-D historical data were deployed on the tokamak and produced promising real-world control of the plasma rotation profile.
QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL cs.LG · 2026-05-03 · unverdicted · none · ref 127
QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.

Advances in neural information processing systems , volume=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer