DreamerV2 reaches human-level performance on 55 Atari games by learning behaviors inside a separately trained discrete-latent world model.
Mopo: Model-Based Offline Policy Optimization
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
JD-BP jointly generates bids and pricing corrections via generative models, memory-less return-to-go, trajectory augmentation, and energy-based DPO to improve auto-bidding performance despite prediction errors and latency.
World models enable efficient AI planning but create risks from adversarial corruption, goal misgeneralization, and human bias, demonstrated via attacks that amplify errors and reduce rewards on models like RSSM and DreamerV3.
A comprehensive benchmark study of offline imitation learning methods on multi-stage robot manipulation tasks identifies key sensitivities to algorithm design, data quality, and stopping criteria while releasing all datasets and code.
CCSS-IX is a context-conditioned structured simulator for wastewater digital twins that uses adaptive expert mixing and self-falsifying conformal decision rules to reduce unsafe actions while maintaining low prediction error on real plant and benchmark data.
citing papers explorer
-
Mastering Atari with Discrete World Models
DreamerV2 reaches human-level performance on 55 Atari games by learning behaviors inside a separately trained discrete-latent world model.
-
JD-BP: A Joint-Decision Generative Framework for Auto-Bidding and Pricing
JD-BP jointly generates bids and pricing corrections via generative models, memory-less return-to-go, trajectory augmentation, and energy-based DPO to improve auto-bidding performance despite prediction errors and latency.
-
Safety, Security, and Cognitive Risks in World Models
World models enable efficient AI planning but create risks from adversarial corruption, goal misgeneralization, and human bias, demonstrated via attacks that amplify errors and reduce rewards on models like RSSM and DreamerV3.
-
What Matters in Learning from Offline Human Demonstrations for Robot Manipulation
A comprehensive benchmark study of offline imitation learning methods on multi-stage robot manipulation tasks identifies key sensitivities to algorithm design, data quality, and stopping criteria while releasing all datasets and code.
-
Explainable Wastewater Digital Twins: Adaptive Context-Conditioned Structured Simulators with Self-Falsifying Decision Support
CCSS-IX is a context-conditioned structured simulator for wastewater digital twins that uses adaptive expert mixing and self-falsifying conformal decision rules to reduce unsafe actions while maintaining low prediction error on real plant and benchmark data.