DreamerV2 reaches human-level performance on 55 Atari games by learning behaviors inside a separately trained discrete-latent world model.
MOPO: Model-Based Offline Policy Optimization
7 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: background 2
Citation-polarity summary: background 2
Representative citing papers
SC3-Eval enforces three consistencies on a video model to produce policy rollouts that correlate 0.929 with real-world performance across seven vision-language-action policies and reproduce observed failure modes.
The Inverter framework formalizes inverse learning to generate coherent multi-step trajectories, outperforming offline RL and diffusion baselines on D4RL maze tasks by 24% on average with 10-100x less inference time while also matching GRAPE fidelity on single-qubit gates at >1000x speed.
JD-BP jointly generates bids and pricing corrections via generative models, memory-less return-to-go, trajectory augmentation, and energy-based DPO to improve auto-bidding performance despite prediction errors and latency.
World models enable efficient AI planning but create risks from adversarial corruption, goal misgeneralization, and human bias, demonstrated via attacks that amplify errors and reduce rewards on models like RSSM and DreamerV3.
A comprehensive benchmark study of offline imitation learning methods on multi-stage robot manipulation tasks identifies key sensitivities to algorithm design, data quality, and stopping criteria while releasing all datasets and code.
CCSS-IX is a context-conditioned structured simulator for wastewater digital twins that uses adaptive expert mixing and self-falsifying conformal decision rules to reduce unsafe actions while maintaining low prediction error on real plant and benchmark data.