{"total":15,"items":[{"citing_arxiv_id":"2607.00811","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"From Pixels to Temporal Correlations: Learning Informative Representations for Reinforcement Learning Pre-training","primary_cat":"cs.LG","submitted_at":"2026-07-01T11:38:58+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"MTCL learns multi-scale temporal correlations in videos via contrastive learning to produce more informative representations that improve sample efficiency and performance in downstream RL tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.20781","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"World Action Models: A Survey","primary_cat":"cs.RO","submitted_at":"2026-06-18T17:05:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"A survey that clarifies boundaries and organizes World Action Models by generation requirements and predictive substrates, identifying a trend toward generating less of the future.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"where o′ is written as an observation for clarity. The symbol is deliberately broad in this section. Depending on the method, it may be rendered pixels, a hidden feature, a geometric state, an affordance map, an audio cue, a symbolic state, or a token-level description. Section 4 later separates these cases. In latent model-based reinforcement learning, PlaNet [55], DreamerV3 [57], TransDreamer [18], and Dreamer 4 [58] instantiate this idea with compact dynamics states used for imagination and planning. 
V-JEPA [7] and V-JEPA 2 [4] show the same predictive idea in feature space, while iVideoGPT [174], RoboDreamer [210], EnerVerse [66], and InteractiveWorldSimulator [166] express it through token or video prediction."},{"citing_arxiv_id":"2606.01027","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"$\\tau_0$-WM: A Unified Video-Action World Model for Robotic Manipulation","primary_cat":"cs.RO","submitted_at":"2026-05-31T05:35:36+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A shared video diffusion backbone jointly predicts future latents and continuous actions while also rolling out candidate actions to predict dense task-progress scores, trained on 27,300 hours of mixed robot and human data.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00780","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Behavior-Invariant Task Representation Learning with Transformer-based World Models for Offline Meta-Reinforcement Learning","primary_cat":"cs.LG","submitted_at":"2026-05-30T15:53:03+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The work introduces behavior-invariant latent task representations via information-theoretic learning in a Transformer world model plus conservative penalties on imagined rollouts to improve generalization in offline meta-RL.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.00133","ref_index":153,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"World Models: A Comprehensive Survey of Architectures, Methodologies, Reasoning Paradigms, and 
Applications","primary_cat":"cs.LG","submitted_at":"2026-05-28T21:23:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"The paper delivers a multi-axis taxonomy for world models that maps architectures, training families, reasoning strategies, and domains from early cognitive foundations through systems such as Dreamer, MuZero, and Sora while noting evaluation gaps.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.05208","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Transformer-Enhanced Reinforcement Learning: Fundamentals and Applications in Communication Networks","primary_cat":"eess.SP","submitted_at":"2026-05-26T06:44:07+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":1.0,"formal_verification":"none","one_line_summary":"A survey of Transformer-enhanced reinforcement learning fundamentals and applications in communication networks covering resource allocation, computation offloading, routing, trajectory control, and security.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.12090","ref_index":31,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"World Action Models: The Next Frontier in Embodied AI","primary_cat":"cs.RO","submitted_at":"2026-05-12T13:10:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"(VLMs) to map perceptual inputs directly to the action space. 
Formally, the VLA objective is defined by the conditional probability of actions given the multimodal context: 3 Roadmap to WAM Background VLAs World Model Action-Conditioned iVideoGPT [23], FlowDreamer [24], EnerVerse [25], PlaNet [26], TransDreamer [27], V-JEPA [28]... Language-Conditioned MoCoGAN [29], U-Net [30], Latte [31], Wan [32], Sora 2 [33]... Embodied World Model SWIM [34], DreamDojo [35], RoboDreamer [36], RoboScape [37]... WM for VLA Imitation Learning Ctrl-World [38], RoboScape [37], DREMA [39] Reinforcement Learning Dreamer to Control [40] DreamerV2 [41], Dreamer 4 [42], RISE [43] DreamerV3 [44], DayDreamer [45], World-Env [46], RoboScape-R [47]"},{"citing_arxiv_id":"2605.08578","ref_index":49,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Probing the Impact of Scale on Data-Efficient, Generalist Transformer World Models for Atari","primary_cat":"cs.LG","submitted_at":"2026-05-09T00:43:39+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Transformer world models on Atari exhibit game-specific scaling regimes, but joint training on 26 environments produces consistent monotonic gains that improve downstream control policies to a median normalized score of 0.770.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.01950","ref_index":8,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"TRAP: Tail-aware Ranking Attack for World-Model Planning","primary_cat":"cs.LG","submitted_at":"2026-05-03T16:19:45+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"TRAP is a tail-aware ranking attack that plants a backdoor in world models so that a trigger causes the model to reorder a few critical imagined trajectories and redirect planning while preserving normal behavior on clean 
inputs.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"(4) We validate TRAP on representative world models and control tasks, demonstrating strong attack effectiveness, clean-condition fidelity, and good transferability across planning paradigms. 2 Related Work 2.1 World Models World models have become a central paradigm for model-based decision making by learning environment dynamics to support future prediction and planning [8, 15, 17, 18, 28, 31, 34, 40, 44]. Early work such as World Models [15] demonstrated that learned generative models could support policy learning, inspiring a broad line of research in model-based reinforcement learning. Among recent representative approaches, the Dreamer family [17, 19, 20], culminating in DreamerV3 [20], performs long-horizon policy opti-"},{"citing_arxiv_id":"2604.22748","ref_index":48,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Agentic World Modeling: Foundations, Capabilities, Laws, and Beyond","primary_cat":"cs.AI","submitted_at":"2026-04-24T17:48:47+00:00","verdict":null,"verdict_confidence":null,"novelty_score":null,"formal_verification":null,"one_line_summary":null,"context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08780","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Toward Hardware-Agnostic Quadrupedal World Models via Morphology Conditioning","primary_cat":"cs.RO","submitted_at":"2026-04-09T21:31:24+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Morphology-conditioned quadrupedal world model enables zero-shot generalization to new robot embodiments for locomotion 
tasks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.08199","ref_index":9,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Beyond Static Forecasting: Unleashing the Power of World Models for Mobile Traffic Extrapolation","primary_cat":"cs.NI","submitted_at":"2026-04-09T12:56:55+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"MobiWM is a multimodal world model for mobile networks that learns state-action dynamics to enable unlimited-horizon counterfactual traffic simulations and optimization.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"toencoder with a recurrent network to train policies entirely within imagined rollouts. Hafner et al. introduce DreamerV1-V3 [12-14], proposing the Recurrent State-Space Model (RSSM) to learn latent dynamics and train actor-critic policies from imagination alone. 
More recently, Transformer-based world models have shown advantages in capturing long-range dependencies: TransDreamer [9] replaces recurrent dynamics with a Transformer State-Space Model,"},{"citing_arxiv_id":"2603.15759","ref_index":5,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Simulation Distillation: Pretraining World Models in Simulation for Rapid Real-World Adaptation","primary_cat":"cs.RO","submitted_at":"2026-03-16T18:00:23+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SimDist pretrains world models in simulation and adapts them to real-world robots by updating only the latent dynamics model, enabling rapid improvement on contact-rich tasks where prior methods fail.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.03438","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Multimodal Reinforcement Learning with Adaptive Verifier for AI Agents","primary_cat":"cs.AI","submitted_at":"2025-12-03T04:42:47+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Argos is an agentic verifier that adaptively picks scoring functions to evaluate accuracy, localization, and reasoning quality, enabling stronger multimodal RL training for AI agents.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2310.06114","ref_index":248,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Learning Interactive Real-World 
Simulators","primary_cat":"cs.AI","submitted_at":"2023-10-09T19:42:22+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":7.0,"formal_verification":"none","one_line_summary":"UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}