pith. sign in

Mixed citations

arXiv preprint arXiv:2601.05230 (2026)

Mixed citation behavior. Most common role is background (60%).

9 Pith papers citing it
Background 60% of classified citations

citation-role summary

background 5

citation-polarity summary

years

2026 9

verdicts

UNVERDICTED 9

roles

background 5

polarities

background 3 unclear 2

representative citing papers

DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos

cs.RO · 2026-02-06 · unverdicted · novelty 7.0

DreamDojo is a foundation world model pretrained on the largest human video dataset to date that uses continuous latent actions to transfer interaction knowledge and achieves controllable physics simulation after robot post-training.

DiLA: Disentangled Latent Action World Models

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

DiLA uses content-structure disentanglement driven by predictive bottlenecks to create semantically structured latent actions for high-fidelity video world models.

Why Latent Actions Fail, and How to Prevent It

cs.CV · 2026-05-13 · unverdicted · novelty 6.0

Extending linear LAMs to model exogenous state shows standard reconstruction encodes future exogenous info in latent actions, while endogenous-focused spaces and auxiliary objectives like action-supervision enforce consistency across noise.

GazeVLA: Learning Human Intention for Robotic Manipulation

cs.RO · 2026-04-24 · unverdicted · novelty 6.0

GazeVLA pretrains on large human egocentric datasets to capture gaze-based intention, then finetunes on limited robot data with chain-of-thought reasoning to achieve better robotic manipulation performance than baselines.

Human Cognition in Machines: A Unified Perspective of World Models

cs.RO · 2026-04-17 · unverdicted · novelty 6.0

The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.

PhyWorld: Physics-Faithful World Model for Video Generation

cs.CV · 2026-05-19 · unverdicted · novelty 5.0

PhyWorld improves temporal consistency and physical plausibility in video world models via flow matching fine-tuning followed by DPO on physics preference pairs, with reported gains on VBench and a custom physical-faithfulness benchmark.

citing papers explorer

Showing 9 of 9 citing papers.

  • Render, Don't Decode: Weight-Space World Models with Latent Structural Disentanglement cs.CV · 2026-05-07 · unverdicted · none · ref 7 · 2 links

    NOVA represents world states as INR weights for decoder-free rendering, compactness, and unsupervised disentanglement of background, foreground, and motion in video world models.

  • Latent State Design for World Models under Sufficiency Constraints cs.AI · 2026-05-03 · unverdicted · none · ref 21

    World models succeed when their latent states are built to meet task-specific sufficiency constraints rather than preserving the maximum amount of information.

  • DreamDojo: A Generalist Robot World Model from Large-Scale Human Videos cs.RO · 2026-02-06 · unverdicted · none · ref 24

    DreamDojo is a foundation world model pretrained on the largest human video dataset to date that uses continuous latent actions to transfer interaction knowledge and achieves controllable physics simulation after robot post-training.

  • DiLA: Disentangled Latent Action World Models cs.CV · 2026-05-15 · unverdicted · none · ref 7

    DiLA uses content-structure disentanglement driven by predictive bottlenecks to create semantically structured latent actions for high-fidelity video world models.

  • SCAR: Self-Supervised Continuous Action Representation Learning cs.RO · 2026-05-13 · unverdicted · none · ref 17

    SCAR proposes a joint inverse-forward dynamics framework to learn transferable continuous action representations across embodiments from visual data using regularization and adversarial invariance.

  • Why Latent Actions Fail, and How to Prevent It cs.CV · 2026-05-13 · unverdicted · none · ref 19

    Extending linear LAMs to model exogenous state shows standard reconstruction encodes future exogenous info in latent actions, while endogenous-focused spaces and auxiliary objectives like action-supervision enforce consistency across noise.

  • GazeVLA: Learning Human Intention for Robotic Manipulation cs.RO · 2026-04-24 · unverdicted · none · ref 24

    GazeVLA pretrains on large human egocentric datasets to capture gaze-based intention, then finetunes on limited robot data with chain-of-thought reasoning to achieve better robotic manipulation performance than baselines.

  • Human Cognition in Machines: A Unified Perspective of World Models cs.RO · 2026-04-17 · unverdicted · none · ref 52

    The paper introduces a unified framework for world models that fully incorporates all cognitive functions from Cognitive Architecture Theory, highlights under-researched areas in motivation and meta-cognition, and proposes Epistemic World Models as a new category for scientific discovery agents.

  • PhyWorld: Physics-Faithful World Model for Video Generation cs.CV · 2026-05-19 · unverdicted · none · ref 22

    PhyWorld improves temporal consistency and physical plausibility in video world models via flow matching fine-tuning followed by DPO on physics preference pairs, with reported gains on VBench and a custom physical-faithfulness benchmark.