and Hartley, Richard I

4 Pith papers cite this work. Polarity classification is still indexing.

abstract

Energy-based predictive world models provide a powerful approach for multi-step visual planning by reasoning over latent energy landscapes rather than generating pixels. However, existing approaches face two major challenges: (i) their latent representations are typically learned in Euclidean space, neglecting the underlying geometric and hierarchical structure among states, and (ii) they struggle with long-horizon prediction, with quality degrading rapidly across extended rollouts. To address these challenges, we introduce GeoWorld, a geometric world model that preserves geometric structure and hierarchical relations through a Hyperbolic JEPA, which maps latent representations from Euclidean space onto hyperbolic manifolds. We further introduce Geometric Reinforcement Learning for energy-based optimization, enabling stable multi-step planning in hyperbolic latent space. Extensive experiments on CrossTask and COIN demonstrate an improvement in success rate (SR) of around 3% in 3-step planning and 2% in 4-step planning compared to the state-of-the-art V-JEPA 2. Project website: https://steve-zeyu-zhang.github.io/GeoWorld.
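
The abstract does not say which hyperbolic model or mapping GeoWorld uses. As a minimal sketch only, assuming a Poincaré ball of curvature -c and the standard exponential map at the origin, mapping a Euclidean latent vector onto a hyperbolic manifold and measuring geodesic distance there could look like the following. The function names, the curvature parameter c, and the choice of the Poincaré ball are illustrative assumptions, not details taken from the paper.

```python
import math

def exp_map_origin(v, c=1.0):
    """Map a Euclidean vector v onto the Poincare ball of curvature -c.

    Assumption: the standard exponential map at the origin; the paper's
    actual mapping is not given in the abstract.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]  # the origin maps to itself
    sqrt_c = math.sqrt(c)
    scale = math.tanh(sqrt_c * norm) / (sqrt_c * norm)
    return [scale * x for x in v]

def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points inside the Poincare ball."""
    def sq_norm(u):
        return sum(a * a for a in u)
    diff = [a - b for a, b in zip(x, y)]
    arg = 1.0 + 2.0 * c * sq_norm(diff) / (
        (1.0 - c * sq_norm(x)) * (1.0 - c * sq_norm(y))
    )
    # Guard against rounding that could push the argument just below 1.
    return math.acosh(max(1.0, arg)) / math.sqrt(c)

# Hypothetical 2-D latent, for illustration only.
z = exp_map_origin([0.3, 0.4])
d = poincare_distance([0.0, 0.0], z)
```

Mapped points always lie strictly inside the ball, and distances grow rapidly toward its boundary, which is the property that makes hyperbolic space a common choice for embedding hierarchies.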

citation-role summary

background 2

years

2026 4

roles

background 2

polarities

background 2

representative citing papers

HSG: Hyperbolic Scene Graph

cs.CV · 2026-04-19 · unverdicted · novelty 6.0

Hyperbolic Scene Graph (HSG) learns embeddings in hyperbolic space for better hierarchical structure in scene graphs, achieving graph IoU of 33.51 versus 25.37 for the best Euclidean baseline.

citing papers explorer

  • How You Move Tells What You'll Do: Trajectory-Conditioned Egocentric Prediction

    cs.CV · 2026-05-19 · unverdicted · none · ref 33 · internal anchor

    TrajPilot predicts candidate future trajectories from egocentric context and uses them to condition action prediction in an embedding space, outperforming VLM and planner baselines on Ego-Exo4D, Ego4D, and other datasets with gains increasing at longer horizons.

  • Recovering Physical Dynamics from Discrete Observations via Intrinsic Differential Consistency

    cs.LG · 2026-05-08 · unverdicted · none · ref 45 · internal anchor

    Enforcing semi-group consistency on a time-conditioned secant velocity field via Symmetry Rupture improves rollout accuracy and efficiency when learning physical dynamics from discrete observations.

  • HSG: Hyperbolic Scene Graph

    cs.CV · 2026-04-19 · unverdicted · none · ref 60 · internal anchor

    Hyperbolic Scene Graph (HSG) learns embeddings in hyperbolic space for better hierarchical structure in scene graphs, achieving graph IoU of 33.51 versus 25.37 for the best Euclidean baseline.

  • Grounded World Model for Semantically Generalizable Planning

    cs.RO · 2026-04-13 · conditional · none · ref 65 · internal anchor

    A vision-language-aligned world model turns visuomotor MPC into a language-following planner that reaches 87% success on 288 unseen semantic tasks where standard VLAs drop to 22%.