Zhang, Zeyu, Li, Danning, Reid, Ian D, and Hartley, Richard I · 2026 · cs.CV · arXiv 2602.23058

4 Pith papers cite this work. Polarity classification is still indexing.

abstract

Energy-based predictive world models provide a powerful approach for multi-step visual planning by reasoning over latent energy landscapes rather than generating pixels. However, existing approaches face two major challenges: (i) their latent representations are typically learned in Euclidean space, neglecting the underlying geometric and hierarchical structure among states, and (ii) they struggle with long-horizon prediction, which leads to rapid degradation across extended rollouts. To address these challenges, we introduce GeoWorld, a geometric world model that preserves geometric structure and hierarchical relations through a Hyperbolic JEPA, which maps latent representations from Euclidean space onto hyperbolic manifolds. We further introduce Geometric Reinforcement Learning for energy-based optimization, enabling stable multi-step planning in hyperbolic latent space. Extensive experiments on CrossTask and COIN demonstrate around 3% SR improvement in 3-step planning and 2% SR improvement in 4-step planning compared to the state-of-the-art V-JEPA 2. Project website: https://steve-zeyu-zhang.github.io/GeoWorld.
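
The abstract's central step, mapping Euclidean latents onto a hyperbolic manifold, can be illustrated with a minimal sketch. It assumes the Poincaré ball model with curvature -c and an exponential map at the origin; the abstract does not say which hyperbolic model or map GeoWorld uses, so the function names `expmap0` and `poincare_distance` and the parameter `c` are hypothetical and this is not the authors' implementation.

```python
import math


def expmap0(v, c=1.0):
    """Map a Euclidean (tangent) vector v onto the Poincare ball of curvature -c.

    Assumption: exponential map at the origin; not taken from the paper.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)  # the origin maps to itself
    sqrt_c = math.sqrt(c)
    scale = math.tanh(sqrt_c * norm) / (sqrt_c * norm)
    return [scale * x for x in v]


def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points strictly inside the Poincare ball."""
    def sq_norm(u):
        return sum(a * a for a in u)

    diff = sq_norm([a - b for a, b in zip(x, y)])
    denom = (1.0 - c * sq_norm(x)) * (1.0 - c * sq_norm(y))
    return math.acosh(1.0 + 2.0 * c * diff / denom) / math.sqrt(c)


if __name__ == "__main__":
    z = expmap0([0.3, 0.4])                   # Euclidean latent with norm 0.5
    print(z)                                  # a point inside the unit ball
    print(poincare_distance([0.0, 0.0], z))   # geodesic distance from the origin
```

Under this convention the geodesic distance from the origin to `expmap0(v)` is twice the Euclidean norm of `v`, so points near the ball's boundary are far apart in hyperbolic terms even when they look close in Euclidean coordinates. That property is what makes hyperbolic space a natural fit for tree-like, hierarchical relations. Practical implementations typically also clip norms near the boundary for numerical stability, which this sketch omits.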

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

How You Move Tells What You'll Do: Trajectory-Conditioned Egocentric Prediction

cs.CV · 2026-05-19 · unverdicted · novelty 6.0

TrajPilot predicts candidate future trajectories from egocentric context and uses them to condition action prediction in an embedding space, outperforming VLM and planner baselines on Ego-Exo4D, Ego4D, and other datasets with gains increasing at longer horizons.

Recovering Physical Dynamics from Discrete Observations via Intrinsic Differential Consistency

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

Enforcing semi-group consistency on a time-conditioned secant velocity field via Symmetry Rupture improves rollout accuracy and efficiency when learning physical dynamics from discrete observations.

HSG: Hyperbolic Scene Graph

cs.CV · 2026-04-19 · unverdicted · novelty 6.0

Hyperbolic Scene Graph (HSG) learns embeddings in hyperbolic space for better hierarchical structure in scene graphs, achieving graph IoU of 33.51 versus 25.37 for the best Euclidean baseline.

Grounded World Model for Semantically Generalizable Planning

cs.RO · 2026-04-13 · conditional · novelty 6.0

A vision-language-aligned world model turns visuomotor MPC into a language-following planner that reaches 87% success on 288 unseen semantic tasks where standard VLAs drop to 22%.

citing papers explorer

Showing 4 of 4 citing papers.

How You Move Tells What You'll Do: Trajectory-Conditioned Egocentric Prediction · cs.CV · 2026-05-19 · unverdicted · none · ref 33
Recovering Physical Dynamics from Discrete Observations via Intrinsic Differential Consistency · cs.LG · 2026-05-08 · unverdicted · none · ref 45
HSG: Hyperbolic Scene Graph · cs.CV · 2026-04-19 · unverdicted · none · ref 60
Grounded World Model for Semantically Generalizable Planning · cs.RO · 2026-04-13 · conditional · none · ref 65
