A video foundation model trained on human demonstrations generates zero-shot plans that convert to executable robot actions on novel scenes and tasks.
Megasam: Accurate, fast, and robust structure and motion from casual dynamic videos
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
A generative model produces realistic and coherent 360 panoramic videos from in-the-wild perspective videos via curated online data and geometry-motion aware operations.
PDI-Bench computes 3D projective residuals from segmented and tracked points to quantify geometric inconsistency in AI-generated videos.
Spatial-MLLM adds a 3D spatial encoder initialized from a visual geometry model and space-aware frame sampling to MLLMs to improve spatial understanding and reasoning from purely 2D visual inputs.
MoGe-2 recovers metric-scale 3D point maps with fine details from single images via data refinement and extension of affine-invariant predictions.
citing papers explorer
-
Large Video Planner Enables Generalizable Robot Control
A video foundation model trained on human demonstrations generates zero-shot plans that convert to executable robot actions on novel scenes and tasks.
-
Beyond the Frame: Generating 360 Panoramic Videos from Perspective Videos
A generative model produces realistic and coherent 360 panoramic videos from in-the-wild perspective videos via curated online data and geometry-motion aware operations.
-
Quantitative Video World Model Evaluation for Geometric-Consistency
PDI-Bench computes 3D projective residuals from segmented and tracked points to quantify geometric inconsistency in AI-generated videos.
-
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence
Spatial-MLLM adds a 3D spatial encoder initialized from a visual geometry model and space-aware frame sampling to MLLMs to improve spatial understanding and reasoning from purely 2D visual inputs.
-
MoGe-2: Accurate Monocular Geometry with Metric Scale and Sharp Details
MoGe-2 recovers metric-scale 3D point maps with fine details from single images via data refinement and extension of affine-invariant predictions.