EgoTL provides a new egocentric dataset with think-aloud chains and metric labels that benchmarks VLMs on long-horizon tasks and improves their planning, reasoning, and spatial grounding after finetuning.
Ho-cap: A capture system and dataset for 3d reconstruction and pose tracking of hand-object interaction
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
EventTrack6D tracks 6D poses of unseen objects from event cameras by reconstructing dense intensity and depth cues between frames, generalizing from synthetic training to real data at high speed.
CoLA-World jointly trains latent action models and world models with a warm-up phase to achieve co-evolution, matching or exceeding prior two-stage methods in video simulation quality and visual planning performance.
villa-X enhances latent action modeling in VLA models to support zero-shot action planning for unseen robot embodiments and open-vocabulary instructions, yielding better manipulation results in simulation and real-world tests.
citing papers explorer
-
EgoTL: Egocentric Think-Aloud Chains for Long-Horizon Tasks
EgoTL provides a new egocentric dataset with think-aloud chains and metric labels that benchmarks VLMs on long-horizon tasks and improves their planning, reasoning, and spatial grounding after finetuning.
-
Event6D: Event-based Novel Object 6D Pose Tracking
EventTrack6D tracks 6D poses of unseen objects from event cameras by reconstructing dense intensity and depth cues between frames, generalizing from synthetic training to real data at high speed.
-
Co-Evolving Latent Action World Models
CoLA-World jointly trains latent action models and world models with a warm-up phase to achieve co-evolution, matching or exceeding prior two-stage methods in video simulation quality and visual planning performance.
-
villa-X: Enhancing Latent Action Modeling in Vision-Language-Action Models
villa-X enhances latent action modeling in VLA models to support zero-shot action planning for unseen robot embodiments and open-vocabulary instructions, yielding better manipulation results in simulation and real-world tests.