Introducing HOT3D: An Egocentric Dataset for 3D Hand and Object Tracking

2024 · arXiv 2406.09598

9 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background: 2 · dataset: 2

citation-polarity summary

background: 2 · use dataset: 2

representative citing papers

StableHand: Quality-Aware Flow Matching for World-Space Dual-Hand Motion Estimation from Egocentric Video

cs.CV · 2026-05-18 · unverdicted · novelty 7.0

StableHand introduces a quality-aware flow matching framework conditioned on predicted four-channel per-frame hand observation quality to estimate dual-hand world-space motion from egocentric video, achieving SOTA results with 20-25% W-MPJPE reduction on HOT3D and ARCTIC benchmarks.

Event6D: Event-based Novel Object 6D Pose Tracking

cs.CV · 2026-03-30 · conditional · novelty 7.0

EventTrack6D tracks 6D poses of unseen objects from event cameras by reconstructing dense intensity and depth cues between frames, generalizing from synthetic training to real data at high speed.

EgoKit: Towards Unified Low-Cost Egocentric Data Collection with Heterogeneous Devices

cs.CV · 2026-05-16 · unverdicted · novelty 6.0

EgoKit is a new toolkit and accessory set that unifies egocentric video collection with wrist views across heterogeneous consumer devices using a consistent interface and log format.

Hierarchical and Holistic Open-Vocabulary Functional 3D Scene Graphs for Indoor Spaces

cs.RO · 2026-05-15 · unverdicted · novelty 6.0

An open-vocabulary pipeline anchors functional edges via 2D visual grounding, then uses temporal 3D graph optimization with evidence accumulation and entropy regularization to build hierarchical scene graphs for dense indoor scenes.

EgoForce: Forearm-Guided Camera-Space 3D Hand Pose from a Monocular Egocentric Camera

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

EgoForce recovers absolute camera-space 3D hand pose from monocular egocentric images using forearm guidance, a unified arm-hand transformer, and a closed-form ray-space solver that handles fisheye, perspective, and wide-FOV cameras.

DexSynRefine: Synthesizing and Refining Human-Object Interaction Motion for Physically Feasible Dexterous Robot Actions

cs.RO · 2026-05-07 · unverdicted · novelty 6.0

DexSynRefine synthesizes HOI motions with an extended manifold method, refines them via task-space residual RL, and adapts for sim-to-real transfer, outperforming kinematic retargeting by 50-70 percentage points on five dexterous tasks.

EgoVerse: An Egocentric Human Dataset for Robot Learning from Around the World

cs.RO · 2026-04-08 · unverdicted · novelty 6.0

EgoVerse releases 1,362 hours of standardized egocentric human data across 1,965 tasks and shows via multi-lab experiments that robot policy performance scales with human data volume when the data aligns with robot objectives.

HVG-3D: Bridging Real and Simulation Domains for 3D-Conditional Hand-Object Interaction Video Synthesis

cs.CV · 2026-03-31 · unverdicted · novelty 6.0

HVG-3D uses a 3D-aware diffusion architecture with ControlNet to synthesize high-fidelity hand-object interaction videos from 3D control signals, achieving state-of-the-art spatial fidelity and temporal coherence on the TASTE-Rob dataset.

EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos

cs.RO · 2025-07-16 · conditional · novelty 6.0

EgoVLA pretrains VLA models on egocentric human videos, retargets predicted actions to robots via IK, and fine-tunes on a few robot demonstrations to improve bimanual manipulation performance on a new simulation benchmark.

citing papers explorer


StableHand: Quality-Aware Flow Matching for World-Space Dual-Hand Motion Estimation from Egocentric Video · cs.CV · 2026-05-18 · unverdicted · none · ref 2
Event6D: Event-based Novel Object 6D Pose Tracking · cs.CV · 2026-03-30 · conditional · none · ref 3
EgoKit: Towards Unified Low-Cost Egocentric Data Collection with Heterogeneous Devices · cs.CV · 2026-05-16 · unverdicted · none · ref 33
Hierarchical and Holistic Open-Vocabulary Functional 3D Scene Graphs for Indoor Spaces · cs.RO · 2026-05-15 · unverdicted · none · ref 4
EgoForce: Forearm-Guided Camera-Space 3D Hand Pose from a Monocular Egocentric Camera · cs.CV · 2026-05-12 · unverdicted · none · ref 1
DexSynRefine: Synthesizing and Refining Human-Object Interaction Motion for Physically Feasible Dexterous Robot Actions · cs.RO · 2026-05-07 · unverdicted · none · ref 11
EgoVerse: An Egocentric Human Dataset for Robot Learning from Around the World · cs.RO · 2026-04-08 · unverdicted · none · ref 3
HVG-3D: Bridging Real and Simulation Domains for 3D-Conditional Hand-Object Interaction Video Synthesis · cs.CV · 2026-03-31 · unverdicted · none · ref 2
EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos · cs.RO · 2025-07-16 · conditional · none · ref 79