EgoWalk supplies 50 hours of real-world multimodal human navigation data in varied indoor/outdoor settings together with open pipelines that auto-generate language goal annotations and traversability masks.
Grounding dino: Marrying dino with grounded pre-training for open-set object detection
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2025 2roles
background 1polarities
unclear 1representative citing papers
Seed1.5-VL is a compact multimodal model that sets new records on dozens of vision-language benchmarks and outperforms prior systems on agent-style tasks.
citing papers explorer
-
EgoWalk: A Multimodal Dataset for Robot Navigation in the Wild
EgoWalk supplies 50 hours of real-world multimodal human navigation data in varied indoor/outdoor settings together with open pipelines that auto-generate language goal annotations and traversability masks.
-
Seed1.5-VL Technical Report
Seed1.5-VL is a compact multimodal model that sets new records on dozens of vision-language benchmarks and outperforms prior systems on agent-style tasks.