AwareVLN introduces a structural reasoning module and automatic data engine with progress division to equip VLN agents with self-awareness of agent state and task progress, outperforming prior methods on Habitat datasets.
Nav-R1: Reasoning and Navigation in Embodied Scenes
7 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: 7 unverdicted
Representative citing papers
PlatonicNav is a training-free framework using Platonic Topological Maps from a self-supervised visual encoder to unify vision-only ObjNav, cross-modal ObjNav, and VLN via blind matching on a shared semantic manifold.
Goal2Pixel grounds VLN-CE goals to image pixels via VLM prediction plus keyframe memory, reaching 54.1% SR on R2R-CE Val-Unseen with 7.75 calls per episode versus 46.62 for action prediction.
SpaAct activates spatial awareness in VLMs using action retrospection, future frame prediction, and progressive curriculum learning to reach SOTA on VLN-CE benchmarks.
GeoWorld applies hyperbolic geometry to JEPA world models and introduces geometric reinforcement learning, reporting modest success-rate gains of ~3% and ~2% on 3- and 4-step planning tasks versus V-JEPA 2.
Semantic progress reasoning predicts instruction-style advancement from visual history to guide policies, yielding state-of-the-art success and efficiency on R2R-CE and RxR-CE.
UniMesh unifies 3D mesh generation and understanding in one model via a Mesh Head interface, Chain of Mesh iterative editing, and an Actor-Evaluator self-reflection loop.
citing papers explorer
- PlatonicNav: Unveiling Semantic Correspondence in Navigation with Platonic Topological Maps
- Goal2Pixel: Grounding Goals to Pixels for Vision-Language Navigation
- SpaAct: Spatially-Activated Transition Learning with Curriculum Adaptation for Vision-Language Navigation
- GeoWorld: Geometric World Models
- UniMesh: Unifying 3D Mesh Understanding and Generation