ObjectNav Revisited: On Evaluation of Embodied Agents Navigating to Objects

· 2020 · arXiv 2006.13171

Canonical reference. 24 Pith papers cite this work; of the classified citations, 83% (5 of 6) cite it as background.

citation-role summary

background 5 · baseline 1

citation-polarity summary

background 5 · baseline 1

representative citing papers

Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration?

cs.CV · 2026-05-31 · accept · novelty 8.0

Introduces the TVR active viewpoint-matching task and TVRBench indoor simulation benchmark, where foundation models start at low single-digit success rates but reach 51.4% after visual-action SFT and multi-turn GRPO post-training.

When Robots Do the Chores: A Benchmark and Agent for Long-Horizon Household Task Execution

cs.AI · 2026-05-14 · unverdicted · novelty 8.0 · 2 refs

LongAct benchmark evaluates long-horizon household task execution from free-form instructions; HoloMind agent raises performance but top VLMs still reach only 59% goal completion and 16% full-task success.

SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning

cs.AI · 2026-05-10 · accept · novelty 8.0 · 2 refs

SimWorld Studio deploys an evolving coding agent to create adaptive 3D environments that co-evolve with embodied learners, delivering 18-point success-rate gains over fixed environments in navigation benchmarks.

Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI

cs.CV · 2021-09-16 · accept · novelty 8.0

HM3D offers 1000 building-scale 3D environments that are larger and higher-fidelity than existing datasets, enabling better-performing embodied AI agents for tasks like PointGoal navigation.

POINav: Benchmarking and Enhancing Final-Meters Arrival in Real-World Vision-Language Navigation

cs.RO · 2026-05-27 · unverdicted · novelty 7.0

POINav-Bench provides the first high-fidelity real-world benchmark for POI-goal VLN, using 3DGS reconstructions that cover 126k m² and 163 POIs, supported by a Brain-Action framework and a 70K real signage-entrance dataset.

IntentionNav: A Benchmark for Intent-Driven Object Navigation from Implicit Human Instruction

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

IntentionNav is a new benchmark showing that VLMs infer intended targets from implicit instructions in 48% of cases but achieve only 25% terminal success and 5.5% grounded success in active navigation.

Action-guided generation of 3D functionality segmentation data

cs.CV · 2025-11-28 · unverdicted · novelty 7.0

SynthFun3D generates synthetic 3D functionality segmentation data from action descriptions via object retrieval and scene arrangement, yielding consistent gains of +2.2 mAP, +6.3 mAR, and +5.7 mIoU when augmenting real data for VLM training.

Uni-LaViRA: Language-Vision-Robot Actions Translation for Unified Embodied Navigation

cs.RO · 2026-05-26 · unverdicted · novelty 6.0

A zero-shot unified agent for VLN-CE, ObjectNav, EQA and Aerial-VLN on wheeled, quadruped, humanoid and UAV platforms that translates language and vision inputs into actions via MLLMs plus TDM and SCB mechanisms, matching trained foundation models on multiple benchmarks.

ProCompNav: Proactive Instance Navigation with Comparative Judgment for Ambiguous User Queries

cs.AI · 2026-05-07 · unverdicted · novelty 6.0 · 3 refs

ProCompNav builds a candidate pool from ambiguous queries, then uses pool-splitting binary questions to disambiguate, improving success rate and shortening responses on CoIN-Bench and TextNav.

An Efficient Beam Search Algorithm for Active Perception in Mobile Robotics

cs.RO · 2026-04-25 · unverdicted · novelty 6.0

Node-wise beam search with expected gain and RRAG graph construction outperforms prior active perception methods by at least 20% on representative tasks.

ESCAPE: Episodic Spatial Memory and Adaptive Execution Policy for Long-Horizon Mobile Manipulation

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

ESCAPE combines spatio-temporal fusion mapping for depth-free 3D memory with a memory-driven grounding module and adaptive execution policy to reach 65.09% success on ALFRED test-seen long-horizon mobile manipulation tasks.

Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting

cs.RO · 2026-04-14 · unverdicted · novelty 6.0

Habitat-GS integrates 3D Gaussian Splatting scene rendering and Gaussian avatars into Habitat-Sim, yielding agents with stronger cross-domain generalization and effective human-aware navigation.

Visually-grounded Humanoid Agents

cs.CV · 2026-04-09 · unverdicted · novelty 6.0

A coupled world-agent framework uses 3D Gaussian reconstruction and first-person RGB-D perception with iterative planning to enable goal-directed, collision-avoiding humanoid behavior in novel reconstructed scenes.

HiRO-Nav: Hybrid ReasOning Enables Efficient Embodied Navigation

cs.AI · 2026-04-09 · unverdicted · novelty 6.0

HiRO-Nav adaptively triggers reasoning only on high-entropy actions via a hybrid training pipeline and shows better success-token trade-offs than always-reason or never-reason baselines on the CHORES-S benchmark.

ReMemNav: A Rethinking and Memory-Augmented Framework for Zero-Shot Object Navigation

cs.RO · 2026-03-25 · conditional · novelty 6.0

ReMemNav improves zero-shot object navigation success and efficiency by integrating episodic memory and rethinking with VLMs, achieving SR/SPL gains of 1.7%/7.0% on HM3D v0.1, 18.2%/11.1% on HM3D v0.2, and 8.7%/7.9% on MP3D.

MerNav: A Highly Generalizable Memory-Execute-Review Framework for Zero-Shot Object Goal Navigation

cs.CV · 2026-02-05 · unverdicted · novelty 6.0

MerNav's Memory-Execute-Review framework improves success rates in zero-shot object goal navigation by 5-8% over baselines on four datasets while outperforming both training-free and supervised methods on key benchmarks.

C-NAV: Towards Self-Evolving Continual Object Navigation in Open World

cs.RO · 2025-10-23 · unverdicted · novelty 6.0

C-Nav is a continual visual navigation framework with dual-path anti-forgetting via feature distillation and replay plus adaptive sampling that outperforms baselines on a new continual object navigation benchmark while using less memory.

Personalized Embodied Navigation for Portable Object Finding

cs.RO · 2024-03-14 · unverdicted · novelty 6.0

Transit-Aware Planning (TAP) enriches navigation policies with object transit data on Dynamic Object Maps, raising success rates by 21.1% in MP3D simulation and 18.3% in real-world tests for finding non-stationary targets.

STEM: Semantic Target Search and Exploration using MAVs in Cluttered Environments

cs.RO · 2026-05-30 · unverdicted · novelty 5.0

STEM develops a semantically-guided combinatorial planner and active perception pipeline that propagates object priorities to frontier voxels, enabling MAVs to find targets faster than baselines in simulation and real-world tests.

TravExplorer: Cross-Floor Embodied Exploration via Traversability-Aware 3-D Planning

cs.RO · 2026-05-19 · unverdicted · novelty 5.0

TravExplorer couples zero-shot semantic guidance with traversability-aware 3-D planning to enable cross-floor object navigation in unseen indoor environments.

CLUE: Adaptively Prioritized Contextual Cues by Leveraging a Unified Semantic Map for Effective Zero-Shot Object-Goal Navigation

cs.RO · 2026-05-19 · unverdicted · novelty 5.0

CLUE adaptively weights room-type and object-co-location cues from an LLM to construct a unified semantic value map that improves success rate and efficiency in zero-shot object-goal navigation.

MiniVLA-Nav v1: A Multi-Scene Simulation Dataset for Language-Conditioned Robot Navigation

cs.RO · 2026-05-01 · unverdicted · novelty 5.0

MiniVLA-Nav v1 provides 1,174 episodes of language-instructed robot navigation in photorealistic simulations with RGB, depth, segmentation, and expert action data.

OpenFrontier: General Navigation with Visual-Language Grounded Frontiers

cs.RO · 2026-03-05 · unverdicted · novelty 5.0

OpenFrontier formulates robot navigation as sparse subgoal reaching via visual-language-grounded frontiers, achieving zero-shot performance without fine-tuning or dense semantic maps.

Agent AI: Surveying the Horizons of Multimodal Interaction

cs.AI · 2024-01-07 · unverdicted · novelty 4.0

The paper defines Agent AI as interactive multimodal systems that perceive grounded data and generate embodied actions, arguing this approach can mitigate hallucinations in foundation models.

citing papers explorer

Showing 24 of 24 citing papers.

Where to Look: Can Foundation Models Reach a Target Viewpoint Through Active Exploration?

cs.CV · 2026-05-31 · accept · none · ref 42

When Robots Do the Chores: A Benchmark and Agent for Long-Horizon Household Task Execution

cs.AI · 2026-05-14 · unverdicted · none · ref 12 · 2 links

SimWorld Studio: Automatic Environment Generation with Evolving Coding Agent for Embodied Agent Learning

cs.AI · 2026-05-10 · accept · none · ref 9 · 2 links

Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI

cs.CV · 2021-09-16 · accept · none · ref 34

POINav: Benchmarking and Enhancing Final-Meters Arrival in Real-World Vision-Language Navigation

cs.RO · 2026-05-27 · unverdicted · none · ref 4

IntentionNav: A Benchmark for Intent-Driven Object Navigation from Implicit Human Instruction

cs.CV · 2026-05-22 · unverdicted · none · ref 4

Action-guided generation of 3D functionality segmentation data

cs.CV · 2025-11-28 · unverdicted · none · ref 5

Uni-LaViRA: Language-Vision-Robot Actions Translation for Unified Embodied Navigation

cs.RO · 2026-05-26 · unverdicted · none · ref 4

ProCompNav: Proactive Instance Navigation with Comparative Judgment for Ambiguous User Queries

cs.AI · 2026-05-07 · unverdicted · none · ref 14 · 3 links

An Efficient Beam Search Algorithm for Active Perception in Mobile Robotics

cs.RO · 2026-04-25 · unverdicted · none · ref 1

ESCAPE: Episodic Spatial Memory and Adaptive Execution Policy for Long-Horizon Mobile Manipulation

cs.CV · 2026-04-15 · unverdicted · none · ref 1

Habitat-GS: A High-Fidelity Navigation Simulator with Dynamic Gaussian Splatting

cs.RO · 2026-04-14 · unverdicted · none · ref 3

Visually-grounded Humanoid Agents

cs.CV · 2026-04-09 · unverdicted · none · ref 6

HiRO-Nav: Hybrid ReasOning Enables Efficient Embodied Navigation

cs.AI · 2026-04-09 · unverdicted · none · ref 2

ReMemNav: A Rethinking and Memory-Augmented Framework for Zero-Shot Object Navigation

cs.RO · 2026-03-25 · conditional · none · ref 1

MerNav: A Highly Generalizable Memory-Execute-Review Framework for Zero-Shot Object Goal Navigation

cs.CV · 2026-02-05 · unverdicted · none · ref 5

C-NAV: Towards Self-Evolving Continual Object Navigation in Open World

cs.RO · 2025-10-23 · unverdicted · none · ref 2

Personalized Embodied Navigation for Portable Object Finding

cs.RO · 2024-03-14 · unverdicted · none · ref 4

STEM: Semantic Target Search and Exploration using MAVs in Cluttered Environments

cs.RO · 2026-05-30 · unverdicted · none · ref 37

TravExplorer: Cross-Floor Embodied Exploration via Traversability-Aware 3-D Planning

cs.RO · 2026-05-19 · unverdicted · none · ref 38

CLUE: Adaptively Prioritized Contextual Cues by Leveraging a Unified Semantic Map for Effective Zero-Shot Object-Goal Navigation

cs.RO · 2026-05-19 · unverdicted · none · ref 1

MiniVLA-Nav v1: A Multi-Scene Simulation Dataset for Language-Conditioned Robot Navigation

cs.RO · 2026-05-01 · unverdicted · none · ref 11

OpenFrontier: General Navigation with Visual-Language Grounded Frontiers

cs.RO · 2026-03-05 · unverdicted · none · ref 1

Agent AI: Surveying the Horizons of Multimodal Interaction

cs.AI · 2024-01-07 · unverdicted · none · ref 8