CostNav: A Navigation Benchmark for Real-World Economic-Cost Evaluation of Physical AI Agents

Geunwoo Kim; Haebin Seong; Jaemin Lee; Jaeyoon Jung; Jinmyung Kwak; Jiyong Youn; Minchan Kim; Minhyeok Oh; Myunchul Joe; Samwoo Seong

arxiv: 2511.20216 · v6 · pith:IQI2I3D7new · submitted 2025-11-25 · cs.AI · cs.CE · cs.CV · cs.LG · cs.RO

Haebin Seong, Sungmin Kim, Yongjun Cho, Myunchul Joe, Geunwoo Kim, Yubeen Park, Sunhoo Kim, Samwoo Seong, Yoonshik Kim, Suhwan Choi, Jaeyoon Jung, Jiyong Youn, Jinmyung Kwak, Sunghee Ahn, Jaemin Lee, Younggil Do, Seungyeop Yi, Woojin Cheong, Minhyeok Oh, Minchan Kim, Seongjae Kang, Youngjae Yu, Yunsung Lee

classification: cs.AI, cs.CE, cs.CV, cs.LG, cs.RO

keywords: costnav, economic, navigation, benchmark, methods, agents, compliance, cost-revenue

Current navigation benchmarks focus on task success but do not capture the economic constraints essential for commercializing autonomous delivery systems. We introduce CostNav, an Economic Navigation Benchmark that evaluates physical AI agents on a cost-revenue and break-even analysis, pairing Isaac Sim's collision and cargo dynamics with industry-standard data such as Securities and Exchange Commission (SEC) filings and Abbreviated Injury Scale (AIS) injury reports. To our knowledge, CostNav is the first physics-grounded economic benchmark to use regulatory and financial data to quantify the gap between navigation metrics and commercial deployment, revealing that high task-success rates alone do not ensure economic viability. Evaluating seven baselines (two rule-based and five imitation-learning methods), we find no method economically viable: all yield negative contribution margins. CANVAS, using only an RGB camera and GPS, attains the highest task success and the least-negative margin among methods with non-zero Service-Level Agreement (SLA) compliance (-$28.40/run), outperforming LiDAR-equipped Nav2 w/ GPS (-$37.34/run). A sim-trained policy evaluated on a real delivery robot yields SLA compliance close to its simulation result, indicating that policy performance in CostNav's simulation transfers to real-world deployment. We challenge the community to achieve economic viability on CostNav, which scores methods by cost-revenue outcomes. All resources are available at https://github.com/worv-ai/CostNav.
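
The link between a negative contribution margin and the absence of a break-even point can be illustrated with a minimal Python sketch. It uses the textbook definitions (margin = revenue per run minus variable cost per run; break-even runs = fixed cost divided by margin), which are an assumption here: the abstract does not spell out CostNav's actual cost model. The function names and the fixed-cost figure are hypothetical; only the two per-run margins are taken from the abstract.

```python
import math
from typing import Optional


def contribution_margin(revenue_per_run: float, variable_cost_per_run: float) -> float:
    """Textbook contribution margin per delivery run (assumed definition)."""
    return revenue_per_run - variable_cost_per_run


def break_even_runs(fixed_cost: float, margin_per_run: float) -> Optional[int]:
    """Runs needed to recover a fixed cost, or None if it can never be recovered.

    With a zero or negative margin every extra run loses money (or earns
    nothing), so no number of runs reaches break-even.
    """
    if margin_per_run <= 0:
        return None
    return math.ceil(fixed_cost / margin_per_run)


# Per-run margins reported in the abstract (USD per run).
reported_margins = {"CANVAS": -28.40, "Nav2 w/ GPS": -37.34}

# Hypothetical fixed cost, used only to exercise the function.
HYPOTHETICAL_FIXED_COST = 1000.0

for method, margin in reported_margins.items():
    runs = break_even_runs(HYPOTHETICAL_FIXED_COST, margin)
    status = "no break-even" if runs is None else f"break-even after {runs} runs"
    print(f"{method}: margin {margin:.2f} USD/run, {status}")

# Gap between the two reported margins, in USD per run.
margin_gap = reported_margins["CANVAS"] - reported_margins["Nav2 w/ GPS"]
print(f"CANVAS loses {margin_gap:.2f} USD/run less than Nav2 w/ GPS")
```

Under these assumed definitions, both reported margins return no break-even point regardless of the fixed cost chosen, which is one way to read the abstract's statement that no method is economically viable.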

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Vision-Language-Action Safety: Threats, Challenges, Evaluations, and Mechanisms
cs.RO 2026-04 accept novelty 4.0

A literature survey that unifies fragmented work on attacks, defenses, evaluations, and deployment challenges for Vision-Language-Action models in robotics.