{"paper":{"title":"On Evaluation of Embodied Navigation Agents","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Embodied navigation research requires standardized evaluation measures and scenarios to allow direct comparison of agents.","cross_cats":["cs.CV","cs.LG","cs.RO"],"primary_cat":"cs.AI","authors_text":"Alexey Dosovitskiy, Amir R. Zamir, Angel Chang, Devendra Singh Chaplot, Jana Kosecka, Jitendra Malik, Manolis Savva, Peter Anderson, Roozbeh Mottaghi, Saurabh Gupta, Vladlen Koltun","submitted_at":"2018-07-18T03:28:02Z","abstract_excerpt":"Skillful mobile operation in three-dimensional environments is a primary topic of study in Artificial Intelligence. The past two years have seen a surge of creative work on navigation. This creative output has produced a plethora of sometimes incompatible task definitions and evaluation protocols. To coordinate ongoing and future research in this area, we have convened a working group to study empirical methodology in navigation research. The present document summarizes the consensus recommendations of this working group. We discuss different problem statements and the role of generalization, "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"To coordinate ongoing and future research in this area, we have convened a working group to study empirical methodology in navigation research. The present document summarizes the consensus recommendations of this working group. 
We discuss different problem statements and the role of generalization, present evaluation measures, and provide standard scenarios that can be used for benchmarking.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the research community will adopt the proposed evaluation measures and standard scenarios rather than continuing with incompatible custom protocols.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Consensus recommendations for standardized evaluation measures, problem statements, and benchmarking scenarios in embodied navigation research.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Embodied navigation research requires standardized evaluation measures and scenarios to allow direct comparison of agents.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"c4122f11bae2377388bbe5eb90a6876e4710a278226db74966ca74ada34abb94"},"source":{"id":"1807.06757","kind":"arxiv","version":1},"verdict":{"id":"7637ed82-63be-4b48-883b-939a3500fb54","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-13T22:39:18.240039Z","strongest_claim":"To coordinate ongoing and future research in this area, we have convened a working group to study empirical methodology in navigation research. The present document summarizes the consensus recommendations of this working group. 
We discuss different problem statements and the role of generalization, present evaluation measures, and provide standard scenarios that can be used for benchmarking.","one_line_summary":"Consensus recommendations for standardized evaluation measures, problem statements, and benchmarking scenarios in embodied navigation research.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the research community will adopt the proposed evaluation measures and standard scenarios rather than continuing with incompatible custom protocols.","pith_extraction_headline":"Embodied navigation research requires standardized evaluation measures and scenarios to allow direct comparison of agents."},"references":{"count":32,"sample":[{"doi":"","year":2018,"title":"P. Anderson, Q. Wu, D. Teney, J. Bruce, M. Johnson, N. Sünderhauf, I. Reid, S. Gould, and A. van den Hengel. Vision-and-language navigation: Interpreting visually-grounded navigation instructions","work_id":"0218f0cb-7cb9-4892-b121-4c95cbb87529","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2016,"title":"DeepMind Lab","work_id":"8a8d827f-5377-4733-bfe8-bc66c011d458","ref_index":2,"cited_arxiv_id":"1612.03801","is_internal_anchor":false},{"doi":"","year":2017,"title":"S. Brahmbhatt and J. Hays. DeepNav: Learning to navigate large cities. In CVPR, 2017","work_id":"944ac7b5-c225-47ea-b579-7895e72683d6","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2017,"title":"S. Brodeur, E. Perez, A. Anand, F. Golemo, L. Celotti, F. Strub, J. Rouat, H. Larochelle, and A. C. Courville. HoME: A household multimodal environment. arXiv:1711.11017, 2017","work_id":"c72fdedd-f8c1-43cb-9ad9-4fdc4d2a01cf","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":1993,"title":"R. A. Brooks and M. J. Mataric. Real robots, real learning problems. In Robot Learning. 
1993","work_id":"57766d11-a763-4d9f-9e11-5afbe33b5df7","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":32,"snapshot_sha256":"1c7dfb48d21ab7ff9f2b3fca3c4cc1390a2e8a39566dd09c9f7e3057aa249b24","internal_anchors":1},"formal_canon":{"evidence_count":2,"snapshot_sha256":"1b22cdc038cf28480ff88cb314ed6ad2c813b90af39eb2b8f1f55d8ccccc127b"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}