{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2021:NCIPT62RAWCW7N2FGHZVNSKEGH","short_pith_number":"pith:NCIPT62R","schema_version":"1.0","canonical_sha256":"6890f9fb5105856fb74531f356c94431f93bbe0d890b3f99a356cc95c76b3af0","source":{"kind":"arxiv","id":"2109.08238","version":1},"attestation_state":"computed","paper":{"title":"Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI","license":"http://creativecommons.org/licenses/by-nc-nd/4.0/","headline":"HM3D dataset of 1000 real indoor 3D scenes produces PointGoal navigation agents that achieve top performance on HM3D, Gibson, and MP3D evaluations.","cross_cats":["cs.AI"],"primary_cat":"cs.CV","authors_text":"Aaron Gokaslan, Alex Clegg, Andrew Westbury, Angel X. Chang, Dhruv Batra, Eric Undersander, Erik Wijmans, John Turner, Manolis Savva, Oleksandr Maksymets, Santhosh K. Ramakrishnan, Wojciech Galuba, Yili Zhao","submitted_at":"2021-09-16T22:01:24Z","abstract_excerpt":"We present the Habitat-Matterport 3D (HM3D) dataset. HM3D is a large-scale dataset of 1,000 building-scale 3D reconstructions from a diverse set of real-world locations. Each scene in the dataset consists of a textured 3D mesh reconstruction of interiors such as multi-floor residences, stores, and other private indoor spaces.\n  HM3D surpasses existing datasets available for academic research in terms of physical scale, completeness of the reconstruction, and visual fidelity. HM3D contains 112.5k m^2 of navigable space, which is 1.4 - 3.7x larger than other building-scale datasets such as MP3D "},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2109.08238","kind":"arxiv","version":1},"metadata":{"license":"http://creativecommons.org/licenses/by-nc-nd/4.0/","primary_cat":"cs.CV","submitted_at":"2021-09-16T22:01:24Z","cross_cats_sorted":["cs.AI"],"title_canon_sha256":"c73c1f5f04e172cf3716c3e7a6812c660d72e56af413c77d1c1c460ce9c1c8b2","abstract_canon_sha256":"c1b8bccd81c267a3d91499d8cdf238a95099495019fcfba75b5fc265ce9b3eb9"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:39:22.208136Z","signature_b64":"G5pwlPXWZIO1n0qHUHh1+4sZsS0ZG7+1qsNuNCCbPW8myvsDnwtiyzXdUKJs1GF3ajWMkPM89Ioo1CK1PWy5DA==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"6890f9fb5105856fb74531f356c94431f93bbe0d890b3f99a356cc95c76b3af0","last_reissued_at":"2026-05-17T23:39:22.207461Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:39:22.207461Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Habitat-Matterport 3D Dataset (HM3D): 1000 Large-scale 3D Environments for Embodied AI","license":"http://creativecommons.org/licenses/by-nc-nd/4.0/","headline":"HM3D dataset of 1000 real indoor 3D scenes produces PointGoal navigation agents that achieve top performance on HM3D, Gibson, and MP3D evaluations.","cross_cats":["cs.AI"],"primary_cat":"cs.CV","authors_text":"Aaron Gokaslan, Alex Clegg, Andrew Westbury, Angel X. Chang, Dhruv Batra, Eric Undersander, Erik Wijmans, John Turner, Manolis Savva, Oleksandr Maksymets, Santhosh K. Ramakrishnan, Wojciech Galuba, Yili Zhao","submitted_at":"2021-09-16T22:01:24Z","abstract_excerpt":"We present the Habitat-Matterport 3D (HM3D) dataset. HM3D is a large-scale dataset of 1,000 building-scale 3D reconstructions from a diverse set of real-world locations. Each scene in the dataset consists of a textured 3D mesh reconstruction of interiors such as multi-floor residences, stores, and other private indoor spaces.\n  HM3D surpasses existing datasets available for academic research in terms of physical scale, completeness of the reconstruction, and visual fidelity. HM3D contains 112.5k m^2 of navigable space, which is 1.4 - 3.7x larger than other building-scale datasets such as MP3D "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"HM3D is 'pareto optimal' in the sense that agents trained to perform PointGoal navigation on HM3D achieve the highest performance regardless of whether they are evaluated on HM3D, Gibson, or MP3D.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The assumption that the reported performance gains are primarily attributable to the dataset's scale, completeness, and visual fidelity rather than differences in training procedures or evaluation protocols.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"HM3D offers 1000 building-scale 3D environments that are larger and higher-fidelity than existing datasets, enabling better-performing embodied AI agents for tasks like PointGoal navigation.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"HM3D dataset of 1000 real indoor 3D scenes produces PointGoal navigation agents that achieve top performance on HM3D, Gibson, and MP3D evaluations.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"449b31e5544a2b937c9310a6881bff8d01d244196cf4bd13a097b480587078d7"},"source":{"id":"2109.08238","kind":"arxiv","version":1},"verdict":{"id":"5aad21b3-b658-4956-a2af-5a19478b9a04","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T18:18:41.752925Z","strongest_claim":"HM3D is 'pareto optimal' in the sense that agents trained to perform PointGoal navigation on HM3D achieve the highest performance regardless of whether they are evaluated on HM3D, Gibson, or MP3D.","one_line_summary":"HM3D offers 1000 building-scale 3D environments that are larger and higher-fidelity than existing datasets, enabling better-performing embodied AI agents for tasks like PointGoal navigation.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The assumption that the reported performance gains are primarily attributable to the dataset's scale, completeness, and visual fidelity rather than differences in training procedures or evaluation protocols.","pith_extraction_headline":"HM3D dataset of 1000 real indoor 3D scenes produces PointGoal navigation agents that achieve top performance on HM3D, Gibson, and MP3D evaluations."},"references":{"count":46,"sample":[{"doi":"","year":2016,"title":"SceneNN: A scene meshes dataset with annotations","work_id":"1d4ce42b-5f2c-488d-aa37-c4b0789f43b7","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2017,"title":"ScanNet: Richly-annotated 3D reconstructions of indoor scenes","work_id":"cc6a46fb-44e2-4ce9-b9d1-5d230bca31c4","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2017,"title":"Joint 2d-3d-semantic data for indoor scene understanding","work_id":"cd49417b-13e3-4652-87f2-c992e78d093a","ref_index":3,"cited_arxiv_id":"1702.01105","is_internal_anchor":true},{"doi":"","year":2017,"title":"Matterport3D: Learning from RGB-D data in indoor environments","work_id":"8c3ec0db-2abe-4446-8dc1-12f0f514a29f","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2018,"title":"Zamir, Zhi-Yang He, Alexander Sax, Jitendra Malik, and Silvio Savarese","work_id":"e2650290-571e-4283-b91d-18f5337ae225","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":46,"snapshot_sha256":"607040bbd199ebcaedad8227adb3075227f3a92544b78d8ffec49d8ee6a9f6ad","internal_anchors":6},"formal_canon":{"evidence_count":2,"snapshot_sha256":"ee85e904da31569d4bc599b9f794c00ae03afae2c899641fc436fe1c79fa3f87"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2109.08238","created_at":"2026-05-17T23:39:22.207566+00:00"},{"alias_kind":"arxiv_version","alias_value":"2109.08238v1","created_at":"2026-05-17T23:39:22.207566+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2109.08238","created_at":"2026-05-17T23:39:22.207566+00:00"},{"alias_kind":"pith_short_12","alias_value":"NCIPT62RAWCW","created_at":"2026-05-18T12:33:33.725879+00:00"},{"alias_kind":"pith_short_16","alias_value":"NCIPT62RAWCW7N2F","created_at":"2026-05-18T12:33:33.725879+00:00"},{"alias_kind":"pith_short_8","alias_value":"NCIPT62R","created_at":"2026-05-18T12:33:33.725879+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":44,"internal_anchor_count":44,"sample":[{"citing_arxiv_id":"2605.23281","citing_title":"DepthAgent: Towards Better Universal Depth Estimation via Sample-wise Expert Selection","ref_index":51,"is_internal_anchor":true},{"citing_arxiv_id":"2403.09905","citing_title":"Personalized Embodied Navigation for Portable Object Finding","ref_index":32,"is_internal_anchor":true},{"citing_arxiv_id":"2605.22036","citing_title":"GA-VLN: Geometry-Aware BEV Representation for Efficient Vision-Language Navigation","ref_index":23,"is_internal_anchor":true},{"citing_arxiv_id":"2604.21363","citing_title":"A Deployable Embodied Vision-Language Navigation System with Hierarchical Cognition and Context-Aware Exploration","ref_index":35,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14504","citing_title":"When Robots Do the Chores: A Benchmark and Agent for Long-Horizon Household Task Execution","ref_index":23,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14963","citing_title":"H-OmniStereo: Zero-Shot Omnidirectional Stereo Matching with Heading-Aligned Normal Priors","ref_index":35,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18729","citing_title":"Robo-Cortex: A Self-Evolving Embodied Agent via Dual-Grain Cognitive Memory and Autonomous Knowledge Induction","ref_index":26,"is_internal_anchor":true},{"citing_arxiv_id":"2605.19600","citing_title":"FlyMirage: A Fully Automated Generation Pipeline for Diverse and Scalable UAV Flight Data via Generative World Model","ref_index":29,"is_internal_anchor":true},{"citing_arxiv_id":"2509.10813","citing_title":"InternScenes: A Large-scale Simulatable Indoor Scene Dataset with Realistic Layouts","ref_index":7,"is_internal_anchor":true},{"citing_arxiv_id":"2509.16445","citing_title":"FiLM-Nav: Efficient and Generalizable Navigation via VLM Fine-tuning","ref_index":16,"is_internal_anchor":true},{"citing_arxiv_id":"2511.04320","citing_title":"MacroNav: Multi-Task Context Representation Learning Enables Efficient Navigation in Unknown Environments","ref_index":24,"is_internal_anchor":true},{"citing_arxiv_id":"2511.16567","citing_title":"POMA-3D: The Point Map Way to 3D Scene Understanding","ref_index":41,"is_internal_anchor":true},{"citing_arxiv_id":"2311.12871","citing_title":"An Embodied Generalist Agent in 3D World","ref_index":7,"is_internal_anchor":true},{"citing_arxiv_id":"2412.06224","citing_title":"Uni-NaVid: A Video-based Vision-Language-Action Model for Unifying Embodied Navigation Tasks","ref_index":69,"is_internal_anchor":true},{"citing_arxiv_id":"2512.21714","citing_title":"AstraNav-World: World Model for Foresight Control and Consistency","ref_index":18,"is_internal_anchor":true},{"citing_arxiv_id":"2601.16806","citing_title":"An Efficient Insect-inspired Approach for Visual Point-goal Navigation","ref_index":2,"is_internal_anchor":true},{"citing_arxiv_id":"2310.06114","citing_title":"Learning Interactive Real-World Simulators","ref_index":47,"is_internal_anchor":true},{"citing_arxiv_id":"2603.02972","citing_title":"TagaVLM: Topology-Aware Global Action Reasoning for Vision-Language Navigation","ref_index":39,"is_internal_anchor":true},{"citing_arxiv_id":"2603.18943","citing_title":"VGGT-360: Geometry-Consistent Zero-Shot Panoramic Depth Estimation","ref_index":27,"is_internal_anchor":true},{"citing_arxiv_id":"2603.20530","citing_title":"Memory Over Maps: 3D Object Localization Without Reconstruction","ref_index":51,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14504","citing_title":"When Robots Do the Chores: A Benchmark and Agent for Long-Horizon Household Task Execution","ref_index":23,"is_internal_anchor":true},{"citing_arxiv_id":"2603.26788","citing_title":"ReMemNav: A Rethinking and Memory-Augmented Framework for Zero-Shot Object Navigation","ref_index":40,"is_internal_anchor":true},{"citing_arxiv_id":"2603.21887","citing_title":"IGV-RRT: Prior-Real-Time Observation Fusion for Active Object Search in Changing Environments","ref_index":19,"is_internal_anchor":true},{"citing_arxiv_id":"2603.27105","citing_title":"UniDAC: Universal Metric Depth Estimation for Any Camera","ref_index":43,"is_internal_anchor":true},{"citing_arxiv_id":"2604.02546","citing_title":"Contrastive Language-Colored Pointmap Pretraining for Unified 3D Scene Understanding","ref_index":44,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/NCIPT62RAWCW7N2FGHZVNSKEGH","json":"https://pith.science/pith/NCIPT62RAWCW7N2FGHZVNSKEGH.json","graph_json":"https://pith.science/api/pith-number/NCIPT62RAWCW7N2FGHZVNSKEGH/graph.json","events_json":"https://pith.science/api/pith-number/NCIPT62RAWCW7N2FGHZVNSKEGH/events.json","paper":"https://pith.science/paper/NCIPT62R"},"agent_actions":{"view_html":"https://pith.science/pith/NCIPT62RAWCW7N2FGHZVNSKEGH","download_json":"https://pith.science/pith/NCIPT62RAWCW7N2FGHZVNSKEGH.json","view_paper":"https://pith.science/paper/NCIPT62R","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2109.08238&json=true","fetch_graph":"https://pith.science/api/pith-number/NCIPT62RAWCW7N2FGHZVNSKEGH/graph.json","fetch_events":"https://pith.science/api/pith-number/NCIPT62RAWCW7N2FGHZVNSKEGH/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/NCIPT62RAWCW7N2FGHZVNSKEGH/action/timestamp_anchor","attest_storage":"https://pith.science/pith/NCIPT62RAWCW7N2FGHZVNSKEGH/action/storage_attestation","attest_author":"https://pith.science/pith/NCIPT62RAWCW7N2FGHZVNSKEGH/action/author_attestation","sign_citation":"https://pith.science/pith/NCIPT62RAWCW7N2FGHZVNSKEGH/action/citation_signature","submit_replication":"https://pith.science/pith/NCIPT62RAWCW7N2FGHZVNSKEGH/action/replication_record"}},"created_at":"2026-05-17T23:39:22.207566+00:00","updated_at":"2026-05-17T23:39:22.207566+00:00"}