{"paper":{"title":"LASAR: Towards Spatio-temporal Reasoning with Latent Cognitive Map","license":"http://creativecommons.org/licenses/by/4.0/","headline":"A dual-memory architecture with contrastive spatio-temporal learning builds latent cognitive maps for embodied agents.","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Jinzhou Tang, Keze Wang, Sidi Liu, Waikit Xiu, Weixing Chen","submitted_at":"2026-05-16T09:21:56Z","abstract_excerpt":"A fundamental challenge in embodied AI is verifying if agents build internal models of spatial structure or merely learn to mimic task-specific expert trajectories. This is critical as foundational approaches rooted in action-centric tasks (e.g., VLN) and reasoning-centric tasks (e.g., EQA) often share a common limitation: they lack a learning signal that forces them to encode fine-grained spatial relationships (like topology or distance) over long-range, fragmented experiences. To address this, we first propose LASAR, an architecture featuring a dual-memory system designed to maintain both ep"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Experiments demonstrate that our method achieves 2%-3.5% gains in both zero-shot generalization on standard VLN-CE and VSI-Bench benchmarks. 
We also demonstrate that our proposed cognitive map has high self-consistency.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"Foundational approaches rooted in action-centric tasks (e.g., VLN) and reasoning-centric tasks (e.g., EQA) lack a learning signal that forces them to encode fine-grained spatial relationships (like topology or distance) over long-range, fragmented experiences; the ST-CRL objective is assumed to supply exactly that missing signal.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"LASAR pairs a dual-memory system with spatio-temporal contrastive learning to induce latent cognitive maps, reporting 2-3.5% zero-shot gains on VLN-CE and VSI-Bench plus high map self-consistency.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A dual-memory architecture with contrastive spatio-temporal learning builds latent cognitive maps for embodied agents.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"7412bee171c37671f7cc0661cef794d8e13d4625c897e103e95c634d9e2c94c1"},"source":{"id":"2605.16899","kind":"arxiv","version":1},"verdict":{"id":"47f39333-f267-418c-b6f0-ecef42137a8c","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-19T21:31:15.424475Z","strongest_claim":"Experiments demonstrate that our method achieves 2%-3.5% gains in both zero-shot generalization on standard VLN-CE and VSI-Bench benchmarks. 
We also demonstrate that our proposed cognitive map has high self-consistency.","one_line_summary":"LASAR pairs a dual-memory system with spatio-temporal contrastive learning to induce latent cognitive maps, reporting 2-3.5% zero-shot gains on VLN-CE and VSI-Bench plus high map self-consistency.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"Foundational approaches rooted in action-centric tasks (e.g., VLN) and reasoning-centric tasks (e.g., EQA) lack a learning signal that forces them to encode fine-grained spatial relationships (like topology or distance) over long-range, fragmented experiences; the ST-CRL objective is assumed to supply exactly that missing signal.","pith_extraction_headline":"A dual-memory architecture with contrastive spatio-temporal learning builds latent cognitive maps for embodied agents."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.16899/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"doi_title_agreement","ran_at":"2026-05-19T22:01:19.531210Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T21:40:53.102671Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"cited_work_retraction","ran_at":"2026-05-19T20:52:41.004959Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"claim_evidence","ran_at":"2026-05-19T18:41:56.278020Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"ai_meta_artifact","ran_at":"2026-05-19T18:33:26.356921Z","status":"skipped","version":"1.0.0","findings_count":0}],"snapshot_sha256":"ee862382319554abfa35cc9514f219bd8ea129982404d03313bfb2b53803fae7"},"references":{"count":73,"sample":[{"doi":"","year":2023,"title":"Self-supervised object detection from egocentric 
videos","work_id":"a6a4b4eb-a1cd-4f11-b912-299e5b0c32ef","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2018,"title":"Vision-and-language navigation: Interpreting visually-grounded navigation instructions in real environments","work_id":"a9820bf7-7e1b-470c-b002-2c2beab07ef3","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"org/abs/2510.19818","work_id":"4849b0f4-7516-4161-abbb-421b009d9a71","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2020,"title":"Clark, and Aaron Wilber","work_id":"b74f8447-77fa-4bc0-ad60-190a22b70eb2","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2023,"title":"Rt-2: Vision-language-action models transfer web knowledge to robotic control, 2023","work_id":"e158e303-e769-4cf6-8788-77ac3138d07c","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":73,"snapshot_sha256":"fe607f4417e0e79b8a0219f94e5fb3810383f6aa623ad5eda4f584ef6ba56e83","internal_anchors":12},"formal_canon":{"evidence_count":2,"snapshot_sha256":"baea8f8b72b99bb0582a2a9fbd6a5b20075529f2953c6d3f28fc7c998a3111ad"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}