{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2025:XJ3JXN752PRA6MHC4ZS4PXUZLT","short_pith_number":"pith:XJ3JXN75","schema_version":"1.0","canonical_sha256":"ba769bb7fdd3e20f30e2e665c7de995cf7eb435fa991e42cbde370a7f699b1fa","source":{"kind":"arxiv","id":"2508.19236","version":2},"attestation_state":"computed","paper":{"title":"MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"MemoryVLA adds a perceptual-cognitive memory bank to vision-language-action models to supply temporal context for long-horizon robotic manipulation.","cross_cats":["cs.CV"],"primary_cat":"cs.RO","authors_text":"Bin Xie, Erjin Zhou, Fengrong Liu, Gao Huang, Haoqiang Fan, Hao Shi, Lin Sun, Tiancai Wang, Xiangyu Zhang, Yingfei Liu","submitted_at":"2025-08-26T17:57:16Z","abstract_excerpt":"Temporal context is essential for robotic manipulation because such tasks are inherently non-Markovian, yet mainstream VLA models typically overlook it and struggle with long-horizon, temporally dependent tasks. Cognitive science suggests that humans rely on working memory to buffer short-lived representations for immediate control, while the hippocampal system preserves verbatim episodic details and semantic gist of past experience for long-term memory. Inspired by these mechanisms, we propose MemoryVLA, a Cognition-Memory-Action framework for long-horizon robotic manipulation. A pretrained V"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2508.19236","kind":"arxiv","version":2},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.RO","submitted_at":"2025-08-26T17:57:16Z","cross_cats_sorted":["cs.CV"],"title_canon_sha256":"7fe391cae22206b215ed6d4e34c19b7986d01255dd96a3ec7d4e4ea657e09818","abstract_canon_sha256":"c91aa3c92e1fbb96721d9fcce54aa812e4d716f28bce92fb79cccf5b9c37e3fe"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:50.222656Z","signature_b64":"xcoU5Mbe+Of9oBQ/akdPt150sjynD8+33txWiH5i63OzBBmeLsGwtj6hMJMOOExS6/ouNsevf6l/pFUmTQvFAw==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"ba769bb7fdd3e20f30e2e665c7de995cf7eb435fa991e42cbde370a7f699b1fa","last_reissued_at":"2026-05-17T23:38:50.222119Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:50.222119Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"MemoryVLA: Perceptual-Cognitive Memory in Vision-Language-Action Models for Robotic Manipulation","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"MemoryVLA adds a perceptual-cognitive memory bank to vision-language-action models to supply temporal context for long-horizon robotic manipulation.","cross_cats":["cs.CV"],"primary_cat":"cs.RO","authors_text":"Bin Xie, Erjin Zhou, Fengrong Liu, Gao Huang, Haoqiang Fan, Hao Shi, Lin Sun, Tiancai Wang, Xiangyu Zhang, Yingfei Liu","submitted_at":"2025-08-26T17:57:16Z","abstract_excerpt":"Temporal context is essential for robotic manipulation because such tasks are inherently non-Markovian, yet mainstream VLA models typically overlook it and struggle with long-horizon, temporally dependent tasks. Cognitive science suggests that humans rely on working memory to buffer short-lived representations for immediate control, while the hippocampal system preserves verbatim episodic details and semantic gist of past experience for long-term memory. Inspired by these mechanisms, we propose MemoryVLA, a Cognition-Memory-Action framework for long-horizon robotic manipulation. A pretrained V"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"On 12 real-world tasks spanning general skills and long-horizon temporal dependencies, MemoryVLA achieves 84.0% success rate, with long-horizon tasks showing a +26 improvement over state-of-the-art baseline.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the adaptive retrieval, fusion, and redundancy-merging operations in the Perceptual-Cognitive Memory Bank will reliably supply temporally relevant context without introducing noise or stale entries that degrade action generation.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"MemoryVLA introduces a perceptual-cognitive memory bank and working-memory retrieval mechanism into VLA models, raising success rates on long-horizon robotic tasks by up to 26 points over prior baselines.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"MemoryVLA adds a perceptual-cognitive memory bank to vision-language-action models to supply temporal context for long-horizon robotic manipulation.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"a83598c45f7663ac45a09f74942c4c937c4139903d790b76888d527ad51e71a7"},"source":{"id":"2508.19236","kind":"arxiv","version":2},"verdict":{"id":"729cc02f-cd30-434c-9c26-de96826b76da","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T20:39:39.773638Z","strongest_claim":"On 12 real-world tasks spanning general skills and long-horizon temporal dependencies, MemoryVLA achieves 84.0% success rate, with long-horizon tasks showing a +26 improvement over state-of-the-art baseline.","one_line_summary":"MemoryVLA introduces a perceptual-cognitive memory bank and working-memory retrieval mechanism into VLA models, raising success rates on long-horizon robotic tasks by up to 26 points over prior baselines.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the adaptive retrieval, fusion, and redundancy-merging operations in the Perceptual-Cognitive Memory Bank will reliably supply temporally relevant context without introducing noise or stale entries that degrade action generation.","pith_extraction_headline":"MemoryVLA adds a perceptual-cognitive memory bank to vision-language-action models to supply temporal context for long-horizon robotic manipulation."},"references":{"count":41,"sample":[{"doi":"","year":null,"title":"GPT-4 Technical Report","work_id":"b928e041-6991-4c08-8c81-0359e4097c7b","ref_index":1,"cited_arxiv_id":"2303.08774","is_internal_anchor":true},{"doi":"","year":null,"title":"Qwen Technical Report","work_id":"bb1fd52f-6b2f-437c-9516-37bdf6eb9be8","ref_index":2,"cited_arxiv_id":"2309.16609","is_internal_anchor":true},{"doi":"","year":null,"title":"RT-1: Robotics Transformer for Real-World Control at Scale","work_id":"e11bda85-8531-46bc-a07f-d0ade3643ab1","ref_index":3,"cited_arxiv_id":"2212.06817","is_internal_anchor":true},{"doi":"","year":null,"title":"RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control","work_id":"ff438a8a-8003-4fae-9131-acd418b3597b","ref_index":4,"cited_arxiv_id":"2307.15818","is_internal_anchor":true},{"doi":"","year":null,"title":"AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems","work_id":"f797e9ec-510f-43a7-8a0c-18009ce332e5","ref_index":5,"cited_arxiv_id":"2503.06669","is_internal_anchor":true}],"resolved_work":41,"snapshot_sha256":"b812842533576896938c3bdbb8ddee537924dfed431ff75851c6312045c12ade","internal_anchors":20},"formal_canon":{"evidence_count":2,"snapshot_sha256":"ca37bcb2a8e6f0961e446b609276285aa249012eab828fb798de3ca4ed563667"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2508.19236","created_at":"2026-05-17T23:38:50.222210+00:00"},{"alias_kind":"arxiv_version","alias_value":"2508.19236v2","created_at":"2026-05-17T23:38:50.222210+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2508.19236","created_at":"2026-05-17T23:38:50.222210+00:00"},{"alias_kind":"pith_short_12","alias_value":"XJ3JXN752PRA","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"XJ3JXN752PRA6MHC","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"XJ3JXN75","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":31,"internal_anchor_count":31,"sample":[{"citing_arxiv_id":"2605.21862","citing_title":"EvoScene-VLA: Evolving Scene Beliefs Inside the Action Decoder for Chunked Robot Control","ref_index":30,"is_internal_anchor":true},{"citing_arxiv_id":"2605.22671","citing_title":"From Abstraction to Instantiation: Learning Behavioral Representation for Vision-Language-Action Model","ref_index":21,"is_internal_anchor":true},{"citing_arxiv_id":"2602.20200","citing_title":"Global Prior Meets Local Consistency: Dual-Memory Augmented Vision-Language-Action Model for Efficient Robotic Manipulation","ref_index":42,"is_internal_anchor":true},{"citing_arxiv_id":"2602.19710","citing_title":"Universal Pose Pretraining for Generalizable Vision-Language-Action Policies","ref_index":36,"is_internal_anchor":true},{"citing_arxiv_id":"2602.18532","citing_title":"VLANeXt: Recipes for Building Strong VLA Models","ref_index":31,"is_internal_anchor":true},{"citing_arxiv_id":"2511.14148","citing_title":"AsyncVLA: Asynchronous Flow Matching for Vision-Language-Action Models","ref_index":61,"is_internal_anchor":true},{"citing_arxiv_id":"2512.09928","citing_title":"HiF-VLA: Hindsight, Insight and Foresight through Motion Representation for Vision-Language-Action Models","ref_index":31,"is_internal_anchor":true},{"citing_arxiv_id":"2602.11183","citing_title":"Mitigating Error Accumulation in Continuous Navigation via Memory-Augmented Kalman Filtering","ref_index":14,"is_internal_anchor":true},{"citing_arxiv_id":"2602.09725","citing_title":"Efficient Remote KV Cache Reuse with GPU-native Video Codec","ref_index":56,"is_internal_anchor":true},{"citing_arxiv_id":"2602.20323","citing_title":"PhysMem: Scaling Test-Time Memory for Embodied Physical Reasoning","ref_index":51,"is_internal_anchor":true},{"citing_arxiv_id":"2603.10126","citing_title":"AR-VLA: True Autoregressive Action Expert for Vision-Language-Action Models","ref_index":35,"is_internal_anchor":true},{"citing_arxiv_id":"2603.15620","citing_title":"Towards Generalizable Robotic Manipulation in Dynamic Environments","ref_index":47,"is_internal_anchor":true},{"citing_arxiv_id":"2605.11459","citing_title":"Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models","ref_index":18,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12624","citing_title":"MindVLA-U1: VLA Beats VA with Unified Streaming Architecture for Autonomous Driving","ref_index":76,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12624","citing_title":"MindVLA-U1: VLA Beats VA with Unified Streaming Architecture for Autonomous Driving","ref_index":76,"is_internal_anchor":true},{"citing_arxiv_id":"2605.10094","citing_title":"Retrieve-then-Steer: Online Success Memory for Test-Time Adaptation of Generative VLAs","ref_index":23,"is_internal_anchor":true},{"citing_arxiv_id":"2605.10993","citing_title":"ECHO: Continuous Hierarchical Memory for Vision-Language-Action Models","ref_index":4,"is_internal_anchor":true},{"citing_arxiv_id":"2605.11459","citing_title":"Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models","ref_index":18,"is_internal_anchor":true},{"citing_arxiv_id":"2601.21998","citing_title":"Causal World Modeling for Robot Control","ref_index":65,"is_internal_anchor":true},{"citing_arxiv_id":"2605.10094","citing_title":"Retrieve-then-Steer: Online Success Memory for Test-Time Adaptation of Generative VLAs","ref_index":23,"is_internal_anchor":true},{"citing_arxiv_id":"2605.10921","citing_title":"RoboMemArena: A Comprehensive and Challenging Robotic Memory Benchmark","ref_index":40,"is_internal_anchor":true},{"citing_arxiv_id":"2604.24622","citing_title":"CF-VLA: Efficient Coarse-to-Fine Action Generation for Vision-Language-Action Policies","ref_index":40,"is_internal_anchor":true},{"citing_arxiv_id":"2605.06481","citing_title":"OA-WAM: Object-Addressable World Action Model for Robust Robot Manipulation","ref_index":66,"is_internal_anchor":true},{"citing_arxiv_id":"2605.02525","citing_title":"A Semantic Autonomy Framework for VLM-Integrated Indoor Mobile Robots: Hybrid Deterministic Reasoning and Cross-Robot Adaptive Memory","ref_index":11,"is_internal_anchor":true},{"citing_arxiv_id":"2604.18933","citing_title":"Gated Memory Policy","ref_index":43,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/XJ3JXN752PRA6MHC4ZS4PXUZLT","json":"https://pith.science/pith/XJ3JXN752PRA6MHC4ZS4PXUZLT.json","graph_json":"https://pith.science/api/pith-number/XJ3JXN752PRA6MHC4ZS4PXUZLT/graph.json","events_json":"https://pith.science/api/pith-number/XJ3JXN752PRA6MHC4ZS4PXUZLT/events.json","paper":"https://pith.science/paper/XJ3JXN75"},"agent_actions":{"view_html":"https://pith.science/pith/XJ3JXN752PRA6MHC4ZS4PXUZLT","download_json":"https://pith.science/pith/XJ3JXN752PRA6MHC4ZS4PXUZLT.json","view_paper":"https://pith.science/paper/XJ3JXN75","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2508.19236&json=true","fetch_graph":"https://pith.science/api/pith-number/XJ3JXN752PRA6MHC4ZS4PXUZLT/graph.json","fetch_events":"https://pith.science/api/pith-number/XJ3JXN752PRA6MHC4ZS4PXUZLT/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/XJ3JXN752PRA6MHC4ZS4PXUZLT/action/timestamp_anchor","attest_storage":"https://pith.science/pith/XJ3JXN752PRA6MHC4ZS4PXUZLT/action/storage_attestation","attest_author":"https://pith.science/pith/XJ3JXN752PRA6MHC4ZS4PXUZLT/action/author_attestation","sign_citation":"https://pith.science/pith/XJ3JXN752PRA6MHC4ZS4PXUZLT/action/citation_signature","submit_replication":"https://pith.science/pith/XJ3JXN752PRA6MHC4ZS4PXUZLT/action/replication_record"}},"created_at":"2026-05-17T23:38:50.222210+00:00","updated_at":"2026-05-17T23:38:50.222210+00:00"}