{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2022:UE5VBB4PDTZCIIKHQ3JQNGQNUS","short_pith_number":"pith:UE5VBB4P","schema_version":"1.0","canonical_sha256":"a13b50878f1cf224214786d3069a0da48079ab3f804d1f329f053962b4124946","source":{"kind":"arxiv","id":"2208.04933","version":3},"attestation_state":"computed","paper":{"title":"Simplified State Space Layers for Sequence Modeling","license":"http://creativecommons.org/licenses/by/4.0/","headline":"S5 uses one multi-input multi-output state space model to match S4 performance and efficiency on long sequences","cross_cats":[],"primary_cat":"cs.LG","authors_text":"Andrew Warrington, Jimmy T.H. Smith, Scott W. Linderman","submitted_at":"2022-08-09T17:57:43Z","abstract_excerpt":"Models using structured state space sequence (S4) layers have achieved state-of-the-art performance on long-range sequence modeling tasks. An S4 layer combines linear state space models (SSMs), the HiPPO framework, and deep learning to achieve high performance. We build on the design of the S4 layer and introduce a new state space layer, the S5 layer. Whereas an S4 layer uses many independent single-input, single-output SSMs, the S5 layer uses one multi-input, multi-output SSM. We establish a connection between S5 and S4, and use this to develop the initialization and parameterization used by "},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2208.04933","kind":"arxiv","version":3},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.LG","submitted_at":"2022-08-09T17:57:43Z","cross_cats_sorted":[],"title_canon_sha256":"d9d336309075397c0e854e1e82fa03d3b50fef0684cd6beda25e9c71242dc572","abstract_canon_sha256":"c860adae180ffe3f0a2fa9861d88e17270de679593db299799e028cac402c806"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:48.531861Z","signature_b64":"JHso3Ah8KEG3eLF8k5079nkBM/jatg0POvbNRXflbJYR0KENEq/seBA6HnEudM+TSU+uzxNJd4QkPxBS2EhJBg==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"a13b50878f1cf224214786d3069a0da48079ab3f804d1f329f053962b4124946","last_reissued_at":"2026-05-17T23:38:48.531338Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:48.531338Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Simplified State Space Layers for Sequence Modeling","license":"http://creativecommons.org/licenses/by/4.0/","headline":"S5 uses one multi-input multi-output state space model to match S4 performance and efficiency on long sequences","cross_cats":[],"primary_cat":"cs.LG","authors_text":"Andrew Warrington, Jimmy T.H. Smith, Scott W. Linderman","submitted_at":"2022-08-09T17:57:43Z","abstract_excerpt":"Models using structured state space sequence (S4) layers have achieved state-of-the-art performance on long-range sequence modeling tasks. An S4 layer combines linear state space models (SSMs), the HiPPO framework, and deep learning to achieve high performance. We build on the design of the S4 layer and introduce a new state space layer, the S5 layer. Whereas an S4 layer uses many independent single-input, single-output SSMs, the S5 layer uses one multi-input, multi-output SSM. We establish a connection between S5 and S4, and use this to develop the initialization and parameterization used by "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"S5 averages 87.4% on the long range arena benchmark, and 98.5% on the most difficult Path-X task, while matching the computational efficiency of S4.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the initialization and parameterization obtained by connecting S5 to S4 will produce stable, high-performing models across tasks without additional per-task tuning or post-hoc adjustments.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"S5 uses a single MIMO state space model with S4-derived initialization to match S4 efficiency and reach 87.4% average accuracy on the Long Range Arena benchmark.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"S5 uses one multi-input multi-output state space model to match S4 performance and efficiency on long sequences","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"1efe71fde0b11874e3355af44cda47441e62399a349c8169887cfac653882a2f"},"source":{"id":"2208.04933","kind":"arxiv","version":3},"verdict":{"id":"8c0e75c5-dd04-4f21-ae41-5fab322ac174","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T08:13:20.624547Z","strongest_claim":"S5 averages 87.4% on the long range arena benchmark, and 98.5% on the most difficult Path-X task, while matching the computational efficiency of S4.","one_line_summary":"S5 uses a single MIMO state space model with S4-derived initialization to match S4 efficiency and reach 87.4% average accuracy on the Long Range Arena benchmark.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the initialization and parameterization obtained by connecting S5 to S4 will produce stable, high-performing models across tasks without additional per-task tuning or post-hoc adjustments.","pith_extraction_headline":"S5 uses one multi-input multi-output state space model to match S4 performance and efficiency on long sequences"},"references":{"count":149,"sample":[{"doi":"","year":null,"title":"Advances in Neural Information Processing Systems , year=","work_id":"aa07cf43-b14f-441f-a613-12788839a164","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Voelker, Aaron and Kaji. Legendre. Advances in Neural Information Processing Systems , volume=","work_id":"3768e064-523e-426a-b291-c47451d3de0f","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2022,"title":"Goel, Karan and Gu, Albert and Donahue, Chris and Re, Christopher , booktitle =. It’s Raw!. 2022 , volume =","work_id":"8eec0f5d-f0c2-4d4e-8239-33507023bff6","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Gu, Albert and Dao, Tri and Ermon, Stefano and Rudra, Atri and R. Hi. Advances in Neural Information Processing Systems , volume=","work_id":"e6142161-6b2f-4ba5-91c2-49d10140405d","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.48550/arxiv.2006.03274","year":2020,"title":"2020 , copyright =","work_id":"98408eae-06e2-4781-915c-41237b2aaeca","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":149,"snapshot_sha256":"cd6053c7f70f6d3fc54deb85f8f5a0eaf6b56eb6520a9d0fcae01d1e23c50534","internal_anchors":6},"formal_canon":{"evidence_count":3,"snapshot_sha256":"d6b8db7edd36e197a50ebf47dd5971e6c49083cccb061129bdea5adfe1ad8529"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2208.04933","created_at":"2026-05-17T23:38:48.531414+00:00"},{"alias_kind":"arxiv_version","alias_value":"2208.04933v3","created_at":"2026-05-17T23:38:48.531414+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2208.04933","created_at":"2026-05-17T23:38:48.531414+00:00"},{"alias_kind":"pith_short_12","alias_value":"UE5VBB4PDTZC","created_at":"2026-05-18T12:33:33.725879+00:00"},{"alias_kind":"pith_short_16","alias_value":"UE5VBB4PDTZCIIKH","created_at":"2026-05-18T12:33:33.725879+00:00"},{"alias_kind":"pith_short_8","alias_value":"UE5VBB4P","created_at":"2026-05-18T12:33:33.725879+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":31,"internal_anchor_count":31,"sample":[{"citing_arxiv_id":"2404.07106","citing_title":"3DMambaComplete: Exploring Structured State Space Model for Point Cloud Completion","ref_index":32,"is_internal_anchor":true},{"citing_arxiv_id":"2411.18328","citing_title":"EventCrab: Harnessing Frame and Point Synergy for Event-based Action Recognition and Beyond","ref_index":43,"is_internal_anchor":true},{"citing_arxiv_id":"2503.23818","citing_title":"L2RU: a Structured State Space Model with prescribed L2-bound","ref_index":12,"is_internal_anchor":true},{"citing_arxiv_id":"2503.18970","citing_title":"Advancing Intelligent Sequence Modeling: Evolution, Trade-offs, and Applications of State- Space Architectures from S4 to Mamba","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2605.21174","citing_title":"Exact expression for maximum Lyapunov exponent during transients in computationally powerful dynamical networks","ref_index":13,"is_internal_anchor":true},{"citing_arxiv_id":"2605.06501","citing_title":"Cubit: Token Mixer with Kernel Ridge Regression","ref_index":73,"is_internal_anchor":true},{"citing_arxiv_id":"2605.19826","citing_title":"Explainable Wastewater Digital Twins: Adaptive Context-Conditioned Structured Simulators with Self-Falsifying Decision Support","ref_index":60,"is_internal_anchor":true},{"citing_arxiv_id":"2506.06374","citing_title":"SiLIF: Structured State Space Model Dynamics and Parametrization for Spiking Neural Networks","ref_index":32,"is_internal_anchor":true},{"citing_arxiv_id":"2506.09110","citing_title":"CodeBrain: Bridging Decoupled Tokenizer and Multi-Scale Architecture for EEG Foundation Model","ref_index":56,"is_internal_anchor":true},{"citing_arxiv_id":"2510.04800","citing_title":"Hybrid Architectures for Language Models: Systematic Analysis and Design Insights","ref_index":51,"is_internal_anchor":true},{"citing_arxiv_id":"2602.09034","citing_title":"Latent-Space Causal Discovery from Indirect Neuroimaging Observations","ref_index":2,"is_internal_anchor":true},{"citing_arxiv_id":"2601.21293","citing_title":"Physics-Guided Tiny-Mamba Transformer for Reliability-Aware Early Fault Warning","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2309.16797","citing_title":"Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution","ref_index":298,"is_internal_anchor":true},{"citing_arxiv_id":"2603.09138","citing_title":"Rotation Equivariant Mamba for Vision Tasks","ref_index":12,"is_internal_anchor":true},{"citing_arxiv_id":"2604.16341","citing_title":"Deep Learning for Virtual Reality User Identification: A Benchmark","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2402.19427","citing_title":"Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models","ref_index":29,"is_internal_anchor":true},{"citing_arxiv_id":"2404.14294","citing_title":"A Survey on Efficient Inference for Large Language Models","ref_index":72,"is_internal_anchor":true},{"citing_arxiv_id":"2605.14489","citing_title":"A Novel Schur-Decomposition-Based Weight Projection Method for Stable State-Space Neural-Network Architectures","ref_index":3,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13370","citing_title":"Phasor Memory Networks: Stable Backpropagation Through Time for Scalable Explicit Memory","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13833","citing_title":"QLAM: A Quantum Long-Attention Memory Approach to Long-Sequence Token Modeling","ref_index":22,"is_internal_anchor":true},{"citing_arxiv_id":"2605.11563","citing_title":"TCP-SSM: Efficient Vision State Space Models with Token-Conditioned Poles","ref_index":51,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08539","citing_title":"Continuity Laws for Sequential Models","ref_index":12,"is_internal_anchor":true},{"citing_arxiv_id":"2605.09742","citing_title":"TIDES: Implicit Time-Awareness in Selective State Space Models","ref_index":5,"is_internal_anchor":true},{"citing_arxiv_id":"2605.06501","citing_title":"Cubit: Token Mixer with Kernel Ridge Regression","ref_index":73,"is_internal_anchor":true},{"citing_arxiv_id":"2605.05066","citing_title":"The Impossibility Triangle of Long-Context Modeling","ref_index":31,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":3,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/UE5VBB4PDTZCIIKHQ3JQNGQNUS","json":"https://pith.science/pith/UE5VBB4PDTZCIIKHQ3JQNGQNUS.json","graph_json":"https://pith.science/api/pith-number/UE5VBB4PDTZCIIKHQ3JQNGQNUS/graph.json","events_json":"https://pith.science/api/pith-number/UE5VBB4PDTZCIIKHQ3JQNGQNUS/events.json","paper":"https://pith.science/paper/UE5VBB4P"},"agent_actions":{"view_html":"https://pith.science/pith/UE5VBB4PDTZCIIKHQ3JQNGQNUS","download_json":"https://pith.science/pith/UE5VBB4PDTZCIIKHQ3JQNGQNUS.json","view_paper":"https://pith.science/paper/UE5VBB4P","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2208.04933&json=true","fetch_graph":"https://pith.science/api/pith-number/UE5VBB4PDTZCIIKHQ3JQNGQNUS/graph.json","fetch_events":"https://pith.science/api/pith-number/UE5VBB4PDTZCIIKHQ3JQNGQNUS/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/UE5VBB4PDTZCIIKHQ3JQNGQNUS/action/timestamp_anchor","attest_storage":"https://pith.science/pith/UE5VBB4PDTZCIIKHQ3JQNGQNUS/action/storage_attestation","attest_author":"https://pith.science/pith/UE5VBB4PDTZCIIKHQ3JQNGQNUS/action/author_attestation","sign_citation":"https://pith.science/pith/UE5VBB4PDTZCIIKHQ3JQNGQNUS/action/citation_signature","submit_replication":"https://pith.science/pith/UE5VBB4PDTZCIIKHQ3JQNGQNUS/action/replication_record"}},"created_at":"2026-05-17T23:38:48.531414+00:00","updated_at":"2026-05-17T23:38:48.531414+00:00"}