{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2025:U7EFB22JO2J4IDTVUHJU76CAOS","short_pith_number":"pith:U7EFB22J","schema_version":"1.0","canonical_sha256":"a7c850eb497693c40e75a1d34ff84074b5524167c32ff72f9632a4524f8fa5c2","source":{"kind":"arxiv","id":"2512.24880","version":2},"attestation_state":"computed","paper":{"title":"mHC: Manifold-Constrained Hyper-Connections","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Projecting hyper-connection residuals onto a manifold restores identity mapping for stable large-scale training.","cross_cats":["cs.AI","cs.LG"],"primary_cat":"cs.CL","authors_text":"Chenggang Zhao, Chengqi Deng, Damai Dai, Huanqi Cao, Huazuo Gao, Jiang Chang, Jiashi Li, Jingyang Yuan, Kuai Yu, Lean Wang, Liang Zhao, Shangyan Zhou, Shengding Hu, Wangding Zeng, Wenfeng Liang, Yixuan Wei, Yuqing Wang, Zhean Xu, Zhenda Xie, Zhengyan Zhang","submitted_at":"2025-12-31T14:16:26Z","abstract_excerpt":"Recently, studies exemplified by Hyper-Connections (HC) have extended the ubiquitous residual connection paradigm established over the past decade by expanding the residual stream width and diversifying connectivity patterns. While yielding substantial performance gains, this diversification fundamentally compromises the identity mapping property intrinsic to the residual connection, which causes severe training instability and restricted scalability, and additionally incurs notable memory access overhead. To address these challenges, we propose Manifold-Constrained Hyper-Connections (mHC), a "},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2512.24880","kind":"arxiv","version":2},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.CL","submitted_at":"2025-12-31T14:16:26Z","cross_cats_sorted":["cs.AI","cs.LG"],"title_canon_sha256":"9993aff5599a2f3a5b8c8f95353f9664ca12b81962583c7dd2d607e1559256e4","abstract_canon_sha256":"695fd47b14de1eaf1636c62dc4b0d17786c118c29a5ee24449d7df3f1a067fc1"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:38:52.548199Z","signature_b64":"by54lrvxeDwD/pulG5XE8VRY/kv7jKKq6X55kSPA4nsOg6PC9pU5z/n1+rk7n4u9XIURLsuJJ6MK2u6zdDtoCQ==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"a7c850eb497693c40e75a1d34ff84074b5524167c32ff72f9632a4524f8fa5c2","last_reissued_at":"2026-05-17T23:38:52.547741Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:38:52.547741Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"mHC: Manifold-Constrained Hyper-Connections","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Projecting hyper-connection residuals onto a manifold restores identity mapping for stable large-scale training.","cross_cats":["cs.AI","cs.LG"],"primary_cat":"cs.CL","authors_text":"Chenggang Zhao, Chengqi Deng, Damai Dai, Huanqi Cao, Huazuo Gao, Jiang Chang, Jiashi Li, Jingyang Yuan, Kuai Yu, Lean Wang, Liang Zhao, Shangyan Zhou, Shengding Hu, Wangding Zeng, Wenfeng Liang, Yixuan Wei, Yuqing Wang, Zhean Xu, Zhenda Xie, Zhengyan Zhang","submitted_at":"2025-12-31T14:16:26Z","abstract_excerpt":"Recently, studies exemplified by Hyper-Connections (HC) have extended the ubiquitous residual connection paradigm established over the past decade by expanding the residual stream width and diversifying connectivity patterns. While yielding substantial performance gains, this diversification fundamentally compromises the identity mapping property intrinsic to the residual connection, which causes severe training instability and restricted scalability, and additionally incurs notable memory access overhead. To address these challenges, we propose Manifold-Constrained Hyper-Connections (mHC), a "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Empirical experiments demonstrate that mHC is effective for training at scale, offering tangible performance improvements and superior scalability.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That projecting the residual connection space of HC onto a specific manifold restores the identity mapping property while preserving the performance benefits of diversified connectivity patterns.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"mHC projects hyper-connection residual spaces onto a manifold to restore identity mapping, enabling stable large-scale training with performance gains over standard HC.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Projecting hyper-connection residuals onto a manifold restores identity mapping for stable large-scale training.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"17e000abd2449a17e189076e3589062aa814b8ac4cca582d189eb250ac19f210"},"source":{"id":"2512.24880","kind":"arxiv","version":2},"verdict":{"id":"1db5bded-9d50-4a8f-a142-203fbdf82210","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T12:27:06.891339Z","strongest_claim":"Empirical experiments demonstrate that mHC is effective for training at scale, offering tangible performance improvements and superior scalability.","one_line_summary":"mHC projects hyper-connection residual spaces onto a manifold to restore identity mapping, enabling stable large-scale training with performance gains over standard HC.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That projecting the residual connection space of HC onto a specific manifold restores the identity mapping property while preserving the performance benefits of diversified connectivity patterns.","pith_extraction_headline":"Projecting hyper-connection residuals onto a manifold restores identity mapping for stable large-scale training."},"references":{"count":44,"sample":[{"doi":"","year":null,"title":"Proceedings of the IEEE conference on computer vision and pattern recognition , pages=","work_id":"da360c40-6481-4088-bd96-8e73e0280a6b","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2016,"title":"European conference on computer vision , pages=","work_id":"98803d46-9f44-4a58-94fe-a514b8f31627","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Proceedings of the IEEE conference on computer vision and pattern recognition , pages=","work_id":"07997e38-6b8e-4f93-8aeb-d77afb7966d9","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"FractalNet: Ultra-deep neural net- works without residuals","work_id":"ecf74c2e-7eed-4017-8f0b-f90686110609","ref_index":4,"cited_arxiv_id":"1605.07648","is_internal_anchor":true},{"doi":"","year":null,"title":"Advances in neural information processing systems , volume=","work_id":"a1fd09f1-b62b-4aca-a5ef-dd2b50ad08b5","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":44,"snapshot_sha256":"fa2d26e389bbda4565faf823a7f0e68e5bae6eec7aa1a632631609ffc8c391db","internal_anchors":14},"formal_canon":{"evidence_count":2,"snapshot_sha256":"1c8b30c8d301be5d14cf53712733a2547c3c7bb3792f52a2a5541cb3a20d7994"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2512.24880","created_at":"2026-05-17T23:38:52.547812+00:00"},{"alias_kind":"arxiv_version","alias_value":"2512.24880v2","created_at":"2026-05-17T23:38:52.547812+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2512.24880","created_at":"2026-05-17T23:38:52.547812+00:00"},{"alias_kind":"pith_short_12","alias_value":"U7EFB22JO2J4","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"U7EFB22JO2J4IDTV","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"U7EFB22J","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":37,"internal_anchor_count":37,"sample":[{"citing_arxiv_id":"2605.23467","citing_title":"S$^3$GNN: Efficient Global Mixing and Local Message Passing for Long-Range Graph Learning","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2605.23259","citing_title":"Multi-Gate Residuals","ref_index":12,"is_internal_anchor":true},{"citing_arxiv_id":"2602.08064","citing_title":"SiameseNorm: Breaking the Barrier to Reconciling Pre/Post-Norm","ref_index":25,"is_internal_anchor":true},{"citing_arxiv_id":"2605.21724","citing_title":"TBP-mHC: full expressivity for manifold-constrained hyper connections through transportation polytopes","ref_index":21,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18848","citing_title":"Exact Linear Attention","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12374","citing_title":"Fill the GAP: A Granular Alignment Paradigm for Visual Reasoning in Multimodal Large Language Models","ref_index":24,"is_internal_anchor":true},{"citing_arxiv_id":"2603.15031","citing_title":"Attention Residuals","ref_index":60,"is_internal_anchor":true},{"citing_arxiv_id":"2605.20708","citing_title":"Rethinking Cross-Layer Information Routing in Diffusion Transformers","ref_index":60,"is_internal_anchor":true},{"citing_arxiv_id":"2605.06501","citing_title":"Cubit: Token Mixer with Kernel Ridge Regression","ref_index":89,"is_internal_anchor":true},{"citing_arxiv_id":"2605.15793","citing_title":"AOT-POT: Adaptive Operator Transformation for Large-Scale PDE Pre-training","ref_index":44,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18855","citing_title":"Delta Attention Residuals","ref_index":31,"is_internal_anchor":true},{"citing_arxiv_id":"2605.18848","citing_title":"Exact Linear Attention","ref_index":8,"is_internal_anchor":true},{"citing_arxiv_id":"2605.17842","citing_title":"SNLP: Layer-Parallel Inference via Structured Newton Corrections","ref_index":44,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12374","citing_title":"Fill the GAP: A Granular Alignment Paradigm for Visual Reasoning in Multimodal Large Language Models","ref_index":24,"is_internal_anchor":true},{"citing_arxiv_id":"2601.00417","citing_title":"Deep Delta Learning","ref_index":14,"is_internal_anchor":true},{"citing_arxiv_id":"2601.18832","citing_title":"The Geometric Reasoner: Manifold-Informed Latent Foresight Search for Long-Context Reasoning","ref_index":23,"is_internal_anchor":true},{"citing_arxiv_id":"2603.13381","citing_title":"Beyond Linearity in Attention Projections: The Case for Nonlinear Queries","ref_index":26,"is_internal_anchor":true},{"citing_arxiv_id":"2604.03263","citing_title":"LPC-SM: Local Predictive Coding and Sparse Memory for Long-Context Language Modeling","ref_index":32,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12374","citing_title":"Fill the GAP: A Granular Alignment Paradigm for Visual Reasoning in Multimodal Large Language Models","ref_index":24,"is_internal_anchor":true},{"citing_arxiv_id":"2605.11172","citing_title":"Optimistic Dual Averaging Unifies Modern Optimizers","ref_index":18,"is_internal_anchor":true},{"citing_arxiv_id":"2605.11526","citing_title":"Efficient and provably convergent end-to-end training of deep neural networks with linear constraints","ref_index":75,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08300","citing_title":"mHC-SSM: Manifold-Constrained Hyper-Connections for State Space Language Models with Stream-Specialized Adapters","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2605.03953","citing_title":"Transformers with Selective Access to Early Representations","ref_index":17,"is_internal_anchor":true},{"citing_arxiv_id":"2604.23705","citing_title":"Can an MLP Absorb Its Own Skip Connection?","ref_index":11,"is_internal_anchor":true},{"citing_arxiv_id":"2604.23036","citing_title":"Preserving Long-Tailed Expert Information in Mixture-of-Experts Tuning","ref_index":31,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/U7EFB22JO2J4IDTVUHJU76CAOS","json":"https://pith.science/pith/U7EFB22JO2J4IDTVUHJU76CAOS.json","graph_json":"https://pith.science/api/pith-number/U7EFB22JO2J4IDTVUHJU76CAOS/graph.json","events_json":"https://pith.science/api/pith-number/U7EFB22JO2J4IDTVUHJU76CAOS/events.json","paper":"https://pith.science/paper/U7EFB22J"},"agent_actions":{"view_html":"https://pith.science/pith/U7EFB22JO2J4IDTVUHJU76CAOS","download_json":"https://pith.science/pith/U7EFB22JO2J4IDTVUHJU76CAOS.json","view_paper":"https://pith.science/paper/U7EFB22J","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2512.24880&json=true","fetch_graph":"https://pith.science/api/pith-number/U7EFB22JO2J4IDTVUHJU76CAOS/graph.json","fetch_events":"https://pith.science/api/pith-number/U7EFB22JO2J4IDTVUHJU76CAOS/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/U7EFB22JO2J4IDTVUHJU76CAOS/action/timestamp_anchor","attest_storage":"https://pith.science/pith/U7EFB22JO2J4IDTVUHJU76CAOS/action/storage_attestation","attest_author":"https://pith.science/pith/U7EFB22JO2J4IDTVUHJU76CAOS/action/author_attestation","sign_citation":"https://pith.science/pith/U7EFB22JO2J4IDTVUHJU76CAOS/action/citation_signature","submit_replication":"https://pith.science/pith/U7EFB22JO2J4IDTVUHJU76CAOS/action/replication_record"}},"created_at":"2026-05-17T23:38:52.547812+00:00","updated_at":"2026-05-17T23:38:52.547812+00:00"}