{"paper":{"title":"mHC: Manifold-Constrained Hyper-Connections","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Projecting hyper-connection residuals onto a manifold restores identity mapping for stable large-scale training.","cross_cats":["cs.AI","cs.LG"],"primary_cat":"cs.CL","authors_text":"Chenggang Zhao, Chengqi Deng, Damai Dai, Huanqi Cao, Huazuo Gao, Jiang Chang, Jiashi Li, Jingyang Yuan, Kuai Yu, Lean Wang, Liang Zhao, Shangyan Zhou, Shengding Hu, Wangding Zeng, Wenfeng Liang, Yixuan Wei, Yuqing Wang, Zhean Xu, Zhenda Xie, Zhengyan Zhang","submitted_at":"2025-12-31T14:16:26Z","abstract_excerpt":"Recently, studies exemplified by Hyper-Connections (HC) have extended the ubiquitous residual connection paradigm established over the past decade by expanding the residual stream width and diversifying connectivity patterns. While yielding substantial performance gains, this diversification fundamentally compromises the identity mapping property intrinsic to the residual connection, which causes severe training instability and restricted scalability, and additionally incurs notable memory access overhead. 
To address these challenges, we propose Manifold-Constrained Hyper-Connections (mHC), a "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Empirical experiments demonstrate that mHC is effective for training at scale, offering tangible performance improvements and superior scalability.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That projecting the residual connection space of HC onto a specific manifold restores the identity mapping property while preserving the performance benefits of diversified connectivity patterns.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"mHC projects hyper-connection residual spaces onto a manifold to restore identity mapping, enabling stable large-scale training with performance gains over standard HC.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Projecting hyper-connection residuals onto a manifold restores identity mapping for stable large-scale training.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"17e000abd2449a17e189076e3589062aa814b8ac4cca582d189eb250ac19f210"},"source":{"id":"2512.24880","kind":"arxiv","version":2},"verdict":{"id":"1db5bded-9d50-4a8f-a142-203fbdf82210","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T12:27:06.891339Z","strongest_claim":"Empirical experiments demonstrate that mHC is effective for training at scale, offering tangible performance improvements and superior scalability.","one_line_summary":"mHC projects hyper-connection residual spaces onto a manifold to restore identity mapping, enabling stable large-scale training with performance gains over standard 
HC.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That projecting the residual connection space of HC onto a specific manifold restores the identity mapping property while preserving the performance benefits of diversified connectivity patterns.","pith_extraction_headline":"Projecting hyper-connection residuals onto a manifold restores identity mapping for stable large-scale training."},"references":{"count":44,"sample":[{"doi":"","year":null,"title":"Proceedings of the IEEE conference on computer vision and pattern recognition","work_id":"da360c40-6481-4088-bd96-8e73e0280a6b","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2016,"title":"European conference on computer vision","work_id":"98803d46-9f44-4a58-94fe-a514b8f31627","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Proceedings of the IEEE conference on computer vision and pattern recognition","work_id":"07997e38-6b8e-4f93-8aeb-d77afb7966d9","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"FractalNet: Ultra-deep neural networks without residuals","work_id":"ecf74c2e-7eed-4017-8f0b-f90686110609","ref_index":4,"cited_arxiv_id":"1605.07648","is_internal_anchor":true},{"doi":"","year":null,"title":"Advances in neural information processing systems","work_id":"a1fd09f1-b62b-4aca-a5ef-dd2b50ad08b5","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":44,"snapshot_sha256":"fa2d26e389bbda4565faf823a7f0e68e5bae6eec7aa1a632631609ffc8c391db","internal_anchors":14},"formal_canon":{"evidence_count":2,"snapshot_sha256":"1c8b30c8d301be5d14cf53712733a2547c3c7bb3792f52a2a5541cb3a20d7994"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}