{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2016:XTMPV52MWM2PJDBQXKLMDG7SUW","short_pith_number":"pith:XTMPV52M","schema_version":"1.0","canonical_sha256":"bcd8faf74cb334f48c30ba96c19bf2a5a450e16762219b99ee36f70678c1053f","source":{"kind":"arxiv","id":"1609.09106","version":4},"attestation_state":"computed","paper":{"title":"HyperNetworks","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"A hypernetwork generates the weights for another network to enable non-shared weights in LSTMs.","cross_cats":[],"primary_cat":"cs.LG","authors_text":"Andrew Dai, David Ha, Quoc V. Le","submitted_at":"2016-09-27T05:57:00Z","abstract_excerpt":"This work explores hypernetworks: an approach of using a one network, also known as a hypernetwork, to generate the weights for another network. Hypernetworks provide an abstraction that is similar to what is found in nature: the relationship between a genotype - the hypernetwork - and a phenotype - the main network. Though they are also reminiscent of HyperNEAT in evolution, our hypernetworks are trained end-to-end with backpropagation and thus are usually faster. The focus of this work is to make hypernetworks useful for deep convolutional networks and long recurrent networks, where hypernet"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"1609.09106","kind":"arxiv","version":4},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.LG","submitted_at":"2016-09-27T05:57:00Z","cross_cats_sorted":[],"title_canon_sha256":"8bf1cfa479388c867ba136c02430ed6c91ee9657893763355e6cdb314b78f562","abstract_canon_sha256":"85d2334c445f73488e44c555f4a9dff5581eefd31f01895a8957c256ad2c12be"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-18T03:23:37.616984Z","signature_b64":"97JBk0O+6zvYNFkUI0bB0O/E50tTS4JzLaE9+sfjQK9rMFzlpBajHHhSr9tmjHuANJL5yWs6jb4mrrDymVYODA==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"bcd8faf74cb334f48c30ba96c19bf2a5a450e16762219b99ee36f70678c1053f","last_reissued_at":"2026-05-18T03:23:37.616277Z","signature_status":"signed_v1","first_computed_at":"2026-05-18T03:23:37.616277Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"HyperNetworks","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"A hypernetwork generates the weights for another network to enable non-shared weights in LSTMs.","cross_cats":[],"primary_cat":"cs.LG","authors_text":"Andrew Dai, David Ha, Quoc V. Le","submitted_at":"2016-09-27T05:57:00Z","abstract_excerpt":"This work explores hypernetworks: an approach of using a one network, also known as a hypernetwork, to generate the weights for another network. Hypernetworks provide an abstraction that is similar to what is found in nature: the relationship between a genotype - the hypernetwork - and a phenotype - the main network. Though they are also reminiscent of HyperNEAT in evolution, our hypernetworks are trained end-to-end with backpropagation and thus are usually faster. The focus of this work is to make hypernetworks useful for deep convolutional networks and long recurrent networks, where hypernet"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Our main result is that hypernetworks can generate non-shared weights for LSTM and achieve near state-of-the-art results on a variety of sequence modelling tasks including character-level language modelling, handwriting generation and neural machine translation, challenging the weight-sharing paradigm for recurrent networks.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The assumption that the hypernetwork can be effectively trained end-to-end with backpropagation to produce high-quality weights for the main network without introducing instability, overfitting, or requiring excessive additional computation.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Hypernetworks generate weights for a main network, allowing LSTMs to use non-shared weights and achieve near state-of-the-art results on sequence modeling tasks while using fewer parameters overall.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"A hypernetwork generates the weights for another network to enable non-shared weights in LSTMs.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"a6c81e69fc6a6113cc3a1c8af25ce7bffeea20dddaaab4e40d518b093ab378b7"},"source":{"id":"1609.09106","kind":"arxiv","version":4},"verdict":{"id":"ffe8543a-bf39-408a-baa3-3f8263dc554f","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T00:14:34.273271Z","strongest_claim":"Our main result is that hypernetworks can generate non-shared weights for LSTM and achieve near state-of-the-art results on a variety of sequence modelling tasks including character-level language modelling, handwriting generation and neural machine translation, challenging the weight-sharing paradigm for recurrent networks.","one_line_summary":"Hypernetworks generate weights for a main network, allowing LSTMs to use non-shared weights and achieve near state-of-the-art results on sequence modeling tasks while using fewer parameters overall.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The assumption that the hypernetwork can be effectively trained end-to-end with backpropagation to produce high-quality weights for the main network without introducing instability, overfitting, or requiring excessive additional computation.","pith_extraction_headline":"A hypernetwork generates the weights for another network to enable non-shared weights in LSTMs."},"references":{"count":2,"sample":[{"doi":"","year":2016,"title":"TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems","work_id":"91f3c09e-dae6-48ca-80c0-463dd1b1f6e1","ref_index":1,"cited_arxiv_id":"1603.04467","is_internal_anchor":false},{"doi":"","year":2015,"title":"Large Embedding","work_id":"f71b6e48-8bbe-460c-ba5d-4ad1d91006fd","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":2,"snapshot_sha256":"03304a60339b17fe679342dc4ba51791cc91c58006c5498e1c8726e97ce099f4","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"da15d5c9170af51bc94b5e34cd417ef8aebdb0b15b9d7e7f0373e8ce9a100ec2"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"1609.09106","created_at":"2026-05-18T03:23:37.616383+00:00"},{"alias_kind":"arxiv_version","alias_value":"1609.09106v4","created_at":"2026-05-18T03:23:37.616383+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.1609.09106","created_at":"2026-05-18T03:23:37.616383+00:00"},{"alias_kind":"pith_short_12","alias_value":"XTMPV52MWM2P","created_at":"2026-05-18T12:30:51.357362+00:00"},{"alias_kind":"pith_short_16","alias_value":"XTMPV52MWM2PJDBQ","created_at":"2026-05-18T12:30:51.357362+00:00"},{"alias_kind":"pith_short_8","alias_value":"XTMPV52M","created_at":"2026-05-18T12:30:51.357362+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":35,"internal_anchor_count":35,"sample":[{"citing_arxiv_id":"1906.12087","citing_title":"ARMIN: Towards a More Efficient and Light-weight Recurrent Memory Network","ref_index":14,"is_internal_anchor":true},{"citing_arxiv_id":"2503.12868","citing_title":"UniReg: A Universal Model for Controllable CT Image Registration","ref_index":13,"is_internal_anchor":true},{"citing_arxiv_id":"2605.20856","citing_title":"DISC: Decoupling Instruction from State-Conditioned Control via Policy Generation","ref_index":9,"is_internal_anchor":true},{"citing_arxiv_id":"2605.21309","citing_title":"Hyper-V2X: Hypernetworks for Estimating Epistemic and Aleatoric Uncertainty in Cooperative Bird's-Eye-View Semantic Segmentation","ref_index":28,"is_internal_anchor":true},{"citing_arxiv_id":"2502.05564","citing_title":"TabICL: A Tabular Foundation Model for In-Context Learning on Large Data","ref_index":207,"is_internal_anchor":true},{"citing_arxiv_id":"2605.19826","citing_title":"Explainable Wastewater Digital Twins: Adaptive Context-Conditioned Structured Simulators with Self-Falsifying Decision Support","ref_index":38,"is_internal_anchor":true},{"citing_arxiv_id":"1911.05507","citing_title":"Compressive Transformers for Long-Range Sequence Modelling","ref_index":112,"is_internal_anchor":true},{"citing_arxiv_id":"2601.16672","citing_title":"ReWeaver: Towards Simulation-Ready and Topology-Accurate Garment Reconstruction","ref_index":12,"is_internal_anchor":true},{"citing_arxiv_id":"1806.07366","citing_title":"Neural Ordinary Differential Equations","ref_index":20,"is_internal_anchor":true},{"citing_arxiv_id":"2603.20410","citing_title":"SLE-FNO: Single-Layer Extensions for Task-Agnostic Continual Learning in Fourier Neural Operators","ref_index":68,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08512","citing_title":"MoMo: Conditioned Contrastive Representation Learning for Preference-Modulated Planning","ref_index":49,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12388","citing_title":"Events as Triggers for Behavioral Diversity in Multi-Agent Reinforcement Learning","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13333","citing_title":"Stylized Text-to-Motion Generation via Hypernetwork-Driven Low-Rank Adaptation","ref_index":26,"is_internal_anchor":true},{"citing_arxiv_id":"2605.13839","citing_title":"Good Agentic Friends Do Not Just Give Verbal Advice: They Can Update Your Weights","ref_index":19,"is_internal_anchor":true},{"citing_arxiv_id":"2604.02441","citing_title":"Adaptive Learned State Estimation based on KalmanNet","ref_index":19,"is_internal_anchor":true},{"citing_arxiv_id":"2604.02564","citing_title":"Why Invariance is Not Enough for Biomedical Domain Generalization and How to Fix It","ref_index":19,"is_internal_anchor":true},{"citing_arxiv_id":"2604.03150","citing_title":"HyperFitS -- Hypernetwork Fitting Spectra for metabolic quantification of ${}^1$H MR spectroscopic imaging","ref_index":35,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12134","citing_title":"MULTI: Disentangling Camera Lens, Sensor, View, and Domain for Novel Image Generation","ref_index":18,"is_internal_anchor":true},{"citing_arxiv_id":"2605.12388","citing_title":"Events as Triggers for Behavioral Diversity in Multi-Agent Reinforcement Learning","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2604.26762","citing_title":"Exploring the Potential of Probabilistic Transformer for Time Series Modeling: A Report on the ST-PT Framework","ref_index":46,"is_internal_anchor":true},{"citing_arxiv_id":"2604.26227","citing_title":"HOI-aware Adaptive Network for Weakly-supervised Action Segmentation","ref_index":10,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08857","citing_title":"RareCP: Regime-Aware Retrieval for Efficient Conformal Prediction","ref_index":39,"is_internal_anchor":true},{"citing_arxiv_id":"2605.08512","citing_title":"MoMo: Conditioned Contrastive Representation Learning for Preference-Modulated Planning","ref_index":49,"is_internal_anchor":true},{"citing_arxiv_id":"2604.23750","citing_title":"The Override Gap: A Magnitude Account of Knowledge Conflict Failure in Hypernetwork-Based Instant LLM Adaptation","ref_index":13,"is_internal_anchor":true},{"citing_arxiv_id":"2605.10009","citing_title":"Hystar: Hypernetwork-driven Style-adaptive Retrieval via Dynamic SVD Modulation","ref_index":7,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/XTMPV52MWM2PJDBQXKLMDG7SUW","json":"https://pith.science/pith/XTMPV52MWM2PJDBQXKLMDG7SUW.json","graph_json":"https://pith.science/api/pith-number/XTMPV52MWM2PJDBQXKLMDG7SUW/graph.json","events_json":"https://pith.science/api/pith-number/XTMPV52MWM2PJDBQXKLMDG7SUW/events.json","paper":"https://pith.science/paper/XTMPV52M"},"agent_actions":{"view_html":"https://pith.science/pith/XTMPV52MWM2PJDBQXKLMDG7SUW","download_json":"https://pith.science/pith/XTMPV52MWM2PJDBQXKLMDG7SUW.json","view_paper":"https://pith.science/paper/XTMPV52M","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=1609.09106&json=true","fetch_graph":"https://pith.science/api/pith-number/XTMPV52MWM2PJDBQXKLMDG7SUW/graph.json","fetch_events":"https://pith.science/api/pith-number/XTMPV52MWM2PJDBQXKLMDG7SUW/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/XTMPV52MWM2PJDBQXKLMDG7SUW/action/timestamp_anchor","attest_storage":"https://pith.science/pith/XTMPV52MWM2PJDBQXKLMDG7SUW/action/storage_attestation","attest_author":"https://pith.science/pith/XTMPV52MWM2PJDBQXKLMDG7SUW/action/author_attestation","sign_citation":"https://pith.science/pith/XTMPV52MWM2PJDBQXKLMDG7SUW/action/citation_signature","submit_replication":"https://pith.science/pith/XTMPV52MWM2PJDBQXKLMDG7SUW/action/replication_record"}},"created_at":"2026-05-18T03:23:37.616383+00:00","updated_at":"2026-05-18T03:23:37.616383+00:00"}