{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2025:77OMDIUIMRATI43SRJAXMN7SJV","short_pith_number":"pith:77OMDIUI","schema_version":"1.0","canonical_sha256":"ffdcc1a28864413473728a417637f24d4147d43dc5dc4d6edb3ce78e26d748c6","source":{"kind":"arxiv","id":"2511.07308","version":2},"attestation_state":"computed","paper":{"title":"Can Stationary Distributions of Scale-Invariant Neural Networks Be Described by the Thermodynamics of an Ideal Gas?","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Stationary distributions of SGD for scale-invariant networks correspond to ideal gas thermodynamics.","cross_cats":[],"primary_cat":"cs.LG","authors_text":"Dmitry Vetrov, Ekaterina Lobacheva, Ildus Sadrtdinov, Ivan Klimov, Mikhail Burtsev, Mikhail I. Katsnelson","submitted_at":"2025-11-10T17:10:01Z","abstract_excerpt":"Understanding the training dynamics of deep neural networks remains a major open problem, with physics-inspired approaches offering promising insights. Building on this perspective, we develop a thermodynamic framework to describe the stationary distributions of stochastic gradient descent (SGD) with weight decay for scale-invariant neural networks, a setting that both reflects practical architectures with normalization layers and permits theoretical analysis. We establish analogies between training hyperparameters (e.g., learning rate, weight decay) and thermodynamic variables such as tempera"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":false},"canonical_record":{"source":{"id":"2511.07308","kind":"arxiv","version":2},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.LG","submitted_at":"2025-11-10T17:10:01Z","cross_cats_sorted":[],"title_canon_sha256":"3883034d18afb44f7477dd2d9eb85730aaa04cd022c5c079dabc6f8712ab4f30","abstract_canon_sha256":"fa0483c71cb2b6908fcff72f120e1c968b942d7c0a5f339518bb0b35915dff5f"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:39:17.168925Z","signature_b64":"DlIbVHS4CBQd/n6CigWPlhpMEI5t7J74HrNNwjV/OkOclzUmqhYIRAieRdPnJh/9TUgudQBxX3rxl8WrPtLfDg==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"ffdcc1a28864413473728a417637f24d4147d43dc5dc4d6edb3ce78e26d748c6","last_reissued_at":"2026-05-17T23:39:17.168209Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:39:17.168209Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Can Stationary Distributions of Scale-Invariant Neural Networks Be Described by the Thermodynamics of an Ideal Gas?","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Stationary distributions of SGD for scale-invariant networks correspond to ideal gas thermodynamics.","cross_cats":[],"primary_cat":"cs.LG","authors_text":"Dmitry Vetrov, Ekaterina Lobacheva, Ildus Sadrtdinov, Ivan Klimov, Mikhail Burtsev, Mikhail I. Katsnelson","submitted_at":"2025-11-10T17:10:01Z","abstract_excerpt":"Understanding the training dynamics of deep neural networks remains a major open problem, with physics-inspired approaches offering promising insights. Building on this perspective, we develop a thermodynamic framework to describe the stationary distributions of stochastic gradient descent (SGD) with weight decay for scale-invariant neural networks, a setting that both reflects practical architectures with normalization layers and permits theoretical analysis. We establish analogies between training hyperparameters (e.g., learning rate, weight decay) and thermodynamic variables such as tempera"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Starting with a simplified isotropic noise model, we uncover a close correspondence between SGD dynamics and ideal gas behavior, validated through theory and simulation. Extending to training of neural networks, we show that key predictions of the framework, including the behavior of stationary entropy, align closely with experimental observations.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The derivation begins with a simplified isotropic noise model whose relation to the actual gradient noise in deep networks is not quantified; if this model is a poor approximation, the ideal-gas analogy and its thermodynamic-variable mappings lose their justification.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"A thermodynamic framework maps SGD stationary distributions in scale-invariant networks to ideal-gas behavior, with training hyperparameters acting as thermodynamic variables.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Stationary distributions of SGD for scale-invariant networks correspond to ideal gas thermodynamics.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"74e3dac28ed50ee9743e6a845d97329eb7c882c6f0d82efe4548c45ff397f4dd"},"source":{"id":"2511.07308","kind":"arxiv","version":2},"verdict":{"id":"37f08171-43f1-45a3-8588-f08e0cc51e46","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T23:28:24.037250Z","strongest_claim":"Starting with a simplified isotropic noise model, we uncover a close correspondence between SGD dynamics and ideal gas behavior, validated through theory and simulation. Extending to training of neural networks, we show that key predictions of the framework, including the behavior of stationary entropy, align closely with experimental observations.","one_line_summary":"A thermodynamic framework maps SGD stationary distributions in scale-invariant networks to ideal-gas behavior, with training hyperparameters acting as thermodynamic variables.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The derivation begins with a simplified isotropic noise model whose relation to the actual gradient noise in deep networks is not quantified; if this model is a poor approximation, the ideal-gas analogy and its thermodynamic-variable mappings lose their justification.","pith_extraction_headline":"Stationary distributions of SGD for scale-invariant networks correspond to ideal gas thermodynamics."},"references":{"count":19,"sample":[{"doi":"","year":2018,"title":"Layer Normalization","work_id":"20a2d720-0046-4c7c-bcd6-327ec8143f69","ref_index":1,"cited_arxiv_id":"1607.06450","is_internal_anchor":true},{"doi":"","year":2017,"title":"Chaudhari, P., Choromanska, A., Soatto, S., LeCun, Y., Baldassi, C., Borgs, C., Chayes, J., Sagun, L., and Zecchina, R. (2017). Entropy-SGD: Biasing gra- dient descent into wide valleys. InInternation","work_id":"25027f08-274c-4b1e-92f2-c572d54c8370","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2015,"title":"arXiv preprint arXiv:1711.04623 , year=","work_id":"bcb52129-9115-4696-8926-64f96214f2d7","ref_index":3,"cited_arxiv_id":"1711.04623","is_internal_anchor":true},{"doi":"","year":2008,"title":"Le Ny, A. (2008). Introduction to (generalized) gibbs measures.Ensaios Matemáticos, 15(1-126). Li, Z. and Arora, S. (2020). An exponential learn- ing rate schedule for deep learning. InInternational C","work_id":"9e6a1830-9d4a-4c3d-baf0-af931436b12d","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"Liu, Z., Liu, Y., Gore, J., and Tegmark, M. (2025). Neural thermodynamic laws for large language model training. Lobacheva, E., Kodryan, M., Chirkova, N., Malinin, A., and Vetrov, D. P. (2021). On the","work_id":"c82139be-8e94-4e4d-a9b5-3ca1135a6afb","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":19,"snapshot_sha256":"e55e01ad9e410134d8c27292192c8a3438298443df18f2d308820618f8c47d8a","internal_anchors":3},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2511.07308","created_at":"2026-05-17T23:39:17.168332+00:00"},{"alias_kind":"arxiv_version","alias_value":"2511.07308v2","created_at":"2026-05-17T23:39:17.168332+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2511.07308","created_at":"2026-05-17T23:39:17.168332+00:00"},{"alias_kind":"pith_short_12","alias_value":"77OMDIUIMRAT","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"77OMDIUIMRATI43S","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"77OMDIUI","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":0,"internal_anchor_count":0,"sample":[]},"formal_canon":{"evidence_count":0,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/77OMDIUIMRATI43SRJAXMN7SJV","json":"https://pith.science/pith/77OMDIUIMRATI43SRJAXMN7SJV.json","graph_json":"https://pith.science/api/pith-number/77OMDIUIMRATI43SRJAXMN7SJV/graph.json","events_json":"https://pith.science/api/pith-number/77OMDIUIMRATI43SRJAXMN7SJV/events.json","paper":"https://pith.science/paper/77OMDIUI"},"agent_actions":{"view_html":"https://pith.science/pith/77OMDIUIMRATI43SRJAXMN7SJV","download_json":"https://pith.science/pith/77OMDIUIMRATI43SRJAXMN7SJV.json","view_paper":"https://pith.science/paper/77OMDIUI","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2511.07308&json=true","fetch_graph":"https://pith.science/api/pith-number/77OMDIUIMRATI43SRJAXMN7SJV/graph.json","fetch_events":"https://pith.science/api/pith-number/77OMDIUIMRATI43SRJAXMN7SJV/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/77OMDIUIMRATI43SRJAXMN7SJV/action/timestamp_anchor","attest_storage":"https://pith.science/pith/77OMDIUIMRATI43SRJAXMN7SJV/action/storage_attestation","attest_author":"https://pith.science/pith/77OMDIUIMRATI43SRJAXMN7SJV/action/author_attestation","sign_citation":"https://pith.science/pith/77OMDIUIMRATI43SRJAXMN7SJV/action/citation_signature","submit_replication":"https://pith.science/pith/77OMDIUIMRATI43SRJAXMN7SJV/action/replication_record"}},"created_at":"2026-05-17T23:39:17.168332+00:00","updated_at":"2026-05-17T23:39:17.168332+00:00"}