{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2019:HJW6E4YADRP6F2SC5N4ACBO5O6","short_pith_number":"pith:HJW6E4YA","schema_version":"1.0","canonical_sha256":"3a6de273001c5fe2ea42eb780105dd7782b34453323a88d800da5de4ae5ce0cc","source":{"kind":"arxiv","id":"1903.10145","version":3},"attestation_state":"computed","paper":{"title":"Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"","cross_cats":["cs.AI","cs.CL","cs.CV","stat.ML"],"primary_cat":"cs.LG","authors_text":"Asli Celikyilmaz, Chunyuan Li, Hao Fu, Jianfeng Gao, Lawrence Carin, XiaoDong Liu","submitted_at":"2019-03-25T06:28:24Z","abstract_excerpt":"Variational autoencoders (VAEs) with an auto-regressive decoder have been applied for many natural language processing (NLP) tasks. The VAE objective consists of two terms, (i) reconstruction and (ii) KL regularization, balanced by a weighting hyper-parameter \\beta. One notorious training difficulty is that the KL term tends to vanish. In this paper we study scheduling schemes for \\beta, and show that KL vanishing is caused by the lack of good latent codes in training the decoder at the beginning of optimization. To remedy this, we propose a cyclical annealing schedule, which repeats the proce"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":false,"formal_links_present":false},"canonical_record":{"source":{"id":"1903.10145","kind":"arxiv","version":3},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.LG","submitted_at":"2019-03-25T06:28:24Z","cross_cats_sorted":["cs.AI","cs.CL","cs.CV","stat.ML"],"title_canon_sha256":"75eaddb0a3307e24b74f7ca1223304eeed0986af2298c645fc19fe3968e20deb","abstract_canon_sha256":"0c9f1a0ffb0a29c9c17517574c07dce32a7a0f72f78d2cbe64ea693df4dabf71"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:43:42.384452Z","signature_b64":"PiONQGFa0WgTRTjj0abLAfHcsNAgSA7sfr4luLhknowGRXjlx/0ZuxJD/PjgvM8/X60moAzaLWcEAYPy6aAwBw==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"3a6de273001c5fe2ea42eb780105dd7782b34453323a88d800da5de4ae5ce0cc","last_reissued_at":"2026-05-17T23:43:42.383715Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:43:42.383715Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"","cross_cats":["cs.AI","cs.CL","cs.CV","stat.ML"],"primary_cat":"cs.LG","authors_text":"Asli Celikyilmaz, Chunyuan Li, Hao Fu, Jianfeng Gao, Lawrence Carin, XiaoDong Liu","submitted_at":"2019-03-25T06:28:24Z","abstract_excerpt":"Variational autoencoders (VAEs) with an auto-regressive decoder have been applied for many natural language processing (NLP) tasks. The VAE objective consists of two terms, (i) reconstruction and (ii) KL regularization, balanced by a weighting hyper-parameter \\beta. One notorious training difficulty is that the KL term tends to vanish. In this paper we study scheduling schemes for \\beta, and show that KL vanishing is caused by the lack of good latent codes in training the decoder at the beginning of optimization. To remedy this, we propose a cyclical annealing schedule, which repeats the proce"},"claims":{"count":0,"items":[],"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"source":{"id":"1903.10145","kind":"arxiv","version":3},"verdict":{"id":null,"model_set":{},"created_at":null,"strongest_claim":"","one_line_summary":"","pipeline_version":null,"weakest_assumption":"","pith_extraction_headline":""},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"1903.10145","created_at":"2026-05-17T23:43:42.383828+00:00"},{"alias_kind":"arxiv_version","alias_value":"1903.10145v3","created_at":"2026-05-17T23:43:42.383828+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.1903.10145","created_at":"2026-05-17T23:43:42.383828+00:00"},{"alias_kind":"pith_short_12","alias_value":"HJW6E4YADRP6","created_at":"2026-05-18T12:33:18.533446+00:00"},{"alias_kind":"pith_short_16","alias_value":"HJW6E4YADRP6F2SC","created_at":"2026-05-18T12:33:18.533446+00:00"},{"alias_kind":"pith_short_8","alias_value":"HJW6E4YA","created_at":"2026-05-18T12:33:18.533446+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":3,"internal_anchor_count":1,"sample":[{"citing_arxiv_id":"2509.24965","citing_title":"VIVALDy: A Hybrid Generative Reduced-Order Model for Turbulent Flows, Applied to Vortex-Induced Vibrations","ref_index":41,"is_internal_anchor":true},{"citing_arxiv_id":"2604.06032","citing_title":"Ensemble-Based Dirichlet Modeling for Predictive Uncertainty and Selective Classification","ref_index":23,"is_internal_anchor":false},{"citing_arxiv_id":"2604.05513","citing_title":"From Unsupervised to Guided Clustering: A Variational Implementation","ref_index":12,"is_internal_anchor":false}]},"formal_canon":{"evidence_count":0,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/HJW6E4YADRP6F2SC5N4ACBO5O6","json":"https://pith.science/pith/HJW6E4YADRP6F2SC5N4ACBO5O6.json","graph_json":"https://pith.science/api/pith-number/HJW6E4YADRP6F2SC5N4ACBO5O6/graph.json","events_json":"https://pith.science/api/pith-number/HJW6E4YADRP6F2SC5N4ACBO5O6/events.json","paper":"https://pith.science/paper/HJW6E4YA"},"agent_actions":{"view_html":"https://pith.science/pith/HJW6E4YADRP6F2SC5N4ACBO5O6","download_json":"https://pith.science/pith/HJW6E4YADRP6F2SC5N4ACBO5O6.json","view_paper":"https://pith.science/paper/HJW6E4YA","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=1903.10145&json=true","fetch_graph":"https://pith.science/api/pith-number/HJW6E4YADRP6F2SC5N4ACBO5O6/graph.json","fetch_events":"https://pith.science/api/pith-number/HJW6E4YADRP6F2SC5N4ACBO5O6/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/HJW6E4YADRP6F2SC5N4ACBO5O6/action/timestamp_anchor","attest_storage":"https://pith.science/pith/HJW6E4YADRP6F2SC5N4ACBO5O6/action/storage_attestation","attest_author":"https://pith.science/pith/HJW6E4YADRP6F2SC5N4ACBO5O6/action/author_attestation","sign_citation":"https://pith.science/pith/HJW6E4YADRP6F2SC5N4ACBO5O6/action/citation_signature","submit_replication":"https://pith.science/pith/HJW6E4YADRP6F2SC5N4ACBO5O6/action/replication_record"}},"created_at":"2026-05-17T23:43:42.383828+00:00","updated_at":"2026-05-17T23:43:42.383828+00:00"}