{"paper":{"title":"Reliability-Gated Source Anchoring for Continual Test-Time Adaptation","license":"http://creativecommons.org/licenses/by/4.0/","headline":"RMemSafe uses normalized predictive entropy to gate source anchoring in continual test-time adaptation, disabling unreliable anchors when the source posterior flattens.","cross_cats":[],"primary_cat":"cs.LG","authors_text":"Biyao Zhang, Christian Gagn\\'e, Debargha Ganguly, Mohsen Harir, Osama Zafar, Sabyasachi Sahoo, Shouren Wang, Sreehari Sankar, Vikash Singh, Vipin Chaudhary, Weicong Chen","submitted_at":"2026-05-13T19:38:08Z","abstract_excerpt":"Continual test-time adaptation (CTTA) updates a pretrained model online on an unlabeled, non-stationary stream while anchoring it to a frozen source checkpoint. This anchor is useful only when the source remains reliable. On CCC-Hard, however, a ResNet-50 source falls to approximately $1.3\\%$ top-$1$ accuracy, while existing source-anchored CTTA methods continue applying the same anchor strength. We call this failure mode blind anchoring and propose RMemSafe, a reliability-gated extension of ROID that uses the frozen source's normalized predictive entropy to attenuate all explicit source-coupl"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Combined with ASR, RMemSafe achieves the lowest error on 8 of 9 matched-split continual-corruption cells and is the best reset-based method on all 9, improving ROID+ASR by 1.05 pp on ResNet-50 and 0.48 pp on ViT-B/16. 
A controlled source-degradation sweep shows a 1.13× shallower harm slope than ROID+ASR.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That normalized predictive entropy from the frozen source reliably signals when anchoring should be attenuated, without missing cases of confidently wrong low-entropy predictions or introducing instability in the fallback objective.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"RMemSafe gates source anchoring via entropy in CTTA, reducing error by 1.05pp on ResNet-50 when source accuracy collapses and showing shallower degradation slope than prior methods.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"RMemSafe uses normalized predictive entropy to gate source anchoring in continual test-time adaptation, disabling unreliable anchors when the source posterior flattens.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"7bfe6a0ea90eff13df743ed61c7cd14cd4fc05b0359a2c7241d23d526456777b"},"source":{"id":"2605.14063","kind":"arxiv","version":1},"verdict":{"id":"25b292d8-15ed-4dac-a078-a29a9e2c532b","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T05:25:43.615263Z","strongest_claim":"Combined with ASR, RMemSafe achieves the lowest error on 8 of 9 matched-split continual-corruption cells and is the best reset-based method on all 9, improving ROID+ASR by 1.05 pp on ResNet-50 and 0.48 pp on ViT-B/16. 
A controlled source-degradation sweep shows a 1.13× shallower harm slope than ROID+ASR.","one_line_summary":"RMemSafe gates source anchoring via entropy in CTTA, reducing error by 1.05pp on ResNet-50 when source accuracy collapses and showing shallower degradation slope than prior methods.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That normalized predictive entropy from the frozen source reliably signals when anchoring should be attenuated, without missing cases of confidently wrong low-entropy predictions or introducing instability in the fallback objective.","pith_extraction_headline":"RMemSafe uses normalized predictive entropy to gate source anchoring in continual test-time adaptation, disabling unreliable anchors when the source posterior flattens."},"references":{"count":43,"sample":[{"doi":"","year":2025,"title":"$K^4$: Online Log Anomaly Detection Via Unsupervised Typicality Learning","work_id":"ce79fed9-a586-4766-abdf-44b96f8005d3","ref_index":1,"cited_arxiv_id":"2507.20051","is_internal_anchor":true},{"doi":"","year":2023,"title":"A., and Yang, B","work_id":"9ab28abd-8337-4025-bf6f-2ec176bfb46f","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2024,"title":"Ganguly, D., Iyengar, S., Chaudhary, V ., and Kalyanaraman, S. (2024). PROOF OF THOUGHT : Neurosymbolic program synthesis allows robust and interpretable reasoning. InThe First Workshop on System-2 Re","work_id":"4643340a-5f09-48a6-8908-1c447c784271","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Ganguly, D., Morningstar, W. R., Yu, A. S., and Chaudhary, V . (2025a). Forte : Finding outliers with representation typicality estimation. 
In The Thirteenth International Conference on Learning Repres","work_id":"b309b410-1a4f-480b-bdd0-df95bf17886d","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2026,"title":"Ganguly, D., Sankar, S., Zhang, B., Singh, V., Gupta, K., Kavuru, H., Luo, A., et al. (2026). Trust the typical: An out-of-distribution safety detection framework. arXiv preprint arXiv:2602.04581. ICL","work_id":"f9953d8e-eb2a-4e8e-9610-1df1dda10d04","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":43,"snapshot_sha256":"2c4ab01fd3621093e8fdd8d189b8e199094cd81d989d81ed7d7d504665be2b86","internal_anchors":2},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}