MMD-Balls as Credal Sets: A PAC-Bayesian Framework for Epistemic Uncertainty in Test-Time Adaptation

Ahanaf Hasan Ariq

arXiv: 2605.21783 · v1 · pith:DLAJFFOSnew · submitted 2026-05-20 · cs.LG · stat.ML

Pith reviewed 2026-05-22 09:02 UTC · model grok-4.3

Classification: cs.LG, stat.ML

Keywords: test-time adaptation, PAC-Bayesian bounds, maximum mean discrepancy, credal sets, epistemic uncertainty, distribution shift, generalization bounds

The pith

Interpreting MMD-balls around the source distribution as credal sets yields a PAC-Bayesian framework for epistemic uncertainty in test-time adaptation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a PAC-Bayesian framework for test-time adaptation under distribution shift that explicitly ties the size of the shift, measured by maximum mean discrepancy, to bounds on prediction risk. It treats MMD-balls centered on the source distribution as collections of possible target distributions, which in turn support a uniform worst-case risk bound and a separation between epistemic and aleatoric uncertainty. This supplies a decision rule for when adaptation is justified by the estimated shift. A sympathetic reader would value the result because it replaces heuristic adaptation with guarantees that scale with a concrete, computable discrepancy measure.
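The decision rule is not spelled out here, so the following Python sketch is hypothetical: it assumes the rule compares the certified worst-case risk of the unadapted model over an MMD-ball against a user-chosen tolerance. The names `decide_adaptation`, `slack`, and `tolerance` are illustrative and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class ShiftDecision:
    upper_risk: float  # certified worst-case risk of the unadapted model
    adapt: bool        # True when that certificate exceeds the tolerance


def decide_adaptation(source_risk: float, mmd_hat: float, slack: float,
                      lipschitz_const: float, tolerance: float) -> ShiftDecision:
    """Hypothetical rule: adapt only when the worst-case risk over the
    MMD-ball of radius (mmd_hat + slack) exceeds `tolerance`."""
    if min(mmd_hat, slack, lipschitz_const) < 0:
        raise ValueError("MMD estimate, slack and Lipschitz constant must be non-negative")
    radius = mmd_hat + slack  # conservative radius: estimate plus finite-sample slack
    upper = source_risk + lipschitz_const * radius
    return ShiftDecision(upper_risk=upper, adapt=upper > tolerance)


# Small estimated shift: the source model stays certified, so no adaptation.
print(decide_adaptation(0.10, 0.02, 0.03, 1.0, tolerance=0.20))
# Larger estimated shift: the certificate exceeds the tolerance, so adaptation is flagged.
print(decide_adaptation(0.10, 0.15, 0.03, 1.0, tolerance=0.20))
```

Using the estimate plus its slack, rather than the raw estimate, makes the certificate conservative, at the price of flagging adaptation more often.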

Core claim

Interpreting MMD-balls around the source distribution as credal sets in Walley's imprecise probability theory yields natural epistemic uncertainty quantification and a uniform worst-case risk bound over all distributions in the credal set, together with a PAC-Bayesian bound containing an MMD-dependent shift penalty.
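To make the shape of such a bound concrete, here is a hedged Python sketch that adds a linear MMD penalty to a McAllester/Maurer-style PAC-Bayes term. This is an assumed form, not the paper's Theorem 1. It assumes losses in [0, 1], at least 8 source samples, a known population MMD, and the RKHS condition on the loss. The function name `pac_bayes_mmd_bound` is hypothetical.

```python
import math


def pac_bayes_mmd_bound(emp_risk: float, kl: float, n: int, delta: float,
                        lipschitz_const: float, mmd: float) -> float:
    """Illustrative target-risk bound for a posterior rho over hypotheses.

    emp_risk: E_{h~rho}[empirical source risk], losses assumed in [0, 1]
    kl: KL(rho || pi) between posterior and prior
    n: number of source samples (the McAllester/Maurer form needs n >= 8)
    delta: confidence parameter in (0, 1)
    lipschitz_const, mmd: RKHS constant L and MMD(P, Q) for the shift penalty
    """
    if n < 8 or not 0.0 < delta < 1.0 or kl < 0.0:
        raise ValueError("need n >= 8, 0 < delta < 1, kl >= 0")
    # Source generalisation gap: sqrt((KL + ln(2 sqrt(n) / delta)) / (2n)).
    complexity = math.sqrt((kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n))
    # Shift penalty from |E_Q[l] - E_P[l]| <= L * MMD(P, Q).
    return emp_risk + complexity + lipschitz_const * mmd


print(pac_bayes_mmd_bound(emp_risk=0.10, kl=5.0, n=10_000, delta=0.05,
                          lipschitz_const=1.0, mmd=0.10))
```

If the MMD is estimated rather than known, a finite-sample version would replace `mmd` by the estimate plus a concentration slack and split the confidence budget between the two events with a union bound.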

What carries the argument

Viewing MMD-balls as credal sets: this lets a single worst-case risk bound be written over every distribution within a given MMD radius of the source.

Load-bearing premise

The loss function is Lipschitz continuous with respect to the norm induced by the reproducing kernel Hilbert space.
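One common way to formalise this premise (an assumption on our part; the paper's exact definition of "RKHS-Lipschitz" may differ) is that each loss function ℓ_h lies in the RKHS H of the MMD kernel k with norm at most L. With kernel mean embeddings μ_P = E_{x∼P}[k(x,·)] and MMD(P, Q) = ‖μ_P − μ_Q‖_H, the linear shift penalty then follows from the reproducing property and Cauchy–Schwarz:

```latex
\begin{aligned}
\bigl|\mathbb{E}_{Q}[\ell_h] - \mathbb{E}_{P}[\ell_h]\bigr|
  &= \bigl|\langle \ell_h,\ \mu_Q - \mu_P \rangle_{\mathcal{H}}\bigr|
  && \text{(reproducing property, linearity of expectation)}\\
  &\le \|\ell_h\|_{\mathcal{H}}\,\|\mu_Q - \mu_P\|_{\mathcal{H}}
  && \text{(Cauchy--Schwarz)}\\
  &\le L \cdot \mathrm{MMD}(P, Q).
\end{aligned}
```

Taking the supremum over all Q with MMD(P, Q) ≤ ε gives sup_Q E_Q[ℓ_h] ≤ E_P[ℓ_h] + Lε, which is the form of a uniform worst-case bound over an MMD-ball. Exchanging expectation and inner product requires the mean embeddings to exist, which holds, for example, for a bounded measurable kernel.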

What would settle it

A finite-sample experiment in which the observed risk on a held-out target distribution exceeds the upper bound obtained from the lower-upper risk decomposition over the corresponding MMD-ball.
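A hedged Python sketch of that check, assuming losses in [0, 1]. Because the observed target risk is itself a sample mean, the sketch declares a violation only when a one-sided Hoeffding lower confidence bound on the target risk exceeds the credal-set upper bound E_P[ℓ] + L·ε. The function name and arguments are illustrative.

```python
import math

import numpy as np


def credal_bound_violated(target_losses, source_risk: float, lipschitz_const: float,
                          radius: float, delta: float = 0.05):
    """Return (lower confidence bound on target risk, credal upper bound, violated?)."""
    losses = np.asarray(target_losses, dtype=float)
    if losses.size == 0 or losses.min() < 0.0 or losses.max() > 1.0:
        raise ValueError("expected a non-empty array of losses in [0, 1]")
    # Hoeffding: with probability >= 1 - delta, true risk >= mean - sqrt(ln(1/delta) / (2n)).
    lcb = losses.mean() - math.sqrt(math.log(1.0 / delta) / (2 * losses.size))
    upper = source_risk + lipschitz_const * radius
    return float(lcb), upper, bool(lcb > upper)


rng = np.random.default_rng(0)
# Target losses around 0.5 against a credal upper bound of 0.2: flagged as a violation.
print(credal_bound_violated(rng.uniform(0.4, 0.6, size=5_000), 0.10, 1.0, 0.10))
```

Using a lower confidence bound keeps the test from crying wolf when the observed target risk exceeds the bound only by sampling noise.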

Original abstract

Test-time adaptation (TTA) methods improve model performance under distribution shift but lack formal guarantees connecting shift magnitude to prediction reliability. We develop a PAC-Bayesian framework yielding generalization bounds explicitly parameterized by the maximum mean discrepancy (MMD) between source and target distributions. Our principal contribution is interpreting MMD-balls around the source distribution as credal sets in Walley's imprecise probability theory, yielding natural epistemic uncertainty quantification. We establish: (i) a PAC-Bayesian bound with an MMD-dependent shift penalty under an RKHS-Lipschitz loss assumption; (ii) a finite-sample version via MMD concentration; (iii) a uniform worst-case risk bound over all distributions in the credal set, with a lower-upper risk decomposition; and (iv) geodesic preservation bounds explaining why kernel-guided adaptation protects local feature geometry. The credal set interpretation separates epistemic from aleatoric uncertainty and provides a principled decision criterion for when adaptation is warranted.
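For reference, here is a minimal NumPy sketch of the unbiased MMD² estimator of Gretton et al. (2012) with a Gaussian kernel, the kind of quantity that would parameterise the shift penalty. The bandwidth value and the function names are assumptions for illustration.

```python
import numpy as np


def gaussian_gram(a: np.ndarray, b: np.ndarray, bandwidth: float) -> np.ndarray:
    """Gram matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * bandwidth^2))."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def mmd2_unbiased(x: np.ndarray, y: np.ndarray, bandwidth: float = 1.0) -> float:
    """Unbiased estimate of MMD^2 between samples x (m, d) and y (n, d)."""
    m, n = len(x), len(y)
    kxx = gaussian_gram(x, x, bandwidth)
    kyy = gaussian_gram(y, y, bandwidth)
    kxy = gaussian_gram(x, y, bandwidth)
    # Drop the diagonal terms k(x_i, x_i) and k(y_j, y_j) so each average is unbiased.
    term_x = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_y = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    return float(term_x + term_y - 2.0 * kxy.mean())


rng = np.random.default_rng(0)
source = rng.normal(size=(500, 2))
same = rng.normal(size=(500, 2))               # no shift
shifted = rng.normal(loc=1.0, size=(500, 2))   # mean shift of 1 in each coordinate
print(mmd2_unbiased(source, same), mmd2_unbiased(source, shifted))
```

The unbiased estimate can be slightly negative when the two samples come from the same distribution; MMD itself is the square root of the population quantity.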

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper frames MMD balls as credal sets to quantify epistemic uncertainty in TTA and derives PAC-Bayesian bounds with an MMD penalty, but the key step needs an RKHS-Lipschitz loss that standard TTA setups may not satisfy.

This paper connects MMD balls around the source distribution to credal sets in imprecise probability theory and uses the connection to give epistemic uncertainty measures and worst-case risk bounds for test-time adaptation under shift. The new part is treating the MMD ball as a credal set to get a uniform bound on risk over all distributions inside it, along with a lower-upper risk decomposition. The paper also includes a PAC-Bayesian generalization bound with an MMD-distance penalty, a finite-sample version based on concentration inequalities, and results on preserving geodesic distances in feature space when adapting with kernels. Together these pieces separate epistemic uncertainty from aleatoric noise and suggest when adaptation makes sense. The work builds on standard tools but applies them to a real gap in TTA, where most methods are heuristic.

The main soft spot is the RKHS-Lipschitz assumption on the loss function. The bound on the difference between source and target expectations relies on it to turn the MMD into a linear penalty. For common losses such as cross-entropy with deep models, it is not clear that this holds with respect to the RKHS norm of the kernel used for MMD. If the assumption is too strong, the credal-set bound reduces to something already known from the domain adaptation literature without adding much. The paper states the assumption, but I wonder how restrictive it is in the experiments, or whether the authors have ways around it. The citation pattern looks standard for PAC-Bayes and MMD work, with no obvious circularity.

This is for researchers in robust ML who want formal tools for uncertainty under distribution shift; anyone looking for new ways to think about epistemic uncertainty in adaptation would find it useful. It deserves a serious referee because the core idea is original enough and the framework has potential, even if the assumptions need scrutiny.

I recommend putting it through peer review rather than desk-rejecting it. Reviewers can check the derivations and see whether the Lipschitz condition can be met or weakened for practical TTA losses.

Referee Report

2 major / 2 minor

Summary. The manuscript develops a PAC-Bayesian framework for test-time adaptation that interprets MMD-balls centered at the source distribution as credal sets in Walley's imprecise probability theory. It derives (i) a PAC-Bayesian generalization bound with an MMD-dependent shift penalty under an RKHS-Lipschitz loss assumption, (ii) a finite-sample version via MMD concentration, (iii) a uniform worst-case risk bound over the credal set together with a lower-upper risk decomposition separating epistemic uncertainty, and (iv) geodesic preservation bounds for kernel-guided adaptation.

Significance. If the derivations hold, the credal-set interpretation supplies a principled, distribution-free way to quantify epistemic uncertainty and to decide when adaptation is warranted, extending standard MMD domain-adaptation bounds. The paper provides explicit theoretical derivations and lower-upper risk decompositions, which are strengths for a theory-oriented contribution in this area.

major comments (2)

[Main results section (derivation of the PAC-Bayesian bound with MMD shift penalty)] The uniform worst-case risk bound (abstract item (iii) and the corresponding theorem in the main results section) is obtained by controlling |E_Q[loss] - E_P[loss]| via an MMD term scaled by the RKHS-Lipschitz constant of the loss. For the cross-entropy loss composed with a deep feature map that is standard in TTA, this Lipschitz condition with respect to the RKHS norm of the kernel used for MMD is not generally satisfied; without additional verification or a relaxation of the assumption, the linear shift penalty does not exist and the supremum risk over the MMD-ball cannot be bounded by source risk plus a finite multiple of the radius.
[Finite-sample analysis subsection] The finite-sample MMD concentration step invoked for the PAC-Bayesian bound (abstract item (ii)) produces constants that depend on the kernel bandwidth and the RKHS norm of the loss; the manuscript should exhibit that these constants remain non-vacuous for the sample sizes and feature dimensions typical in TTA experiments, otherwise the credal-set guarantee reduces to a statement that is formally correct but practically uninformative.
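To see the scale of the additive terms, here is a hedged Python sketch of one standard concentration result: the large-deviation bound for the biased MMD estimator in Gretton et al. (2012, Theorem 7), as we recall it, which assumes 0 ≤ k(x, y) ≤ K. Setting the failure probability 2·exp(−ε²mn / (2K(m + n))) equal to δ and solving for ε gives the width computed below. It covers only the MMD estimate; in the shift penalty this width is further multiplied by the RKHS norm L of the loss. The function name is hypothetical.

```python
import math


def mmd_biased_deviation(m: int, n: int, delta: float, kernel_max: float = 1.0) -> float:
    """Width t with |MMD_b - MMD| <= t with probability >= 1 - delta,
    assuming 0 <= k(x, y) <= kernel_max (e.g. 1 for a Gaussian kernel)."""
    if m < 1 or n < 1 or not 0.0 < delta < 1.0:
        raise ValueError("need m, n >= 1 and 0 < delta < 1")
    K = kernel_max
    bias_term = 2.0 * (math.sqrt(K / m) + math.sqrt(K / n))
    # Solve 2 * exp(-eps^2 * m * n / (2K(m + n))) = delta for eps.
    eps = math.sqrt(2.0 * K * (m + n) * math.log(2.0 / delta) / (m * n))
    return bias_term + eps


for size in (1_000, 5_000):
    print(size, mmd_biased_deviation(size, size, delta=0.05))
```

For a kernel with values in [0, 1] and k(x, x) = 1, MMD itself cannot exceed √2, so widths of this size are a sizeable fraction of the possible range; whether the resulting bound is vacuous depends on L and on the source risk.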

minor comments (2)

[Preliminaries] The notation for the lower and upper expectations induced by the credal set should be introduced with an explicit reference to Walley's framework in the preliminaries to avoid ambiguity with standard expectation notation.
[Experiments / illustrative figures] Figure 2 (geodesic preservation illustration) would benefit from an additional panel showing the effect of violating the RKHS-Lipschitz condition on the preserved geometry.
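As a sketch of the notation the first minor comment asks for (our notation, assumed rather than quoted from the manuscript), write the credal set and Walley's lower and upper previsions of a loss ℓ as follows. Under the RKHS-ball reading of the premise (‖ℓ‖_H ≤ L), each E_Q[ℓ] lies within L·MMD(P, Q) of E_P[ℓ], so the gap between the lower and upper values is at most 2Lε:

```latex
\begin{gathered}
\mathcal{C}_\varepsilon(P) = \{\, Q \in \mathcal{P}(\mathcal{X}) : \mathrm{MMD}(P, Q) \le \varepsilon \,\},
\qquad
\underline{\mathbb{E}}_\varepsilon[\ell] = \inf_{Q \in \mathcal{C}_\varepsilon(P)} \mathbb{E}_Q[\ell],
\qquad
\overline{\mathbb{E}}_\varepsilon[\ell] = \sup_{Q \in \mathcal{C}_\varepsilon(P)} \mathbb{E}_Q[\ell] = -\,\underline{\mathbb{E}}_\varepsilon[-\ell],
\\[6pt]
\mathbb{E}_P[\ell] - L\varepsilon \;\le\; \underline{\mathbb{E}}_\varepsilon[\ell]
\;\le\; \mathbb{E}_P[\ell] \;\le\; \overline{\mathbb{E}}_\varepsilon[\ell]
\;\le\; \mathbb{E}_P[\ell] + L\varepsilon,
\qquad
\overline{\mathbb{E}}_\varepsilon[\ell] - \underline{\mathbb{E}}_\varepsilon[\ell] \le 2L\varepsilon .
\end{gathered}
```

In this notation the interval width reflects epistemic uncertainty about which Q in the ball is the target, while each E_Q[ℓ] still averages over aleatoric noise; the bar and underbar keep these quantities visually distinct from an ordinary expectation.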

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and insightful comments on our work. We address the major comments point by point below, indicating where revisions will be made to the manuscript.

Referee: [Main results section (derivation of the PAC-Bayesian bound with MMD shift penalty)] The uniform worst-case risk bound (abstract item (iii) and the corresponding theorem in the main results section) is obtained by controlling |E_Q[loss] - E_P[loss]| via an MMD term scaled by the RKHS-Lipschitz constant of the loss. For the cross-entropy loss composed with a deep feature map that is standard in TTA, this Lipschitz condition with respect to the RKHS norm of the kernel used for MMD is not generally satisfied; without additional verification or a relaxation of the assumption, the linear shift penalty does not exist and the supremum risk over the MMD-ball cannot be bounded by source risk plus a finite multiple of the radius.

Authors: We appreciate the referee's observation on the limitations of the RKHS-Lipschitz assumption for the cross-entropy loss in typical TTA settings involving deep feature maps. The manuscript explicitly states this assumption to obtain the MMD-dependent shift penalty in the PAC-Bayesian bound. We agree that this condition may not hold universally for unbounded losses like cross-entropy without additional constraints on the feature representations. In the revised manuscript, we will expand the discussion in the main results section to include a clarification of the assumption's scope, provide conditions under which it is satisfied (such as when the loss is composed with a bounded RKHS function or for specific kernel choices), and outline possible relaxations using alternative bounding techniques like those based on Rademacher complexity. This will ensure the bound is presented with appropriate caveats while preserving its validity under the stated conditions. revision: yes
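The difficulty with cross-entropy can be made concrete. For a kernel with sup_x k(x, x) ≤ κ (κ = 1 for a Gaussian kernel), the reproducing property and Cauchy–Schwarz give |f(x)| = |⟨f, k(x, ·)⟩| ≤ ‖f‖_H √κ, so every function in an RKHS ball of radius L is bounded by L√κ. Cross-entropy −log p grows without bound as the predicted probability p of the true class goes to 0, so it cannot lie in any such ball unless predictions are clipped to p ≥ p_min. The Python sketch below checks this numerically; the function names are illustrative. Boundedness is necessary but not sufficient for RKHS membership, so clipping alone does not establish the assumption.

```python
import math


def rkhs_ball_sup(radius: float, kernel_sup: float = 1.0) -> float:
    """Upper bound on |f(x)| for any f with ||f||_H <= radius when k(x, x) <= kernel_sup."""
    return radius * math.sqrt(kernel_sup)


def cross_entropy(p_true: float, p_min: float = 0.0) -> float:
    """Cross-entropy for true-class probability p_true, optionally clipped at p_min."""
    return -math.log(max(p_true, p_min))


for L in (1.0, 10.0, 100.0):
    p = math.exp(-(L + 1.0))  # a confident mistake: true-class probability e^{-(L+1)}
    # The unclipped loss exceeds every value a function in the radius-L ball can take.
    print(L, cross_entropy(p) > rkhs_ball_sup(L))
    # Clipping at p_min = 1e-3 caps the loss at -log(1e-3), roughly 6.9.
    print(L, cross_entropy(p, p_min=1e-3))
```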
Referee: [Finite-sample analysis subsection] The finite-sample MMD concentration step invoked for the PAC-Bayesian bound (abstract item (ii)) produces constants that depend on the kernel bandwidth and the RKHS norm of the loss; the manuscript should exhibit that these constants remain non-vacuous for the sample sizes and feature dimensions typical in TTA experiments, otherwise the credal-set guarantee reduces to a statement that is formally correct but practically uninformative.

Authors: We thank the referee for pointing out the need to demonstrate the practicality of the finite-sample constants. The concentration inequalities for MMD depend on the kernel parameters and the norm of the loss in the RKHS. While the manuscript focuses on the theoretical derivation, we acknowledge that explicit verification for typical TTA settings (e.g., ResNet features with Gaussian kernels) would strengthen the contribution. In the revision, we will add a remark in the finite-sample analysis subsection with a qualitative discussion and a small numerical example in the appendix showing that for sample sizes around 1000-5000 and standard bandwidth selections, the additive terms do not dominate the bound, making the guarantees informative. This addresses the concern that the result might be practically uninformative. revision: partial
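The rebuttal refers to "standard bandwidth selections". One widely used choice is the median heuristic, which sets the Gaussian-kernel bandwidth to the median pairwise distance in the pooled sample. A minimal NumPy sketch follows; conventions vary (some use the median squared distance, or rescale by √2), so treat this as one assumed variant with a hypothetical function name.

```python
from typing import Optional

import numpy as np


def median_heuristic_bandwidth(x: np.ndarray, y: Optional[np.ndarray] = None) -> float:
    """Median Euclidean distance over all distinct pairs of the pooled sample."""
    z = x if y is None else np.vstack([x, y])
    sq = np.sum(z**2, axis=1)[:, None] + np.sum(z**2, axis=1)[None, :] - 2.0 * z @ z.T
    dists = np.sqrt(np.maximum(sq, 0.0))     # clip tiny negatives from rounding
    upper = np.triu_indices(len(z), k=1)     # each unordered pair once, no diagonal
    return float(np.median(dists[upper]))


points = np.array([[0.0], [1.0], [3.0]])     # pairwise distances 1, 3, 2
print(median_heuristic_bandwidth(points))
```

For 1000 to 5000 samples per domain, the full pairwise matrix becomes large, so computing the median on a random subsample is a common shortcut.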

Circularity Check

0 steps flagged

No significant circularity; novel credal-set interpretation with standard PAC-Bayesian derivation

The paper's core contribution is a new interpretive step mapping MMD-balls to Walley credal sets for epistemic uncertainty, followed by PAC-Bayesian bounds that explicitly invoke an external RKHS-Lipschitz loss assumption. These bounds control the shift term via MMD in the usual way and do not reduce by construction to quantities defined only inside the paper. No self-citations appear load-bearing, no parameters are fitted then relabeled as predictions, and the uniform worst-case risk bound follows directly from the stated assumption rather than from any tautological redefinition. The framework therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The framework rests on the RKHS-Lipschitz loss assumption for the shift penalty and on standard concentration results for MMD; no free parameters or new invented entities are explicitly introduced in the abstract beyond the credal-set reinterpretation.

axioms (1)

Domain assumption: RKHS-Lipschitz loss
Invoked to obtain the PAC-Bayesian bound with MMD-dependent shift penalty.

invented entities (1)

MMD-ball interpreted as credal set (no independent evidence)
Purpose: to provide natural epistemic uncertainty quantification and uniform worst-case risk bounds.
A new interpretive device linking kernel discrepancy to imprecise probability; the abstract supplies no independent falsifiable evidence for it.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon, a public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "interpreting MMD-balls around the source distribution as credal sets in Walley’s imprecise probability theory, yielding natural epistemic uncertainty quantification... uniform worst-case risk bound over all distributions in the credal set"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "PAC-Bayesian bound with an MMD-dependent shift penalty under an RKHS-Lipschitz loss assumption"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

27 extracted references

[1] Pierre Alquier. A user-friendly introduction to PAC-Bayes bounds. arXiv preprint arXiv:2211.03053, 2024.

[2] Anastasios N. Angelopoulos and Stephen Bates. A gentle introduction to conformal prediction: A framework for distribution-free uncertainty quantification. 2023.

[3] Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira. A theory of learning from different domains. Machine Learning, 79:151–175, 2010.

[4] Olivier Catoni. PAC-Bayesian supervised classification: The thermodynamics of statistical learning. Lecture Notes in Mathematics, 2007.

[5] Giorgio Corani, Alessandro Antonucci, and Marco Zaffalon. Classification. Pages 215–254, 2022.

[6] Sébastien Destercke, Didier Dubois, and Eric Chojnacki. Specificity in imprecise probabilistic models. In Proceedings of the IPMU 2008 Conference, 2008.

[7] Pascal Germain, Francis Bach, Alexandre Lacoste, and Simon Lacoste-Julien. PAC-Bayesian theory meets Bayesian inference. In Advances in Neural Information Processing Systems, volume 29, 2016.

[8] Pascal Germain, Amaury Habrard, François Laviolette, and Emilie Morvant. A PAC-Bayesian approach for domain adaptation with specialization to linear classifiers. In Proceedings of the 30th International Conference on Machine Learning, pages 768–776, 2013.

[9] Isaac Gibbs and Emmanuel Candès. Adaptive conformal inference under distribution shift. Proceedings of the National Academy of Sciences, 118(43), 2021.

[10] Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alexander Smola. A kernel two-sample test. Journal of Machine Learning Research, 13:723–773, 2012.

[11] Eyke Hüllermeier and Willem Waegeman. Uncertainty quantification in machine learning: One size does not fit all. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pages 14082–14084, 2021.

[12] David McAllester. Some PAC-Bayesian theorems. Machine Learning, 37:355–363, 1999.

[13] Enrique Miranda and Marco Zaffalon. Probability and statistics. Pages 93–148, 2022.

[14] Krikamol Muandet, Kenji Fukumizu, Bharath K. Sriperumbudur, and Bernhard Schölkopf. Kernel mean embedding of distributions: A review and beyond. Foundations and Trends in Machine Learning, 10(1-2):1–141, 2017.

[15] Shuaicheng Niu, Jiaxiang Wu, Yifan Zhang, Yaofo Chen, Shijian Zheng, Peilin Zhao, and Mingkui Tan. Towards stable test-time adaptation in dynamic wild world. In International Conference on Learning Representations, 2023.

[16] Omar Rivasplata, Pranjal Kamalaruban, Zoubin Ghahramani, and Emre Gözü. PAC-Bayes survey. arXiv preprint arXiv:2010.00147, 2020.

[17] Matthias Seeger. PAC-Bayesian generalisation error bounds for Gaussian process classification. Journal of Machine Learning Research, 3:233–269, 2002.

[18] Bharath K. Sriperumbudur, Arthur Gretton, Kenji Fukumizu, Bernhard Schölkopf, and Gert R. G. Lanckriet. Kernel choice and classifiability. In Advances in Neural Information Processing Systems, volume 22, 2009.

[19] Yuhang Su, Zhi Liu, Yong Zhang, Xing Yong, Jie Cheng, Qingjie Zeng, and Zengfu Gao. Revisiting realistic test-time training: Sequential inference and adaptation by anchored clustering. In Advances in Neural Information Processing Systems, volume 35, 2022.

[20] Dougal J. Sutherland, Hsiao-Yu Tung, Heiko Strathmann, Soumyajit De, Balaji Lakshminarayanan, and Arnaud Doucet. Generative models and model criticism via optimized maximum mean discrepancy. In International Conference on Learning Representations, 2017.

[21] Ilya Tolstikhin, Bharath K. Sriperumbudur, Krikamol Muandet, and Bernhard Schölkopf. Minimax estimation of kernel mean embeddings. Journal of Machine Learning Research, 18:1–47, 2017.

[22] Matthias C. M. Troffaes and Sébastien Destercke. Introduction to Imprecise Probabilities. Wiley, 2023.

[23] Peter Walley. Statistical Reasoning with Imprecise Probabilities. Chapman and Hall, 1991.

[24] Dequan Wang, Evan Shelhamer, Fuxin Liu, Bruno Olshausen, and Trevor Darrell. Tent: Fully test-time adaptation by entropy minimization. In International Conference on Learning Representations, 2021.

[25] Luyao Yuan, Yong Zhang, Xing Wang, and Liang Wang. Robust test-time adaptation in dynamic scenarios. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10512–10521, 2023.

[26] Marvin Zhang, Sergey Levine, and Chelsea Finn. MEMO: Test time robustness via adaptation and augmentation. In Advances in Neural Information Processing Systems, volume 35, 2022.

[27] Yue Zhang, Mingmin Chen, Xiyuxing Zhang, and Liang Wang. A survey on test-time adaptation under distribution shifts. arXiv preprint arXiv:2210.05365, 2022.

A. Proof of Theorem 1. We present the complete proof of the PAC-Bayesian bound with MMD shi...