Recognition: no theorem link
Maximin Robust Bayesian Experimental Design
Pith reviewed 2026-05-15 11:07 UTC · model grok-4.3
The pith
Formulating Bayesian experimental design as a max-min game against an information-constrained adversary produces a robust objective governed by Sibson's α-mutual information.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We formulate the Bayesian experimental design problem as a max-min game between the experimenter and an adversarial nature subject to information-theoretic constraints. This formulation produces a robust objective governed by Sibson's α-mutual information, identifies the α-tilted posterior as the robust belief update, and establishes the Rényi divergence as the appropriate measure of conditional information gain. To mitigate the bias and variance of the nested Monte Carlo estimators required for Sibson's α-MI, we adopt a PAC-Bayes framework to optimize stochastic design policies, yielding rigorous high-probability lower bounds on the robust expected information gain that explicitly control finite-sample error.
What carries the argument
The max-min game between experimenter and information-constrained adversarial nature, which produces Sibson's α-mutual information as the governing robust objective and selects the α-tilted posterior for belief updates.
If this is right
- The α-tilted posterior replaces the ordinary posterior as the belief update rule under this robust formulation.
- Rényi divergence replaces mutual information as the measure of conditional information gain for design selection.
- PAC-Bayes bounds supply explicit high-probability lower bounds on the robust objective when stochastic design policies are used.
- Finite-sample bias and variance of nested Monte Carlo estimators for the robust quantity are controlled by the PAC-Bayes construction.
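The nested Monte Carlo difficulty mentioned above can be made concrete with a small sketch. The estimator form below (inner prior samples for both the marginal and the α-moment, importance-sampling form for the outer integral) and the conjugate toy model are illustrative assumptions, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)

def sibson_alpha_mi_nmc(alpha, sigma=1.0, n_outer=2000, n_inner=2000):
    """Nested MC estimate of Sibson's alpha-MI for the toy model
    theta ~ N(0, 1), x | theta ~ N(theta, sigma^2).
    Uses I_alpha = alpha/(alpha-1) * log int ( int p(th) p(x|th)^a dth )^{1/a} dx,
    one common convention for Sibson's alpha-MI."""
    # Outer samples from the joint: theta_n ~ prior, x_n ~ likelihood.
    theta_out = rng.standard_normal(n_outer)
    x = theta_out + sigma * rng.standard_normal(n_outer)
    # Inner prior samples (shared across outer points for simplicity).
    theta_in = rng.standard_normal(n_inner)
    # Likelihood matrix p(x_n | theta_m), shape (n_outer, n_inner).
    lik = np.exp(-0.5 * ((x[:, None] - theta_in[None, :]) / sigma) ** 2) / (
        sigma * np.sqrt(2 * np.pi)
    )
    p_hat = lik.mean(axis=1)             # inner estimate of the marginal p(x_n)
    g_hat = (lik ** alpha).mean(axis=1)  # inner estimate of E_theta[p(x|theta)^alpha]
    # Outer integral over x via importance sampling against the marginal:
    # int g(x)^{1/alpha} dx = E_{x ~ p(x)}[ g(x)^{1/alpha} / p(x) ].
    integral = np.mean(g_hat ** (1.0 / alpha) / p_hat)
    return alpha / (alpha - 1.0) * np.log(integral)

i_alpha = sibson_alpha_mi_nmc(alpha=1.5)
# For this Gaussian toy model the integral can be done in closed form and the
# exact value works out to 0.5 * log(1 + alpha); at alpha = 1 this recovers
# the Shannon MI 0.5 * log(2).
print(i_alpha, 0.5 * np.log(2.5))
```

Both inner averages enter the estimate through nonlinearities (a power and a ratio), which is exactly where the finite-sample bias the paper's PAC-Bayes construction targets comes from.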
Where Pith is reading between the lines
- The same maximin structure could be applied to sequential decision problems such as active learning or policy optimization when model error is a concern.
- Varying the parameter α offers a tunable knob between standard Bayesian behavior and more conservative robust behavior.
- The approach suggests that information-theoretic constraints alone may suffice to encode robustness without enumerating explicit alternative models.
- Similar game formulations might yield robust versions of other information-gain criteria used in statistics and machine learning.
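The "tunable knob" reading of α can be illustrated in a conjugate toy model. The tilting convention below (likelihood raised to the power α) is a common form of the α-tilted posterior; the paper's exact definition may differ in detail:

```python
def tilted_posterior(x, alpha, sigma=1.0, prior_var=1.0):
    """Alpha-tilted posterior pi_alpha(theta | x) proportional to
    p(theta) * p(x | theta)^alpha for the conjugate model
    theta ~ N(0, prior_var), x | theta ~ N(theta, sigma^2).
    Raising the likelihood to alpha rescales its precision by alpha, so the
    update stays Gaussian with a tempered data weight; alpha = 1 recovers
    the ordinary posterior."""
    prec = 1.0 / prior_var + alpha / sigma**2
    mean = (alpha * x / sigma**2) / prec
    return mean, 1.0 / prec

x_obs = 2.0
for a in (0.25, 0.5, 1.0):
    m, v = tilted_posterior(x_obs, a)
    print(f"alpha={a:4.2f}  mean={m:.3f}  var={v:.3f}")
```

Smaller α shrinks the update toward the prior (a more conservative belief revision), matching the interpolation between standard Bayesian and robust behavior described above.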
Load-bearing premise
Model misspecification can be adequately captured by an adversarial nature whose possible actions are constrained solely by information-theoretic quantities.
What would settle it
Run a controlled simulation in which the true data-generating process differs from the working model in a known, quantifiable way; if designs obtained from the maximin objective do not outperform standard Bayesian designs on a held-out performance metric that reflects the actual misspecification, the robustness claim is falsified.
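A minimal version of such a falsification experiment can be sketched in a linear-Gaussian toy problem. Everything here is hypothetical (the model, the design grid, and the misspecification, a design-dependent noise inflation the working model ignores); it only shows the shape of the test, not the paper's experiments:

```python
import numpy as np

rng = np.random.default_rng(1)

def posterior_mse(xi, c=0.5, n=200_000):
    """Held-out posterior-mean MSE when the analyst assumes
    y = theta * xi + N(0, 1) with prior theta ~ N(0, 1), but the true
    noise scale is (1 + c * xi^2): misspecification that worsens with xi.
    The working-model posterior mean is xi * y / (1 + xi^2)."""
    theta = rng.standard_normal(n)
    y = theta * xi + (1 + c * xi**2) * rng.standard_normal(n)
    post_mean = xi * y / (1 + xi**2)
    return np.mean((post_mean - theta) ** 2)

designs = [0.5, 1.0, 2.0, 4.0]
eig = {xi: 0.5 * np.log1p(xi**2) for xi in designs}  # working-model EIG
mse = {xi: posterior_mse(xi) for xi in designs}      # held-out metric
naive = max(designs, key=eig.get)  # standard Bayesian choice: largest xi
print("EIG-optimal design:", naive, "held-out MSE:", round(mse[naive], 3))
print("best design on held-out metric:", min(designs, key=mse.get))
```

In this toy the EIG-maximizing design is badly beaten on the held-out metric; the robustness claim would require that the maximin designs land closer to the held-out optimum than the EIG-optimal ones do.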
Original abstract
We address the brittleness of Bayesian experimental design under model misspecification by formulating the problem as a max--min game between the experimenter and an adversarial nature subject to information-theoretic constraints. We demonstrate that this approach yields a robust objective governed by Sibson's $\alpha$-mutual information (MI), which identifies the $\alpha$-tilted posterior as the robust belief update and establishes the R\'enyi divergence as the appropriate measure of conditional information gain. To mitigate the bias and variance of nested Monte Carlo estimators needed to estimate Sibson's $\alpha$-MI, we adopt a PAC-Bayes framework to search over stochastic design policies, yielding rigorous high-probability lower bounds on the robust expected information gain that explicitly control finite-sample error.
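For readers unfamiliar with the objects named in the abstract, one standard convention for Sibson's α-MI in this notation is the following (the paper's exact display may differ in sign or normalization):

```latex
% Sibson's \alpha-mutual information of parameters and data given design \xi
% (one common convention; the limit \alpha \to 1 recovers Shannon MI):
I_\alpha(\theta; X \mid \xi)
  = \frac{\alpha}{\alpha - 1}
    \log \int \left( \int p(\theta)\, p(x \mid \theta, \xi)^{\alpha}\, d\theta \right)^{1/\alpha} dx .

% The \alpha-tilted posterior tempers the likelihood rather than replacing it:
\pi_\alpha(\theta \mid x, \xi) \;\propto\; p(\theta)\, p(x \mid \theta, \xi)^{\alpha} .
```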
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript formulates Bayesian experimental design under model misspecification as a max-min game between the experimenter and an adversarial nature subject to information-theoretic constraints on the adversary. It claims that this yields a robust objective governed by Sibson's α-mutual information, identifies the α-tilted posterior as the robust belief update, and establishes Rényi divergence as the measure of conditional information gain. A PAC-Bayes framework is introduced to search over stochastic design policies and obtain high-probability lower bounds on the robust expected information gain that control finite-sample bias and variance of nested Monte Carlo estimators.
Significance. If the central identification holds, the work supplies a principled information-theoretic route to robust BED together with PAC-Bayes guarantees that are absent from most existing robust-design literature. The explicit link to Sibson's α-MI and the Rényi divergence supplies a concrete, computable objective that could be adopted in settings where standard Bayesian designs degrade under misspecification.
major comments (3)
- [§3] §3, formulation of the max-min game: the claim that information-theoretic constraints on the adversary (bounded mutual information or divergence balls) adequately encode model misspecification is load-bearing for the robustness interpretation. If structural or parametric misspecifications lie outside these sets, the saddle-point solution need not correspond to a robust belief update; a concrete counter-example or additional justification is required.
- [Theorem 4.1] Theorem 4.1 (or equivalent derivation): the reduction of the max-min objective to Sibson's α-MI is asserted but the explicit steps equating the inner minimization over the adversary to the α-tilted posterior and the outer maximization to the Rényi conditional information gain are not fully expanded. Missing intermediate equalities prevent verification that no additional assumptions on the likelihood or prior are hidden.
- [§5] §5, PAC-Bayes bound: the high-probability lower bound on the robust expected information gain controls finite-sample error for stochastic policies, yet the dependence of the bound on the choice of prior over policies and the resulting tightness for practical design spaces are not quantified. This affects whether the bound is useful for selecting designs or merely theoretical.
minor comments (2)
- [Notation] Notation for the α-tilted posterior and the precise definition of Sibson's α-MI should be introduced with a dedicated display equation before the main theorems to improve readability.
- [Abstract] The abstract states that the PAC-Bayes approach mitigates bias and variance of nested Monte Carlo estimators, but the exact form of the estimator (e.g., the number of inner samples, the importance weights) is not specified until later; a brief statement in the abstract or introduction would help.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major comment below, indicating the revisions made to the manuscript.
Point-by-point responses
- Referee: [§3] §3, formulation of the max-min game: the claim that information-theoretic constraints on the adversary (bounded mutual information or divergence balls) adequately encode model misspecification is load-bearing for the robustness interpretation. If structural or parametric misspecifications lie outside these sets, the saddle-point solution need not correspond to a robust belief update; a concrete counter-example or additional justification is required.
Authors: We agree that the information-theoretic constraints define a specific form of robustness to adversarial perturbations within a divergence ball around the nominal model. This does not encompass arbitrary structural or parametric misspecifications lying far outside the ball, which is a limitation of divergence-based robustness approaches in general. We will revise §3 to add explicit justification for the scope of the formulation together with a clarifying example distinguishing misspecifications captured by the constraint set from those that are not. revision: partial
- Referee: [Theorem 4.1] Theorem 4.1 (or equivalent derivation): the reduction of the max-min objective to Sibson's α-MI is asserted but the explicit steps equating the inner minimization over the adversary to the α-tilted posterior and the outer maximization to the Rényi conditional information gain are not fully expanded. Missing intermediate equalities prevent verification that no additional assumptions on the likelihood or prior are hidden.
Authors: We thank the referee for highlighting the need for expanded detail. The revised version of Theorem 4.1 (and its proof) now includes all intermediate equalities, beginning from the saddle-point objective, invoking the variational representation of Sibson's α-MI, identifying the α-tilted posterior as the minimizing adversary, and arriving at the Rényi conditional information gain. No additional assumptions beyond standard positivity and integrability conditions on the likelihood and prior are required. revision: yes
- Referee: [§5] §5, PAC-Bayes bound: the high-probability lower bound on the robust expected information gain controls finite-sample error for stochastic policies, yet the dependence of the bound on the choice of prior over policies and the resulting tightness for practical design spaces are not quantified. This affects whether the bound is useful for selecting designs or merely theoretical.
Authors: The dependence on the policy prior is a standard feature of PAC-Bayes bounds. We have added a remark in §5 discussing practical choices of the prior (e.g., a simple isotropic Gaussian centered on a nominal design) and its influence on bound tightness. We also include further numerical results in the experiments demonstrating that the bound remains sufficiently tight to be useful for design selection in the settings considered, while acknowledging that quantitative tightness is inherently problem-dependent. revision: partial
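The authors' remark about an isotropic Gaussian policy prior can be made concrete with a generic PAC-Bayes bound. The McAllester-style inequality and all numbers below are illustrative assumptions (the paper's bound may use a different PAC-Bayes inequality and rescaling):

```python
import numpy as np

def kl_gaussians(mu_q, var_q, mu_p, var_p):
    """KL( N(mu_q, var_q) || N(mu_p, var_p) ) for 1-D Gaussian design policies."""
    return 0.5 * (np.log(var_p / var_q) + (var_q + (mu_q - mu_p) ** 2) / var_p - 1.0)

def pac_bayes_lower_bound(emp_mean, kl, n, delta=0.05):
    """McAllester-style high-probability lower bound on the expected gain of a
    stochastic policy, assuming the per-sample gain is rescaled to [0, 1].
    Sketch only: the paper's bound may differ in form and constants."""
    return emp_mean - np.sqrt((kl + np.log(2 * np.sqrt(n) / delta)) / (2 * n))

# Hypothetical numbers: policy posterior N(1.2, 0.1) against a prior N(1.0, 1.0)
# centred on a nominal design, empirical rescaled gain 0.6 from n = 500 samples.
kl = kl_gaussians(1.2, 0.1, 1.0, 1.0)
lb = pac_bayes_lower_bound(0.6, kl, n=500)
print(round(kl, 4), round(lb, 4))
```

The bound degrades as the policy posterior moves away from (or sharpens relative to) the prior, which is the prior-dependence the referee asks to see quantified.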
Circularity Check
No circularity: derivation from external max-min game to established information measures is self-contained
full rationale
The paper starts from an external max-min game between experimenter and adversarial nature under information-theoretic constraints, then derives that the resulting robust objective is governed by Sibson's α-mutual information, with the α-tilted posterior as the update and Rényi divergence as the conditional gain. This is a direct mapping from the game formulation to pre-existing information-theoretic quantities rather than a self-definition, a fitted parameter renamed as a prediction, or a load-bearing self-citation chain. The subsequent adoption of a PAC-Bayes framework to bound nested Monte Carlo estimators for the α-MI is an external technique applied to estimation error control and does not reduce the central claim to its own inputs. No equations or steps in the provided text exhibit the enumerated circular patterns; the derivation remains independent of the target result.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Model misspecification can be captured by an adversarial nature whose choices are restricted by information-theoretic constraints.
Reference graph
Works this paper leans on
- [1] Barlas, Y. Z., Sloman, S. J., and Kaski, S. Robust experimental design via generalised Bayesian inference. arXiv preprint arXiv:2511.07671.
- [2] Esposito, A. R., Gastpar, M., and Issa, I. Sibson's α-mutual information and its variational representations. arXiv preprint arXiv:2405.08352.
- [3] Haarnoja, T., Zhou, A., Hartikainen, K., Tucker, G., Ha, S., Tan, J., Kumar, V., Zhu, H., Gupta, A., Abbeel, P., et al. Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905.
- [4] Overstall, A. M., Holloway-Brown, J., and McGree, J. M. Gibbs optimal design of experiments. arXiv preprint arXiv:2310.17440.
- [5] Tang, R., Sloman, S. J., and Kaski, S. Generalization analysis for Bayesian optimal experiment design under model misspecification. arXiv preprint arXiv:2506.07805.
- [6] Wu, P.-S. and Martin, R. Calibrating generalized predictive distributions. arXiv preprint arXiv:2107.01688.