Rashomon-Seeded Annealing for Robust Bayesian Inference in Factorial Designs

Soumyakanti Pan; Tyler H. McCormick; Yiyang Fan

arxiv: 2606.02589 · v1 · pith:IIWPK4TUnew · submitted 2026-05-21 · 📊 stat.ME · stat.ML

Pith reviewed 2026-06-30 16:30 UTC · model grok-4.3

classification 📊 stat.ME stat.ML

keywords: Bayesian model averaging, factorial designs, Rashomon sets, annealed importance sampling, model uncertainty, posterior inference, interaction effects

The pith

Rashomon sets initialize annealed importance sampling to recover consistent full posteriors over factorial model spaces without exhaustive enumeration.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that Rashomon sets of high-performing models can serve as starting points for annealed importance sampling in Bayesian model averaging for factorial designs. This seeding anchors the procedure in high-evidence regions while the annealing step corrects back to the full posterior distribution. A sympathetic reader would care because standard MCMC struggles with the multimodal posteriors created by combinatorial interaction effects, and this method avoids both truncation to the Rashomon set and the need to visit every model. The resulting self-normalized estimators then produce model-averaged cell means, credible intervals, and uncertainty measures directly.

Core claim

Rashomon-seeded annealing initializes annealed importance sampling by anchoring the starting density inside pre-identified Rashomon Partition Sets, then applies the annealing correction to restore inference over the entire model space. The result is a set of consistent self-normalized posterior summaries obtained without enumerating the complete model space.

What carries the argument

Rashomon Partition Sets (RPS) as a certified seed constructor that supplies the initial density for AIS while preserving global support over the model space.
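As a rough illustration of what "supplying the initial density while preserving global support" could look like, the sketch below builds a starting distribution over a small enumerated model space as a mixture of a uniform distribution on the seed set and a uniform distribution on all models. The mixture form follows the description in the simulated rebuttal later on this page; the function name `seeded_initial_pmf`, the mixing weight `alpha`, and the toy sizes are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

def seeded_initial_pmf(n_models, seed_idx, alpha=0.1):
    """Mixture start: (1 - alpha) uniform on the seed set + alpha uniform on all models.

    `alpha` is a hypothetical mixing weight; any alpha > 0 gives every model
    positive probability, which is what keeps global support.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    seed_idx = np.unique(np.asarray(seed_idx, dtype=int))
    if seed_idx.size == 0:
        raise ValueError("seed set must be non-empty")
    q0 = np.full(n_models, alpha / n_models)       # global uniform component
    q0[seed_idx] += (1.0 - alpha) / seed_idx.size  # extra mass on seed models
    return q0

# Toy model space with 8 models, two of which are in the seed set.
q0 = seeded_initial_pmf(n_models=8, seed_idx=[2, 5], alpha=0.2)
print(q0.round(3))
```

Smaller values of `alpha` concentrate the start more tightly on the seed set, at the cost of larger importance weights for any posterior mass the seed set misses.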

If this is right

Model-averaged cell means become available as consistent estimators.
Credible intervals and uncertainty summaries can be formed without visiting the full model space.
The procedure handles multimodal posteriors that defeat standard MCMC in factorial designs.
Any high-posterior seed set can serve as a proposal mechanism for AIS-based model averaging.
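The summaries listed above are all functions of weighted samples. A minimal sketch of how a self-normalized mean, an equal-tailed credible interval, and an effective sample size might be computed from log importance weights is given below. The helper names and the weighted-quantile rule are assumptions for illustration; the paper may form its intervals differently.

```python
import numpy as np

def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative normalized weight reaches q."""
    order = np.argsort(values)
    cdf = np.cumsum(weights[order])
    k = min(int(np.searchsorted(cdf, q)), values.size - 1)
    return float(values[order][k])

def self_normalized_summary(values, log_w, level=0.95):
    """Return (mean, (lower, upper), effective sample size) from weighted draws."""
    values = np.asarray(values, dtype=float)
    log_w = np.asarray(log_w, dtype=float)
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()                     # self-normalization
    mean = float(np.sum(w * values))
    tail = (1.0 - level) / 2.0
    interval = (weighted_quantile(values, w, tail),
                weighted_quantile(values, w, 1.0 - tail))
    ess = float(1.0 / np.sum(w ** 2))  # equals the sample size when weights are equal
    return mean, interval, ess

mean, interval, ess = self_normalized_summary([1.0, 2.0, 3.0, 4.0], np.zeros(4))
print(mean, interval, ess)
```

A low effective sample size relative to the number of draws is a common warning sign that a few weights dominate the estimate.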

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same seeding idea could be tested in other combinatorial model spaces where Rashomon sets are easy to locate.
Combining RPS seeds with different annealing schedules might further reduce variance in the self-normalized weights.
The approach suggests a general template for turning any computationally cheap high-evidence set into a starting distribution for full posterior sampling.

Load-bearing premise

Rashomon sets can be identified as effective high-evidence seeds that let annealed importance sampling restore consistent full posterior inference while keeping global support.

What would settle it

In a small factorial design where exhaustive enumeration is feasible, the self-normalized cell means or credible intervals obtained from the seeded AIS differ systematically from the exact values computed by enumerating every model.
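A check of this kind can be mocked up on a toy problem. The sketch below is not the paper's algorithm: it uses a made-up bimodal posterior over 60 indexed "models", a seed set that covers only one mode, a mixture start with a hypothetical weight `alpha`, a geometric path, and a Metropolis step with uniform proposals as the transition kernel. All of those choices are assumptions. It then compares the self-normalized AIS estimate of a bounded summary with the value obtained by exhaustive enumeration.

```python
import numpy as np

def seeded_ais(log_post, seed_idx, alpha=0.3, n_particles=5000, n_temps=30, seed=0):
    """Toy seeded AIS on an enumerated model space; returns final states and log weights."""
    rng = np.random.default_rng(seed)
    K = log_post.size
    q0 = np.full(K, alpha / K)                              # uniform component
    q0[np.asarray(seed_idx)] += (1.0 - alpha) / len(seed_idx)  # seed component
    log_q0 = np.log(q0)
    betas = np.linspace(0.0, 1.0, n_temps + 1)
    x = rng.choice(K, size=n_particles, p=q0)               # draw from the seeded start
    log_w = np.zeros(n_particles)
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Incremental weight f_b(x) / f_{b_prev}(x) at the current state.
        log_w += (b - b_prev) * (log_post[x] - log_q0[x])
        # Metropolis move that leaves f_b = q0^(1-b) * post^b invariant.
        log_f = (1.0 - b) * log_q0 + b * log_post
        prop = rng.integers(K, size=n_particles)
        accept = np.log(rng.random(n_particles)) < log_f[prop] - log_f[x]
        x = np.where(accept, prop, x)
    return x, log_w

idx = np.arange(60)
# Unnormalized bimodal log posterior: a large mode near 10, a smaller one near 45.
log_post = np.logaddexp(-0.5 * ((idx - 10) / 2.0) ** 2,
                        np.log(0.5) - 0.5 * ((idx - 45) / 2.0) ** 2)
zeta = idx / 60.0                                           # bounded summary of interest

post = np.exp(log_post - log_post.max())
post /= post.sum()
exact = float(post @ zeta)                                  # exhaustive enumeration

x, log_w = seeded_ais(log_post, seed_idx=[9, 10, 11])       # seed misses the second mode
w = np.exp(log_w - log_w.max())
estimate = float(np.sum(w * zeta[x]) / np.sum(w))
print(f"exact={exact:.3f}  seeded AIS={estimate:.3f}")
```

In a real factorial design the states would be partitions rather than integers and the kernel would need to respect the partition structure, so this only illustrates the shape of the comparison.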

Figures

Figures reproduced from arXiv: 2606.02589 by Soumyakanti Pan, Tyler H. McCormick, Yiyang Fan.

**Figure 1.** Hasse diagrams for Example 1 with the third feature fixed at level x. (a) A permissible partition with two pools; (b) a non-permissible partition with three pools; and (c) the saturated case where each cell forms its own pool. Distinct colors distinguish separate pools. In practice, however, selecting a seed set from high-posterior regions, such as the Rashomon set, drastically improves efficiency; an init…

**Figure 2.** Inferential accuracy relative to the exact posterior: L1 deviation of posterior summaries across varying Rashomon thresholds ϵ. Rashomon-seeded annealing (AIS) consistently outperforms RPS-truncation and PAC-Bayesian (PB), approaching the MCMC posterior as the seed set expands. … as a definitive baseline. Scenario 2 considers the setting with M = 3 features at R1 = 4, R2 = 3, and R3 = 3 levels (K = 36 cells…

**Figure 3.** (A) Comparison of the IoU metric for agreement of intervals produced by MCMC, for RPS, AIS, and PB with different ϵ; (B) running time for AIS and PB for different ϵ, and MCMC. … alignment with the reference interval, whereas lower values signify distortions from the reference interval. In Scenario 2, Rashomon-seeded annealing (henceforth AIS) demonstrates a superior ability to recover the global posterior mass…
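The Figure 3 caption refers to an IoU metric for agreement between intervals. Assuming this is the usual intersection-over-union of two one-dimensional intervals (the truncated caption does not spell out the definition), a minimal sketch is:

```python
def interval_iou(a, b):
    """Intersection-over-union of two closed intervals given as (lower, upper)."""
    (a_lo, a_hi), (b_lo, b_hi) = a, b
    inter = max(0.0, min(a_hi, b_hi) - max(a_lo, b_lo))  # overlap length, 0 if disjoint
    union = (a_hi - a_lo) + (b_hi - b_lo) - inter
    return inter / union if union > 0 else 0.0

# Hypothetical intervals: a reference interval and a narrower candidate.
print(interval_iou((0.0, 2.0), (0.5, 1.5)))
```

Under this definition a value of 1 means the two intervals coincide and a value of 0 means they do not overlap.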

Original abstract

Integrating over model uncertainty in factorial designs via Bayesian model averaging is hindered by the combinatorial explosion of interpretable interaction effects, often yielding a multimodal posterior, where standard Markov chain Monte Carlo algorithms encounter significant convergence issues. We propose a general computational framework that repurposes Rashomon sets, collections of high-performing models traditionally valued for prediction and interpretability, as a strategic "warm start" for estimating the full posterior. Our method, Rashomon-seeded annealing, initializes annealed importance sampling (AIS) by anchoring the starting density within these pre-identified, high-evidence regions while preserving global support over the entire model space. Rather than restricting inference to the Rashomon set and understating uncertainty, the AIS correction restores full posterior inference, turning the Rashomon certificate from an inferential truncation into a proposal mechanism. We demonstrate this approach using Rashomon Partition Sets (RPS) as a rigorous, certified seed constructor for factorial designs. The resulting algorithm yields consistent self-normalized posterior summaries, such as model-averaged cell means, credible intervals, and uncertainty summaries without exhaustive enumeration of the complete model space. This bridges the gap between high-evidence model discovery and rigorous Bayesian inference, and outlines a general strategy in which any high-posterior seed set can provide computational leverage for AIS-based model averaging.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives a clean way to seed AIS from Rashomon sets so you get consistent full-posterior summaries in factorial BMA without enumerating everything.

The main point is that Rashomon sets supply a high-evidence starting density for annealed importance sampling; the standard AIS correction then recovers consistent model-averaged quantities while keeping support over the whole model space.

What works is the framing: factorial designs produce too many interaction terms for ordinary MCMC, and simply restricting to the Rashomon set would understate uncertainty. By treating the set only as a proposal seed and letting the annealing path connect to the target, the method avoids that truncation. The consistency claim for the self-normalized estimators follows directly from existing AIS results once the seed is fixed.

The soft spot is practical rather than logical. The paper needs to show that the Rashomon Partition Sets actually hit the relevant modes in the examples they run; if the sets miss mass that matters for the cell means or credible intervals, the finite-sample behavior could degrade even if the asymptotic guarantee holds. I would also want to see how sensitive the results are to the choice of Rashomon threshold.
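One way to probe both concerns on a small enumerable example is to vary the threshold and record how much exact posterior mass the seed set captures. The sketch below assumes a simple rule, namely keeping every model whose log posterior is within `eps` of the maximum, which may differ from the paper's definition of a Rashomon Partition Set. The function names and the six-model posterior are invented for illustration.

```python
import numpy as np

def rashomon_seed(log_post, eps):
    """Indices of models whose log posterior is within eps of the best model."""
    return np.flatnonzero(log_post >= log_post.max() - eps)

def seed_coverage(log_post, seed_idx):
    """Exact posterior mass captured by the seed set (requires enumeration)."""
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return float(post[seed_idx].sum())

# Hypothetical posterior over six models.
log_post = np.log(np.array([0.40, 0.05, 0.30, 0.02, 0.20, 0.03]))
for eps in (0.1, 0.5, 1.0, 3.0):
    seed = rashomon_seed(log_post, eps)
    print(eps, seed.tolist(), round(seed_coverage(log_post, seed), 2))
```

When coverage stays low across reasonable thresholds, more of the posterior has to be reached through the global component of the starting density and the annealing moves, which is where finite-sample behavior is most likely to suffer.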

This is for people who already work on computational Bayesian model averaging in designed experiments. A reader who needs a new sampler for factorial data with many interactions will find a usable idea here.

It deserves a serious referee because the construction is internally consistent and the computational problem it targets is real.

Referee Report

3 major / 1 minor

Summary. The paper proposes Rashomon-seeded annealing, a framework that repurposes Rashomon sets (via Rashomon Partition Sets) as high-evidence seeds to initialize annealed importance sampling for Bayesian model averaging over the combinatorially large space of interaction models in factorial designs. It claims that anchoring the starting density in these regions while preserving global support, followed by the standard AIS correction, produces consistent self-normalized posterior summaries (model-averaged cell means, credible intervals) without exhaustive enumeration of the model space.

Significance. If the consistency claim holds and the method is shown to be correctly implemented, the approach would provide a computationally tractable route to full posterior inference in settings where standard MCMC fails due to multimodality, while leveraging existing Rashomon-set machinery for model discovery.

major comments (3)

[Abstract] The assertion that 'the AIS correction restores full posterior inference' and 'yields consistent self-normalized posterior summaries' is made without any derivation, theorem statement, or statement of the required conditions on the annealing schedule and proposal construction.
[Abstract] No simulation study, real-data example, convergence diagnostic, or comparison against standard AIS or MCMC is supplied, leaving the claim that the method 'avoids exhaustive enumeration' and resolves convergence issues unsupported by evidence.
[Abstract] The description of how the Rashomon set supplies a starting density 'while preserving global support over the entire model space' is stated at a conceptual level only; the explicit form of the initial density, the annealing path, and the weight normalization that would guarantee the claimed consistency are absent.

minor comments (1)

[Abstract] The parenthetical '(RPS)' is introduced without a prior definition or citation to the construction of Rashomon Partition Sets.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We address each point below, clarifying the theoretical basis drawn from standard AIS results and outlining planned revisions to strengthen the presentation of both theory and evidence.

Referee: [Abstract] The assertion that 'the AIS correction restores full posterior inference' and 'yields consistent self-normalized posterior summaries' is made without any derivation, theorem statement, or statement of the required conditions on the annealing schedule and proposal construction.

Authors: The consistency claim rests on the fact that the Rashomon-seeded initial density is constructed to have full support over the model space (via a mixture with a uniform component), after which the standard AIS estimator is consistent under the usual conditions on the annealing schedule (geometric path with sufficient intermediates to bound variance) and the proposal. We will add an explicit statement of these conditions together with a reference to the relevant AIS consistency theorems (e.g., Neal 2001) in both the abstract and the main theoretical section. Revision: yes.

Referee: [Abstract] No simulation study, real-data example, convergence diagnostic, or comparison against standard AIS or MCMC is supplied, leaving the claim that the method 'avoids exhaustive enumeration' and resolves convergence issues unsupported by evidence.

Authors: The manuscript contains a demonstration of the RPS-based seed construction on factorial designs that illustrates avoidance of exhaustive enumeration. To address the concern directly, we will expand this demonstration into a fuller simulation study that includes comparisons against standard AIS and MCMC, along with convergence diagnostics such as effective sample size and autocorrelation, in the revised version. Revision: yes.

Referee: [Abstract] The description of how the Rashomon set supplies a starting density 'while preserving global support over the entire model space' is stated at a conceptual level only; the explicit form of the initial density, the annealing path, and the weight normalization that would guarantee the claimed consistency are absent.

Authors: The explicit mixture form of the initial density (RPS models plus uniform component), the geometric annealing path, and the standard self-normalized AIS weight computation are defined in the methods section. We will add a concise summary of these explicit constructions to the abstract and ensure all formulas appear with clear notation in the main text. Revision: yes.
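For orientation, the three ingredients named in the last response can be written in generic AIS notation as follows. This is a sketch under stated assumptions rather than the paper's own formulas: the two-component mixture with mixing weight $\alpha$, the inverse temperatures $\beta_t$, and the Markov kernels $K_t$ (each leaving $f_t$ invariant) are introduced here for illustration, with $S$ the seed set, $\mathcal{M}$ the model space, and $\tilde p(\cdot \mid \mathcal{D})$ the unnormalized posterior.

```latex
\begin{align*}
q_0(M; S) &= (1-\alpha)\,\frac{\mathbf{1}\{M \in S\}}{|S|} + \alpha\,\frac{1}{|\mathcal{M}|},
  \qquad 0 < \alpha \le 1, \\
f_t(M) &= q_0(M; S)^{\,1-\beta_t}\;\tilde p(M \mid \mathcal{D})^{\,\beta_t},
  \qquad 0 = \beta_0 < \beta_1 < \dots < \beta_T = 1, \\
w^{(i)} &= \prod_{t=1}^{T} \frac{f_t\big(M^{(i)}_{t-1}\big)}{f_{t-1}\big(M^{(i)}_{t-1}\big)},
  \qquad M^{(i)}_0 \sim q_0, \quad
  M^{(i)}_t \sim K_t\big(M^{(i)}_{t-1}, \cdot\big), \; t = 1, \dots, T-1, \\
\hat{\zeta}_N &= \frac{\sum_{i=1}^{N} w^{(i)}\,\zeta\big(M^{(i)}_{T-1}\big)}{\sum_{i=1}^{N} w^{(i)}}.
\end{align*}
```

Because $\alpha > 0$ gives every model positive starting probability, the weights are well defined on the whole model space, which is the support condition the consistency argument relies on.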

Circularity Check

0 steps flagged

No significant circularity detected

The paper proposes a computational method that uses Rashomon sets (or RPS) as seeds to initialize annealed importance sampling while preserving global support over the model space, then applies standard AIS corrections to obtain consistent self-normalized posterior summaries. No equations, derivations, or load-bearing steps are visible in the provided text that reduce by construction to fitted inputs, self-definitions, or self-citation chains. The central claim follows from the known consistency properties of self-normalized AIS under the stated conditions and does not rely on any internal reduction to its own outputs. This is the expected honest finding for a methods paper whose contribution is algorithmic rather than a closed mathematical derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no free parameters, axioms, or invented entities can be extracted.

pith-pipeline@v0.9.1-grok · 5765 in / 980 out tokens · 36784 ms · 2026-06-30T16:30:08.379223+00:00 · methodology

Reference graph

Works this paper leans on

33 extracted references · 31 canonical work pages · 1 internal anchor

doi: 10.1093/molbev/mss084.

Abhijit Banerjee, Arun G. Chandrasekhar, Suresh Dalpath, Esther Duflo, John Floretta, Matthew O. Jackson, Harini Kannan, Francine Loza, Anirudh Sankar, Anna Schrimpf, and Maheshwor Shrestha. Selecting the most effective nudge: Evidence from a large-scale experiment on immunization. Eco... doi: 10.3982/ECTA19739.

Leo Breiman. Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author). Statistical Science, 16(3):199–231. doi: 10.1214/ss/1009213726.

Ben Calderhead and Mark Girolami. Estimating Bayes factors via thermodynamic integration and population MCMC. Computational Statistics & Data Analysis, 53(12):4028–4045. ISSN 0167-9473. doi: 10.1016/j.csda.2009.07.025.

doi: 10.3390/e21111109.

Siddhartha Chib and Xiaming Zeng. Which factors are risk factors in asset pricing? A model scan framework. Journal of Business & Economic Statistics, 38(4):771–783. doi: 10.1080/07350015.2019.1573684.

Jiayun Dong and Cynthia Rudin. Exploring the cloud of variable importance for the set of all good models. Nature Machine Intelligence, 2(12):810–824. ISSN 2522-5839. doi: 10.1038/s42256-020-00264-0.

Yang Fan, Rongqi Wu, Ming-Hui Chen, Lynn Kuo, and Paul O. Lewis. Choosing among partition models in Bayesian phylogenetics. Molecular Biology and Evolution, 28(1):523–532. doi: 10.1093/molbev/msq224.

Nial Friel and Anthony N. Pettitt. Marginal likelihood estimation via power posteriors. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(3):589–607. doi: 10.1111/j.1467-9868.2007.00650.x.

Andrew Gelman and Xiao-Li Meng. Simulating normalizing constants: from importance sampling to bridge sampling to path sampling. Statistical Science, 13(2):163–185. doi: 10.1214/ss/1028905934.

Edward I. George and Robert E. McCulloch. Variable selection via Gibbs sampling. Journal of the American Statistical Association, 88(423):881–889. doi: 10.1080/01621459.1993.10476353.

Tilmann Gneiting and Adrian E Raftery. Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378. doi: 10.1198/016214506000001437.

Yongtao Guan and Matthew Stephens. Bayesian variable selection regression for genome-wide association studies and other large-scale problems. The Annals of Applied Statistics, 5(3):1780–1815. doi: 10.1214/11-AOAS455.

Benjamin Guedj. A primer on PAC-Bayesian learning. arXiv, abs/1901.05353. doi: 10.48550/arXiv.1901.05353.

Chris Hans, Adrian Dobra, and Mike West. Shotgun stochastic search for Regression Variable Selection. Journal of the American Statistical Association, 102(478):507–516. doi: 10.1198/016214507000000121.

Jennifer A. Hoeting, David Madigan, Adrian E. Raftery, and Chris T. Volinsky. Bayesian model averaging: a tutorial (with comments by M. Clyde, David Draper and E. I. George, and a rejoinder by the authors). Statistical Science, 14(4):382–417. doi: 10.1214/ss/1009212519.

Aliaksandr Hubin and Geir Storvik. Mode-jumping MCMC for Bayesian variable selection in generalized linear models. Computational Statistics & Data Analysis, 127:281–297. doi: 10.1016/j.csda.2018.05.020.

Dean Karlan and John A List. Does price matter in charitable giving? Evidence from a large-scale natural field experiment. American Economic Review, 97(5):1774–1793. doi: 10.7910/DVN/27853.

Nicolas Lartillot and Hervé Philippe. Computing Bayes factors using thermodynamic integration. Systematic Biology, 55(2):195–207. doi: 10.1080/10635150500433722.

David Madigan and Adrian E. Raftery. Model selection and accounting for model uncertainty in graphical models using Occam's window. Journal of the American Statistical Association, 89(428):1535–1546. doi: 10.1080/01621459.1994.10476894.

David Madigan, Adrian E Raftery, C Volinsky, and Jennifer Hoeting. Bayesian model averaging. In Proceedings of the AAAI Workshop on Integrating Multiple Learned Models, Portland, OR, pages 77–83.

doi: 10.1145/307400.307435.

Radford M. Neal. Annealed importance sampling. Statistics and Computing, 11:125–139. doi: 10.1023/A:1008923215028.

Adrian E. Raftery, David Madigan, and Jennifer A. Hoeting. Bayesian model averaging for linear regression models. Journal of the American Statistical Association, 92(437):179–191. doi: 10.1080/01621459.1997.10473615.

Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215. doi: 10.1038/s42256-019-0048-x.

Lesia Semenova, Cynthia Rudin, and Ronald Parr. On the existence of simpler machine learning models. In Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency, FAccT '22, pages 1827–1858, New York, NY, USA. Association for Computing Machinery. doi: 10.1145/3531146.3533232.

Surya T. Tokdar and Robert E. Kass. Importance sampling: a review. WIREs Computational Statistics, 2(1):54–60. doi: 10.1002/wics.56.

Aparajithan Venkateswaran, Anirudh Sankar, Arun G. Chandrasekhar, and Tyler H. McCormick. Robustly estimating heterogeneity in factorial data using Rashomon partitions. doi: 10.48550/arXiv.2404.02141.

Chris T. Volinsky, David Madigan, Adrian E. Raftery, and Richard A. Kronmal. Bayesian model averaging in proportional hazard models: Assessing the risk of a stroke. Applied Statistics, 46(4):433–448. doi: 10.1111/1467-9876.00082.

Rui Xin, Chudi Zhong, Zhi Chen, Takuya Takagi, Margo Seltzer, and Cynthia Rudin. Exploring the whole Rashomon set of sparse decision trees. In Advances in Neural Information Processing Systems, volume 35, pages 14071–14084. Curran Associates, Inc. URL https://proceedings.neurips.cc/paper_files/paper/2022/file/5afaa8b4dd18eb1eed055d2d821b58ae-Paper-Conference.pdf.

Yun Yang, Martin J. Wainwright, and Michael I. Jordan. On the computational complexity of MCMC-based Bayesian variable selection. The Annals of Statistics, 44(5):2025–2053. doi: 10.1214/15-AOS1417.

Arnold Zellner. On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In Prem K. Goel and Arnold Zellner, editors, Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti, pages 233–243. Elsevier Science Publishers.

doi: 10.1007/BF02888369.

Yan Zhou, Adam M. Johansen, and John A.D. Aston. Toward automatic model comparison: An adaptive sequential Monte Carlo approach. Journal of Computational and Graphical Statistics, 25(3):701–726. doi: 10.1080/10618600.2015.1060885.

Appendix A1. Proof of the theoretical results

We prove the almost-sure consistency of the self-normalized AIS estimator stated in Theorem 1 and its corollaries. The notation is exactly that of Section 2: $Q$ is the joint distribution of a single model–weight pair $(M, w)$ prod...

... guarantees that there exists a constant $C = C_T / C_0 > 0$, where $C_T = \sum_{M \in \mathcal{M}} \tilde p(M \mid \mathcal{D})$ and $C_0 = \sum_{M \in \mathcal{M}} q_0(M; S)$ are the normalizing constants of the unnormalized posterior $\tilde p(\cdot \mid \mathcal{D})$ and the unnormalized initial density $q_0(\cdot; S)$, respectively. For any bounded measurable function $\zeta : \mathcal{M} \to \mathbb{R}^p$,

\[
E_Q\big[w\,\zeta(M)\big] = C\, E_{M \mid \mathcal{D}}\big[\zeta(M)\big]. \tag{A1}
\]

Setting $\zeta$ as the unit function yields $E_Q[w] = C$. A1.2. Proof...


[27] [27]

URLhttps: //doi.org/10.48550/arXiv.2404.02141. Chris T. Volinsky, David Madigan, Adrian E. Raftery, and Richard A. Kronmal. Bayesian model averaging in proportional hazard models: Assessing the risk of a stroke.Applied Statistics, 46 (4):433–448,

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2404.02141

[28] [28]

URLhttps://doi.org/10.1111/1467-9876

doi: 10.1111/1467-9876.00082. URLhttps://doi.org/10.1111/1467-9876. 00082. Rui Xin, Chudi Zhong, Zhi Chen, Takuya Takagi, Margo Seltzer, and Cynthia Rudin. Exploring the whole rashomon set of sparse decision trees. InAdvances in Neu- ral Information Processing Systems, volume 35, pages 14071–14084. Curran Asso- ciates, Inc.,

work page doi:10.1111/1467-9876.00082

[29] [29]

Yun Yang, Martin J

URLhttps://proceedings.neurips.cc/paper_files/paper/2022/file/ 5afaa8b4dd18eb1eed055d2d821b58ae-Paper-Conference.pdf. Yun Yang, Martin J. Wainwright, and Michael I. Jordan. On the computational complexity of MCMC-based Bayesian variable selection.The Annals of Statistics, 44(5):2025–2053,

2022

[30] [30]

, Wainwright , Martin J M

doi: 10.1214/15-AOS1417. URLhttps://doi.org/10.1214/15-AOS1417. Arnold Zellner. On assessing prior distributions and Bayesian regression analysis withg-prior distributions. In Prem K. Goel and Arnold Zellner, editors,Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti, pages 233–243. Elsevier Science Publishers,

work page doi:10.1214/15-aos1417

[31] [31]

URL https://doi.org/10.1007/BF02888369

doi: 10.1007/BF02888369. URL https://doi.org/10.1007/BF02888369. Yan Zhou, Adam M. Johansen, and John A.D. Aston. Toward automatic model comparison: An adaptive sequential Monte Carlo approach.Journal of Computational and Graphical Statistics, 25(3):701–726,

work page doi:10.1007/bf02888369

[32] [32]

URLhttps://doi.org/10.1080/ 10618600.2015.1060885

Appendix A1. Proof of the theoretical results

We prove the almost-sure consistency of the self-normalized AIS estimator stated in Theorem 1 and its corollaries. The notation is exactly that of Section 2: $Q$ is the joint distribution of a single model–weight pair $(M, w)$ prod...


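guarantees that there exists a constant $C = C_T / C_0 > 0$, where $C_T = \sum_{M \in \mathcal{M}} \tilde{p}(M \mid \mathcal{D})$ and $C_0 = \sum_{M \in \mathcal{M}} q_0(M; S)$ are the normalizing constants of the unnormalized posterior $\tilde{p}(\cdot \mid \mathcal{D})$ and the unnormalized initial density $q_0(\cdot\,; S)$, respectively. For any bounded measurable function $\zeta : \mathcal{M} \to \mathbb{R}^p$,

$$\mathbb{E}_Q\big[w\,\zeta(M)\big] = C\,\mathbb{E}_{M \mid \mathcal{D}}\big[\zeta(M)\big]. \tag{A1}$$

Setting $\zeta$ as the unit function yields $\mathbb{E}_Q[w] = C$.

A1.2. Proof...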
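Identity (A1) can be checked numerically on a small example. The sketch below is not the paper's algorithm: it assumes the simplest special case of AIS with no intermediate temperatures, so the weight reduces to the plain importance ratio $w = \tilde{p}(M \mid \mathcal{D}) / q_0(M; S)$, and it reads the constant as $C = C_T / C_0$. The five-model space and all numerical values (`p_tilde`, `q0_tilde`, `zeta`) are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical toy model space with five models; all values are illustrative.
p_tilde = np.array([0.5, 2.0, 0.1, 1.2, 0.2])   # unnormalized posterior
q0_tilde = np.array([1.0, 3.0, 0.5, 0.5, 1.0])  # unnormalized initial density
zeta = np.array([1.0, -2.0, 0.5, 3.0, 0.0])     # bounded test function

C_T = p_tilde.sum()          # normalizing constant of the posterior
C_0 = q0_tilde.sum()         # normalizing constant of the initial density
C = C_T / C_0                # assumed reading of the constant in (A1)

q0 = q0_tilde / C_0          # normalized initial density (sampling law of M)
post = p_tilde / C_T         # normalized posterior
w = p_tilde / q0_tilde       # single-step weight (assumption: no annealing steps)

# Exact check of (A1) by enumerating the model space.
lhs = np.sum(q0 * w * zeta)          # E_Q[w * zeta(M)]
rhs = C * np.sum(post * zeta)        # C * E_{M|D}[zeta(M)]
mean_w = np.sum(q0 * w)              # E_Q[w], should equal C

# Self-normalized Monte Carlo estimate of the posterior mean of zeta.
rng = np.random.default_rng(0)
draws = rng.choice(len(q0), size=200_000, p=q0)
snis = np.sum(w[draws] * zeta[draws]) / np.sum(w[draws])
exact = np.sum(post * zeta)

print(f"E_Q[w zeta] = {lhs:.6f}, C * E_post[zeta] = {rhs:.6f}")
print(f"E_Q[w] = {mean_w:.6f}, C = {C:.6f}")
print(f"self-normalized estimate = {snis:.4f}, exact = {exact:.4f}")
```

Dividing the sample average of $w\,\zeta(M)$ by the sample average of $w$ cancels the unknown constant $C$, which is why the self-normalized form is usable even when neither normalizing constant is available.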
