Grouped Reverse Importance Sampling for the Partition Function

Neri Merhav

arXiv: 2606.26748 · v1 · pith:RAJDO7ZZ · submitted 2026-06-25 · 💻 cs.IT · cond-mat.stat-mech · math.IT

Pith reviewed 2026-06-26 02:58 UTC · model grok-4.3

classification 💻 cs.IT · cond-mat.stat-mech · math.IT

keywords: reverse importance sampling, partition function estimation, grouped sampling, mean squared error, chi-squared divergence, Boltzmann distribution, importance sampling weights

The pith

Weight functions that depend only on the total group energy are sufficient for optimal grouped reverse importance sampling of partition functions with non-overlapping groups.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces grouped variants of reverse importance sampling (RIS) for estimating partition functions from samples drawn from a Boltzmann distribution. It shows that weights must couple the samples within each group to improve on ordinary RIS and that, for non-overlapping groups, the optimal weights can be restricted to depend solely on the sum of the energies in the group. This restriction simplifies the search for good weights and yields mean squared error (MSE) reductions of 20 to 65 percent for group sizes two and three with non-overlapping grouping across three examples. Two sliding-window grouping schemes give further improvements, shown numerically.
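To make the estimator concrete, here is a minimal sketch of ordinary RIS and of non-overlapping grouped RIS on a hypothetical four-state toy system; the energies, β, and all function names are our assumptions, not the paper's examples. It relies only on the standard RIS identity E_p[q(x) e^{βU(x)}] = 1/Z, valid for any normalized weight q, and on its k-fold analogue for a normalized joint weight on k-tuples: each disjoint group then estimates 1/Z^k, and the sketch takes a k-th root.

```python
import math
import random

# Hypothetical toy system (not one of the paper's examples): four states
# with integer energies U(x); states 1 and 2 are degenerate.
ENERGIES = [0, 1, 1, 2]
BETA = 1.0
Z_TRUE = sum(math.exp(-BETA * u) for u in ENERGIES)  # exact, used only for checking


def boltzmann_sample(rng, n):
    """Draw n i.i.d. state indices from p(x) = exp(-beta*U(x)) / Z."""
    weights = [math.exp(-BETA * u) for u in ENERGIES]  # unnormalized weights suffice
    return rng.choices(range(len(ENERGIES)), weights=weights, k=n)


def ris_estimate(samples, q):
    """Ordinary RIS: E_p[q(x) exp(beta*U(x))] = 1/Z for any normalized q."""
    inv_z = sum(q(x) * math.exp(BETA * ENERGIES[x]) for x in samples) / len(samples)
    return 1.0 / inv_z


def grouped_ris_estimate(samples, q_joint, k):
    """NOL grouped RIS: each disjoint k-tuple gives an unbiased estimate of 1/Z^k.

    q_joint must be a normalized weight on k-tuples of states; leftover
    samples that do not fill a whole group are dropped.
    """
    groups = [tuple(samples[i:i + k]) for i in range(0, len(samples) - k + 1, k)]
    terms = [q_joint(g) * math.exp(BETA * sum(ENERGIES[x] for x in g)) for g in groups]
    inv_zk = sum(terms) / len(terms)
    return inv_zk ** (-1.0 / k)  # k-th root of the inverse
```

For example, `ris_estimate(s, lambda x: 0.25)` uses the uniform weight over the four states, and `grouped_ris_estimate(s, lambda g: 0.25 ** len(g), 2)` uses its product over a pair. That uniform joint weight is product-form, which the paper says cannot beat ordinary RIS; it serves here only to check that both estimators recover Z.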

Core claim

The central finding is that, for non-overlapping groups and without loss of optimality, it is sufficient to seek weight functions that depend only on the total energy $\sum_i U(x_i)$ of the group. A simple identity relates the normalized MSE to the chi-squared divergence between the joint-weight distribution and the distribution of the k-fold sum of independent energies. For k = 2 and k = 3, the MSE associated with non-overlapping groups is reduced by 20–65% across three examples. Product-form weight functions, by contrast, always worsen the MSE relative to ordinary RIS.
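The identity can be checked exactly on a small system. Assuming each group contributes the RIS term T = Z^k q(x^k) e^{βE}, with E the group energy and q a normalized joint weight, T equals q/p^k, so its mean is 1 and its variance is χ²(q‖p^k) = Σ q²/p^k − 1. The sketch below enumerates all k-tuples of a hypothetical four-state system; the names are ours, and the paper's exact MSE normalization may differ (for instance, by the number of groups).

```python
import itertools
import math

# Hypothetical four-state toy system with integer energies.
ENERGIES = [0, 1, 1, 2]
BETA = 1.0
Z = sum(math.exp(-BETA * u) for u in ENERGIES)


def group_energy(xs):
    """Total energy sum_i U(x_i) of a k-tuple of states."""
    return sum(ENERGIES[x] for x in xs)


def boltzmann_joint(k):
    """Exact p^k(x_1, ..., x_k) for i.i.d. Boltzmann samples."""
    states = range(len(ENERGIES))
    return {xs: math.exp(-BETA * group_energy(xs)) / Z**k
            for xs in itertools.product(states, repeat=k)}


def chi2(q, p):
    """Chi-squared divergence: chi^2(q || p) = sum_x q(x)^2 / p(x) - 1."""
    return sum(q[x] ** 2 / p[x] for x in p) - 1.0


def normalized_variance(q, k):
    """Variance of T = Z^k * q(x^k) * exp(beta*E) under p^k (its mean is 1)."""
    p = boltzmann_joint(k)
    t = {xs: Z**k * q[xs] * math.exp(BETA * group_energy(xs)) for xs in p}
    mean = sum(p[xs] * t[xs] for xs in p)
    return sum(p[xs] * t[xs] ** 2 for xs in p) - mean ** 2
```

Choosing q = p^k makes T identically 1 and the divergence zero, but that weight requires knowing Z. This is why the search is restricted to tractable weights.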

What carries the argument

Group-energy weight functions, which assign a weight based only on the sum of energies within each group of k samples.
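One way to see why group-energy weights lose nothing is a data-processing argument that we supply for illustration; it is not necessarily the paper's proof. Since p^k is constant on each energy shell {x^k : Σ_i U(x_i) = E}, replacing any normalized joint weight q by its shell average keeps the normalization and, by Cauchy–Schwarz, cannot increase χ²(q‖p^k). The toy energies and helper names below are hypothetical.

```python
import itertools
import math
from collections import defaultdict

# Hypothetical toy system; K is the group size.
ENERGIES = [0, 1, 1, 2]
BETA = 1.0
K = 2


def group_energy(xs):
    """Total energy sum_i U(x_i) of a k-tuple of states."""
    return sum(ENERGIES[x] for x in xs)


def joint_boltzmann(k):
    """Exact p^k on k-tuples; it depends on xs only through group_energy(xs)."""
    z = sum(math.exp(-BETA * u) for u in ENERGIES)
    return {xs: math.exp(-BETA * group_energy(xs)) / z**k
            for xs in itertools.product(range(len(ENERGIES)), repeat=k)}


def chi2(q, p):
    """Chi-squared divergence chi^2(q || p) = sum q^2 / p - 1."""
    return sum(q[x] ** 2 / p[x] for x in p) - 1.0


def shell_average(q):
    """Map a normalized joint weight q to a group-energy weight.

    The mass Q(E) that q puts on each energy shell is spread uniformly over the
    N_k(E) tuples in that shell, so the result is normalized and depends on xs
    only through its total energy.
    """
    mass = defaultdict(float)  # Q(E)
    count = defaultdict(int)   # N_k(E)
    for xs, v in q.items():
        e = group_energy(xs)
        mass[e] += v
        count[e] += 1
    return {xs: mass[group_energy(xs)] / count[group_energy(xs)] for xs in q}
```

A weight that depends on the first component, such as q ∝ 1 + x_1, couples the group but not through the energy alone. Its shell average is strictly better here, because that weight is not constant on the degenerate energy-1 shell.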

If this is right

Any weight function that improves on ordinary single-sample RIS must couple the components of the group rather than using a product form.
Non-overlapping grouped RIS with energy-sum weights reduces MSE by 20–65% for k = 2 and k = 3 in the tested cases.
Fixed-weight sliding-window (FSW) grouping improves on non-overlapping (NOL) grouping.
Variable-weight sliding-window (VSW) grouping improves further still on the fixed-weight version.
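The product-form claim can be illustrated under a delta-method reading that we assume here. Suppose n samples form n/k NOL groups, and Ẑ is the k-th root of the inverse of the group average. Then n·MSE(Ẑ)/Z² tends to χ²_k/k, where χ²_k is the divergence of the joint weight. For a product weight q_1 ⊗ … ⊗ q_1, we have χ²_k = (1 + χ²_1)^k − 1 ≥ kχ²_1 by Bernoulli's inequality, so the constant never drops below the k = 1 value χ²_1. The function names are hypothetical, and the paper's V_k may be normalized differently.

```python
def chi2_product(chi2_single, k):
    """chi^2 of a product weight q1 x ... x q1 (k factors) against p^k.

    For product measures, 1 + chi^2 factorizes, so this is (1 + chi2_single)^k - 1.
    """
    return (1.0 + chi2_single) ** k - 1.0


def nol_mse_constant(chi2_group, k):
    """Assumed delta-method limit of n * MSE(Z_hat) / Z^2 for NOL grouping.

    n/k groups each estimate 1/Z^k with relative variance chi2_group; taking the
    k-th root scales the relative error by 1/k, giving chi2_group / k.
    """
    return chi2_group / k
```

For example, with χ²_1 = 0.5 and k = 2 the constant is ((1.5)² − 1)/2 = 0.625, which exceeds the single-sample value 0.5.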

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The energy-sum restriction may allow closed-form optimization of the weights for specific energy distributions.
This grouping technique could be combined with other variance reduction methods in Monte Carlo sampling.
Extensions to continuous state spaces or non-Boltzmann distributions might follow similar optimality arguments.

Load-bearing premise

The samples are independent draws from the Boltzmann distribution, and a tractable joint weight function exists that can be optimized by minimizing the chi-squared divergence between the joint-weight distribution and the distribution of the k-fold energy sum.

What would settle it

An explicit construction of a weight function that depends on individual energies or other features beyond their sum, yet yields a smaller chi-squared divergence and thus lower MSE than the best group-energy weight, would falsify the sufficiency claim.

Figures

Figures reproduced from arXiv: 2606.26748 by Neri Merhav.

**Figure 1.** Asymptotic MSE constant $V_k^{\mathrm{nol}}(\mathrm{ms})$ vs. shape exponent $\alpha$ for NOL grouping, $k = 1, 2, 3$ (Example 1).

**Figure 2.** Asymptotic MSE constant $V_k$ vs. group size $k$ ($k = 1, \dots, 8$): NOL (blue), FSW (red), VSW (green; solid for $k \le 4$ exact, dashed for $k \ge 5$ upper bound).

Original abstract

We introduce and analyze several grouped variants of the method of reverse importance sampling (RIS) for estimating a partition function from samples of the Boltzmann distribution $p(x)=e^{-\beta U(x)}/Z(\beta)$. Ordinary RIS weights each sample separately. By contrast, our proposed grouped RIS (GRIS) methods are based on assigning the samples into groups (or batches) of size $k\ge 2$ and applying a joint weight function to each group. The focal point of the research is the quest for a tractable weight function that would yield the smallest possible mean squared error (MSE). A simple identity relates the normalized MSE to the chi-squared divergence between the joint-weight distribution and the distribution of the $k$-fold sum of independent energies. Our first theoretical finding is that any weight that improves on ordinary RIS ($k=1$) must couple the group components. In other words, it must not be a product-form function across those components, as product-form weight functions always worsen the MSE. Our second, and more important, finding is that, without loss of optimality, it is sufficient to seek weight functions that depend only on the total energy, $\sum_iU(x_i)$, of the group (group-energy weight functions); for the sliding-window variants, the analogous result is open. This finding simplifies both the theoretical analysis and the application of the method substantially. For $k=2$ and $k=3$, the MSE associated with non-overlapping (NOL) groups is reduced by $20$--$65\%$ across three examples. We then propose two additional variants of GRIS, both based on sliding-window grouping (as opposed to NOL grouping). The first applies a fixed-weight sliding window (FSW) across all (cyclic) shifts of the sliding window, and the second allows a variable-weight sliding window (VSW). The FSW scheme improves on the NOL one, and the VSW improves even further, as will be demonstrated numerically.
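The abstract contrasts NOL grouping with sliding-window grouping over all cyclic shifts. Assuming n samples indexed 0, …, n−1, the index sets below are a minimal sketch of how we read the two schemes; the function names are hypothetical, and the way FSW and VSW assign weights to the windows is not reproduced. Each cyclic window of length k ≤ n contains distinct indices, so every window is still an i.i.d. k-tuple, but overlapping windows make the per-window terms correlated.

```python
def nol_groups(n, k):
    """Non-overlapping groups (0..k-1), (k..2k-1), ...; a short remainder is dropped."""
    return [tuple(range(s, s + k)) for s in range(0, n - k + 1, k)]


def cyclic_sliding_groups(n, k):
    """All n cyclic shifts of a length-k window: (s, s+1, ..., s+k-1) mod n."""
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n so that each window has distinct indices")
    return [tuple((s + j) % n for j in range(k)) for s in range(n)]
```

Every index appears in exactly k cyclic windows, so the sliding-window scheme averages n window terms instead of the n/k terms used by NOL grouping.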

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Merhav shows that, for non-overlapping groups in reverse importance sampling, weights depending only on the total group energy suffice without loss of optimality, with 20–65% MSE reductions in the examples; the analogous optimality claim for sliding-window grouping remains open.

The main thing to know is that this paper gives a clean optimality result for grouped reverse importance sampling on partition functions. For non-overlapping groups, any weight that improves on ordinary single-sample RIS must couple the members, and it is enough to search only over functions of the sum of the energies in the group. The argument rests on the chi-squared divergence identity for the normalized MSE plus permutation symmetry of the i.i.d. samples. Product-form weights are shown to always increase the MSE.

The work does what it sets out to do: it reduces the search for good weights to a one-dimensional problem and backs the claim with numerical runs on three examples for k=2 and k=3. Those runs report 20-65% MSE drops, which is a usable gain for an estimation task that matters in statistical mechanics. The two sliding-window variants are presented as further practical improvements over the non-overlapping version.

The soft spots are limited and proportionate. The analogous optimality result for the sliding-window schemes is left open, so those versions rest more on the reported numerics than on the same theoretical guarantee. The paper is an incremental sharpening of an existing technique rather than a new general framework, and the free parameter remains just the group size k. Because the review here starts from the abstract and the stress-test note, the tightness of the full derivations and the exact numerical protocol are not visible, but nothing in the stated claims shows internal inconsistency or hidden fitting.

This is for people already working with reverse importance sampling or variance reduction for partition functions. A reader who needs lower MSE on that specific task will find the grouping analysis and the concrete numbers useful. It deserves peer review because the central claims are specific, the symmetry argument is checkable, and the method has direct applicability even if the scope stays narrow.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces grouped reverse importance sampling (GRIS) variants for estimating the partition function from i.i.d. Boltzmann samples. It establishes an identity relating normalized MSE to chi-squared divergence between the joint weight distribution and the k-fold energy sum distribution, proves that product-form weights worsen MSE, shows that weights depending only on the group energy sum suffice without loss of optimality for non-overlapping (NOL) groups, and reports 20-65% MSE reductions for k=2,3 in NOL across three examples, with further gains from fixed-weight (FSW) and variable-weight (VSW) sliding-window schemes.

Significance. If the central identity and sufficiency result hold, the work supplies a principled, symmetry-based simplification for optimizing RIS estimators via grouping, with the chi-squared objective providing a concrete minimization target and the NOL group-energy restriction reducing the search space. The reported numerical improvements and the extension to sliding-window variants indicate practical value for variance reduction in partition function estimation.

minor comments (2)

[Numerical experiments section] The abstract and numerical results refer to 'three examples' without naming the underlying models (e.g., specific Ising or other spin systems) or providing the exact parameter settings used for the reported 20-65% reductions; this should be added for reproducibility.
[Numerical experiments section] Error bars, standard deviations, or number of independent trials are not mentioned for the MSE reduction percentages; including these would strengthen the empirical claims.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. We are pleased that the central identity, the necessity of coupling in weights, and the sufficiency of group-energy dependence for NOL groups are viewed as providing a principled simplification for RIS optimization.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper derives its key claims from an identity linking normalized MSE to chi-squared divergence between the joint-weight distribution and the k-fold energy sum, combined with i.i.d. sampling symmetry under permutation of group components. The result that weight functions of the total energy sum suffice without loss of optimality is a direct mathematical consequence of this setup and does not reduce to any fitted parameter, self-referential definition, or load-bearing self-citation. Numerical MSE improvements are reported as empirical outcomes on specific examples rather than predictions forced by construction. The overall derivation is self-contained against the stated assumptions of independent Boltzmann samples and existence of a tractable joint weight function.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on the standard assumption that independent samples from the target Boltzmann distribution are available and that a joint weight function can be chosen to minimize the relevant chi-squared divergence; no new physical entities are introduced.

free parameters (1)

group size k
User-selected integer k >= 2 that controls batching and is shown to affect MSE reduction in the reported examples.

axioms (1)

domain assumption: samples are i.i.d. draws from p(x) = exp(-β U(x)) / Z(β)
Explicitly stated as the source distribution for the reverse importance sampling procedure.

pith-pipeline@v0.9.1-grok · 5896 in / 1270 out tokens · 59451 ms · 2026-06-26T02:58:57.317084+00:00 · methodology

Reference graph

Works this paper leans on

16 extracted references · 2 canonical work pages · 1 internal anchor

[1] M. A. Newton and A. E. Raftery, "Approximate Bayesian inference with the weighted likelihood bootstrap," J. R. Stat. Soc. Ser. B, 56(1):3–48, 1994.

[2] X.-L. Meng and W. H. Wong, "Simulating ratios of normalizing constants via a simple identity," Statistica Sinica, 6(4):831–860, 1996.

[3] Q. Liu, J. Peng, A. Ihler, and J. Fisher III, "Estimating the partition function by discriminance sampling," in Proc. UAI, pp. 514–522, 2015.

[4] R. M. Neal, "Annealed importance sampling," Stat. Comput., 11(2):125–139, 2001.

[5] Y. Burda, R. Grosse, and R. Salakhutdinov, "Importance weighted autoencoders," in Proc. 4th Intl. Conf. on Learning Representations (ICLR 2016), San Juan, Puerto Rico, May 2016 (arXiv:1509.00519).

[6] A. Gelman and X.-L. Meng, "Simulating normalizing constants: from importance sampling to bridge sampling to path sampling," Statist. Sci., 13(2):163–185, 1998.

[7] P. Del Moral, A. Doucet, and A. Jasra, "Sequential Monte Carlo samplers," J. R. Stat. Soc. Ser. B, 68(3):411–436, 2006.

[8] A. B. Owen and Y. Zhou, "Safe and effective importance sampling," J. Amer. Statist. Assoc., 95(449):135–143, 2000.

[9] C. P. Robert and G. Casella, Monte Carlo Statistical Methods, 2nd ed., Springer, New York, 2004.

[10] N. Branchini and V. Elvira, "Generalizing self-normalized importance sampling with couplings," arXiv:2406.19974, 2024.

[11] E. Veach and L. J. Guibas, "Optimally combining sampling techniques for Monte Carlo rendering," in Proc. SIGGRAPH, pp. 419–428, 1995.

[12] V. Elvira, L. Martino, D. Luengo, and M. F. Bugallo, "Generalized multiple importance sampling," Stat. Sci., 34(1):129–155, 2019.

[13] A. W. van der Vaart, Asymptotic Statistics, Cambridge University Press, Cambridge, 1998.

[14] J. M. Hammersley and K. W. Morton, "A new Monte Carlo technique: antithetic variates," Math. Proc. Camb. Phil. Soc., 52(3):449–475, 1956.

[15] K. Huang, Statistical Mechanics, 2nd ed., Wiley, New York, 1987.

[16] R. K. Pathria and P. D. Beale, Statistical Mechanics, 3rd ed., Elsevier, Oxford, 2011.
