pith. sign in

arxiv: 2606.26748 · v1 · pith:RAJDO7ZZnew · submitted 2026-06-25 · 💻 cs.IT · cond-mat.stat-mech· math.IT

Grouped Reverse Importance Sampling for the Partition Function

Pith reviewed 2026-06-26 02:58 UTC · model grok-4.3

classification 💻 cs.IT cond-mat.stat-mechmath.IT
keywords reverse importance samplingpartition function estimationgrouped samplingmean squared errorchi-squared divergenceBoltzmann distributionimportance sampling weights
0
0 comments X

The pith

Weight functions depending only on total group energy are sufficient for optimal grouped reverse importance sampling of partition functions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces grouped variants of reverse importance sampling to estimate partition functions from samples drawn from a Boltzmann distribution. It shows that while weights must couple the samples within each group to improve on ordinary RIS, the optimal weights can be restricted to depend solely on the sum of the energies in the group. This restriction simplifies the search for good weights and leads to mean squared error reductions of 20 to 65 percent for group sizes two and three in non-overlapping groupings across examples. Additional sliding-window grouping schemes provide further improvements.

Core claim

The central finding is that without loss of optimality, it is sufficient to seek weight functions that depend only on the total energy sum_i U(x_i) of the group. A simple identity relates the normalized MSE to the chi-squared divergence between the joint-weight distribution and the distribution of the k-fold sum of independent energies. For k=2 and k=3, the MSE associated with non-overlapping groups is reduced by 20--65% across three examples. Product-form weight functions always worsen the MSE.

What carries the argument

Group-energy weight functions, which assign a weight based only on the sum of energies within each group of k samples.

If this is right

  • Any weight function that improves on ordinary single-sample RIS must couple the components of the group rather than using a product form.
  • Non-overlapping grouped RIS with energy-sum weights reduces MSE by 20-65% for k=2 and k=3 in the tested cases.
  • Fixed-weight sliding window grouping improves on non-overlapping grouping.
  • Variable-weight sliding window grouping improves even further than the fixed-weight version.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The energy-sum restriction may allow closed-form optimization of the weights for specific energy distributions.
  • This grouping technique could be combined with other variance reduction methods in Monte Carlo sampling.
  • Extensions to continuous state spaces or non-Boltzmann distributions might follow similar optimality arguments.

Load-bearing premise

The samples are independent draws from the Boltzmann distribution and a tractable joint weight function exists that can be optimized via chi-squared divergence to the k-fold energy sum distribution.

What would settle it

An explicit construction of a weight function that depends on individual energies or other features beyond their sum, yet yields a smaller chi-squared divergence and thus lower MSE than the best group-energy weight, would falsify the sufficiency claim.

Figures

Figures reproduced from arXiv: 2606.26748 by Neri Merhav.

Figure 1
Figure 1. Figure 1: Asymptotic MSE constant V nol k (ms) vs. shape exponent α for NOL grouping, k = 1, 2, 3 (Example 1). 1 2 3 4 5 6 7 8 Group size k 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 Vk NOL FSW VSW VSW (upper bound) [PITH_FULL_IMAGE:figures/full_fig_p021_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Asymptotic MSE constant Vk vs. group size k: NOL (blue), FSW (red), VSW (green; solid k ≤ 4 exact, dashed k ≥ 5 upper bound). 21 [PITH_FULL_IMAGE:figures/full_fig_p021_2.png] view at source ↗
read the original abstract

We introduce and analyze several grouped variants of the method of reverse importance sampling (RIS) for estimating a partition function from samples of the Boltzmann distribution $p(x)=e^{ \betaU(x)}/Z(\beta)$. Ordinary RIS weighs each sample separately. By contrast, our proposed grouped RIS (GRIS) methods are based on assigning the samples into groups (or batches) of size $k\ge 2$ and applying a joint weight function to each group. The focal point of the research is the quest for a tractable weight function that would yield the smallest possible mean squared error (MSE). A simple identity relates the normalized MSE to the chi-squared divergence between the joint-weight distribution and the distribution of the $k$-fold sum of independent energies. Our first theoretical finding is that any weight that improves on ordinary RIS ($k=1$) must couple the group components. In other words, it must not be a product-form function across those components, as product-form weight functions always worsen the MSE. Our second, and more important, finding is that, without loss of optimality, it is sufficient to seek weight functions that depend only on the total energy, $\sum_iU(x_i)$, of the group (group-energy weight functions); for the sliding-window variants, the analogous result is open. This finding simplifies both the theoretical analysis and the application of the method substantially. For $k=2$ and $k=3$, the MSE associated with non-overlapping (NOL) groups is reduced by $20$--$65\%$ across three examples. We then propose two additional variants of GRIS, both based on sliding-window grouping (as opposed to NOL grouping). The first applies a fixed weight sliding window (FSW) across all (cyclic) shifts of the sliding window, and the second allows a variable-weight sliding window (VSW). The FSW scheme improves on the NOL one, and the VSW improves even further, as will be demonstrated numerically.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces grouped reverse importance sampling (GRIS) variants for estimating the partition function from i.i.d. Boltzmann samples. It establishes an identity relating normalized MSE to chi-squared divergence between the joint weight distribution and the k-fold energy sum distribution, proves that product-form weights worsen MSE, shows that weights depending only on the group energy sum suffice without loss of optimality for non-overlapping (NOL) groups, and reports 20-65% MSE reductions for k=2,3 in NOL across three examples, with further gains from fixed-weight (FSW) and variable-weight (VSW) sliding-window schemes.

Significance. If the central identity and sufficiency result hold, the work supplies a principled, symmetry-based simplification for optimizing RIS estimators via grouping, with the chi-squared objective providing a concrete minimization target and the NOL group-energy restriction reducing the search space. The reported numerical improvements and the extension to sliding-window variants indicate practical value for variance reduction in partition function estimation.

minor comments (2)
  1. [Numerical experiments section] The abstract and numerical results refer to 'three examples' without naming the underlying models (e.g., specific Ising or other spin systems) or providing the exact parameter settings used for the reported 20-65% reductions; this should be added for reproducibility.
  2. [Numerical experiments section] Error bars, standard deviations, or number of independent trials are not mentioned for the MSE reduction percentages; including these would strengthen the empirical claims.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. We are pleased that the central identity, the necessity of coupling in weights, and the sufficiency of group-energy dependence for NOL groups are viewed as providing a principled simplification for RIS optimization.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper derives its key claims from an identity linking normalized MSE to chi-squared divergence between the joint-weight distribution and the k-fold energy sum, combined with i.i.d. sampling symmetry under permutation of group components. The result that weight functions of the total energy sum suffice without loss of optimality is a direct mathematical consequence of this setup and does not reduce to any fitted parameter, self-referential definition, or load-bearing self-citation. Numerical MSE improvements are reported as empirical outcomes on specific examples rather than predictions forced by construction. The overall derivation is self-contained against the stated assumptions of independent Boltzmann samples and existence of a tractable joint weight function.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The approach rests on the standard assumption that independent samples from the target Boltzmann distribution are available and that a joint weight function can be chosen to minimize the relevant chi-squared divergence; no new physical entities are introduced.

free parameters (1)
  • group size k
    User-selected integer k >= 2 that controls batching and is shown to affect MSE reduction in the reported examples.
axioms (1)
  • domain assumption Samples are i.i.d. draws from p(x) = exp(β U(x)) / Z(β)
    Explicitly stated as the source distribution for the reverse importance sampling procedure.

pith-pipeline@v0.9.1-grok · 5896 in / 1270 out tokens · 59451 ms · 2026-06-26T02:58:57.317084+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

16 extracted references · 2 canonical work pages · 1 internal anchor

  1. [1]

    M. A. Newton and A. E. Raftery, Approximate Bayesian inference with the weighted likelihood bootstrap,J. R. Stat. Soc. Ser. B, 56(1):3–48, 1994

  2. [2]

    Simulating ratios of normalizing constants via a simple identity,

    X.-L. Meng and W. H. Wong, “Simulating ratios of normalizing constants via a simple identity,”Statistica Sinica, 6(4):831–860, 1996

  3. [3]

    Estimating the partition function by discriminance sampling,

    Q. Liu, J. Peng, A. Ihler, and J. Fisher III, “Estimating the partition function by discriminance sampling,” inProc. UAI, pp. 514–522, 2015

  4. [4]

    R. M. Neal, Annealed importance sampling,Stat. Comput., 11(2):125–139, 2001

  5. [5]

    Importance Weighted Autoencoders

    Y. Burda, R. Grosse, and R. Salakhutdinov, inProc. 4th Intl. Conf. on Learning Representations (ICLR 2016), San Juan, Puerto Rico, May 2016 (arXiv:1509.00519)

  6. [6]

    Gelman and X.-L

    A. Gelman and X.-L. Meng, Simulating normalizing constants: from importance sampling to bridge sampling to path sampling,Statist. Sci., 13(2):163–185, 1998

  7. [7]

    Del Moral, A

    P. Del Moral, A. Doucet, and A. Jasra, Sequential Monte Carlo samplers,J. R. Stat. Soc. Ser. B, 68(3):411–436, 2006

  8. [8]

    A. B. Owen and Y. Zhou, Safe and effective importance sampling,J. Amer. Statist. Assoc., 95(449):135–143, 2000

  9. [9]

    C. P. Robert and G. Casella,Monte Carlo Statistical Methods, 2nd ed., Springer, New York, 2004

  10. [10]

    Branchini and V

    N. Branchini and V. Elvira, Generalizing self-normalized importance sampling with couplings, arXiv:2406.19974, 2024

  11. [11]

    Veach and L

    E. Veach and L. J. Guibas, Optimally combining sampling techniques for Monte Carlo rendering, inProc. SIGGRAPH, pp. 419–428, 1995

  12. [12]

    Elvira, L

    V. Elvira, L. Martino, D. Luengo, and M. F. Bugallo, Generalized multiple importance sampling,Stat. Sci., 34(1):129–155, 2019

  13. [13]

    A. W. van der Vaart,Asymptotic Statistics, Cambridge University Press, Cambridge, 1998

  14. [14]

    J. M. Hammersley and K. W. Morton, A new Monte Carlo technique: antithetic variates, Math. Proc. Camb. Phil. Soc., 52(3):449–475, 1956

  15. [15]

    Huang,Statistical Mechanics, 2nd ed., Wiley, New York, 1987

    K. Huang,Statistical Mechanics, 2nd ed., Wiley, New York, 1987

  16. [16]

    R. K. Pathria and P. D. Beale,Statistical Mechanics, 3rd ed., Elsevier, Oxford, 2011. 35