Unfolding with a Wasserstein Loss

Benjamin Faktor; Benjamin Nachman; Katy Craig

arXiv: 2603.20903 · v2 · submitted 2026-03-21 · 🧮 math.OC · hep-ph · stat.ML

Pith reviewed 2026-05-15 06:37 UTC · model grok-4.3

classification 🧮 math.OC · hep-ph · stat.ML

keywords Wasserstein distance · data unfolding · Richardson-Lucy · Sinkhorn algorithm · optimal transport · deconvolution · noise model · particle physics

The pith

Under sharp conditions, a Wasserstein loss yields existence and uniqueness of solutions when unfolding noisy data under transport-map noise models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes using the Wasserstein distance instead of Kullback-Leibler divergence to formulate the unfolding problem, which removes noise from measurements without requiring overlapping supports enforced by binning. It proves sharp conditions for the existence and uniqueness of the optimal unfolded distribution when the noise is modeled by a transport map. A generalized Sinkhorn algorithm is developed that converges to the solution using only empirical samples of the data and noise, scaling with sample size. Numerical tests on one- and two-dimensional problems show improved robustness over classical Richardson-Lucy deconvolution when binning artifacts are present.
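The support-overlap issue mentioned above can be illustrated with a toy example. This is a minimal sketch, not code from the paper: the helper names `kl_divergence` and `wasserstein1_1d`, the sample values, and the bin edges are all assumptions chosen for illustration. Two point clouds that differ by a small shift land in disjoint bins, so the discrete KL divergence is infinite, while the 1-D Wasserstein-1 distance simply reports the size of the shift.

```python
import numpy as np

def kl_divergence(p, q):
    """Discrete KL(p || q) on shared bins; infinite if q is zero where p is not."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    if np.any(q[mask] == 0):
        return np.inf
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def wasserstein1_1d(x, y):
    """W1 between two equal-size 1-D empirical samples via sorted matching."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    assert x.shape == y.shape, "this sketch assumes equal sample sizes"
    return float(np.mean(np.abs(x - y)))

# Hypothetical samples: y is x shifted by 0.5, so no sample values coincide.
x = np.array([0.0, 1.0, 2.0])
y = x + 0.5

# Bins of width 0.5 place the two samples in alternating, disjoint bins.
edges = np.arange(-0.25, 3.0, 0.5)
p, _ = np.histogram(x, bins=edges)
q, _ = np.histogram(y, bins=edges)
p = p / p.sum()
q = q / q.sum()

kl = kl_divergence(p, q)      # infinite: the binned supports do not overlap
w1 = wasserstein1_1d(x, y)    # finite: equals the shift of 0.5
print("KL:", kl, "W1:", w1)
```

Coarser bins would make the KL value finite again, which is the binning trade-off the summary refers to.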

Core claim

For noise processes representable as transport maps, the unfolding problem minimizes the Wasserstein distance between the measured data and the image of the unfolded data under the noise map. Under sharp conditions on the measures, this problem admits a unique optimizer. A provably convergent generalized Sinkhorn algorithm computes approximate optimizers and requires only samples rather than the explicit noise kernel.
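In symbols, the objective described above might be written as follows. The notation is an assumption made here for illustration and is not taken from the paper: $\nu$ denotes the measured data distribution, $T$ the noise map, $T_{\#}\mu$ the pushforward of a candidate unfolded distribution $\mu$, and $W_p$ a Wasserstein distance whose order $p$ this summary does not specify.

```latex
\min_{\mu \in \mathcal{P}(X)} \; W_p\bigl(T_{\#}\mu,\ \nu\bigr),
\qquad
(T_{\#}\mu)(A) = \mu\bigl(T^{-1}(A)\bigr)
\quad \text{for measurable } A .
```

The second expression is the standard definition of a pushforward measure: the mass that $T_{\#}\mu$ assigns to a set is the mass $\mu$ assigns to everything that $T$ sends into it.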

What carries the argument

The Wasserstein loss for the unfolding objective, minimized via a generalized Sinkhorn algorithm that iterates between marginal projections on empirical samples.
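For readers unfamiliar with Sinkhorn iterations, the sketch below shows the standard Sinkhorn scheme for balanced entropic optimal transport between two empirical samples. It is not the paper's generalized algorithm, whose details this page does not give; the function name `sinkhorn_plan`, the squared-distance cost, the regularization `eps`, and the Gaussian toy samples are all assumptions. It only illustrates the alternating marginal rescalings that the summary alludes to.

```python
import numpy as np

def sinkhorn_plan(x, y, eps=1.0, n_iter=1000):
    """Entropic OT plan between uniform empirical measures on 1-D samples."""
    x = np.asarray(x, dtype=float).reshape(-1, 1)
    y = np.asarray(y, dtype=float).reshape(1, -1)
    a = np.full(x.shape[0], 1.0 / x.shape[0])   # uniform weights on x
    b = np.full(y.shape[1], 1.0 / y.shape[1])   # uniform weights on y
    cost = (x - y) ** 2                         # assumed squared-distance cost
    kernel = np.exp(-cost / eps)                # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (kernel @ v)                    # rescale to match row marginal
        v = b / (kernel.T @ u)                  # rescale to match column marginal
    return u[:, None] * kernel * v[None, :]

# Hypothetical toy data: two Gaussian samples of different sizes.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=50)
y = rng.normal(0.5, 1.0, size=60)

plan = sinkhorn_plan(x, y)
row_error = np.abs(plan.sum(axis=1) - 1.0 / 50).max()
col_error = np.abs(plan.sum(axis=0) - 1.0 / 60).max()
print("row marginal error:", row_error, "column marginal error:", col_error)
```

Each iteration touches only the sample-by-sample kernel matrix, so the cost per step depends on the number of samples and not on a grid over the ambient space, which is consistent with the scaling behavior described in the summary.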

If this is right

The algorithm scales computationally with the number of samples instead of the ambient dimension.
It avoids numerical errors from binning that are common in traditional methods.
It establishes sharp existence and uniqueness conditions for transport-map noise models, answering open questions of Li et al. about necessary conditions for uniqueness.
Performance remains accurate in one- and two-dimensional test cases inspired by particle physics jet mass unfolding.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach may generalize to other inverse problems where noise is a pushforward map.
Empirical convergence could be tested on real experimental datasets from high-energy physics.
Extensions to unbalanced optimal transport might handle cases with mass creation or destruction in measurements.

Load-bearing premise

The noise process must be representable as a transport map between probability measures, allowing the loss to be computed from samples without the full kernel.

What would settle it

Finding a transport map noise model where the Wasserstein unfolding objective has at least two distinct minimizers, or observing divergence of the generalized Sinkhorn iterates on a known test problem.

Original abstract

Data unfolding -- the removal of noise or artifacts from measurements -- is a fundamental task across the experimental sciences. Of particular interest are applications in physics, where the dominant approach is Richardson-Lucy (RL) deconvolution. The classical RL approach aims to find denoised data that, once passed through the noise model, is as close as possible to the measured data in terms of Kullback-Leibler (KL) divergence. This requires that the support of the measured data overlaps with the output of the noise model, a hypothesis typically enforced by binning, which introduces numerical error. As a counterpoint, the present work studies an alternative formulation using a Wasserstein loss. We establish sharp conditions for existence and uniqueness of optimizers, answering open questions of Li, et al., regarding necessary conditions for uniqueness in the case of transport map noise models. We then develop a provably convergent generalized Sinkhorn algorithm to compute approximate optimizers. Our algorithm requires only empirical observations of the noise model and measured data and scales with the size of the data, rather than the ambient dimension. Numerical experiments on one- and two-dimensional problems inspired by jet mass unfolding in particle physics demonstrate that the optimal transport approach offers robust, accurate performance compared to classical RL deconvolution, particularly when binning artifacts are significant.
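As background for the baseline named in the abstract, the classical binned Richardson-Lucy iteration can be sketched as below. This is the textbook multiplicative update, not the authors' implementation; the function name `richardson_lucy`, the 3-bin response matrix, and the histograms are hypothetical. The response matrix entry `response[i, j]` is assumed to be the probability that an event in true bin `j` is measured in bin `i`, with columns summing to one.

```python
import numpy as np

def richardson_lucy(response, measured, n_iter=200):
    """Textbook Richardson-Lucy update on binned data (illustrative sketch)."""
    response = np.asarray(response, dtype=float)
    measured = np.asarray(measured, dtype=float)
    n_true = response.shape[1]
    truth = np.full(n_true, measured.sum() / n_true)   # flat starting guess
    for _ in range(n_iter):
        folded = response @ truth                      # pass guess through noise model
        ratio = np.divide(measured, folded,
                          out=np.zeros_like(measured), where=folded > 0)
        truth = truth * (response.T @ ratio)           # multiplicative correction
    return truth

# Hypothetical 3-bin smearing matrix; each column sums to 1.
response = np.array([[0.8, 0.2, 0.0],
                     [0.2, 0.6, 0.2],
                     [0.0, 0.2, 0.8]])
true_hist = np.array([10.0, 30.0, 20.0])
measured = response @ true_hist                        # noiseless folded data

unfolded = richardson_lucy(response, measured)
print(unfolded)
```

The division by `folded` is where the support-overlap requirement enters: a measured bin with counts but zero predicted content has no finite correction, which is why the sketch guards that division.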

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The Wasserstein unfolding paper adds useful uniqueness results and a sample-based algorithm, though the empirical stability needs more work.

The paper replaces the KL loss in Richardson-Lucy unfolding with a Wasserstein distance. This lets it handle cases where supports don't overlap without forcing binning, which is a practical plus for collider data where binning can smear things out.

The main new pieces are sharp existence and uniqueness conditions for the case where the noise is given by a transport map. That directly answers some open questions from Li et al. on what is needed for uniqueness. They also give a generalized Sinkhorn scheme that is proved to converge and only needs samples of the noise model and the measured data. It scales with the number of samples rather than the ambient dimension, which is useful for high-dimensional problems.

The numerical experiments on one- and two-dimensional problems inspired by jet mass unfolding in particle physics show that the optimal transport approach is more robust and accurate than classical RL, especially when binning artifacts are significant. That part looks solid and relevant.

The soft spot is the move from population measures to finite samples. The uniqueness results are for the continuous problem under the transport-map noise model. The algorithm is defined on empirical measures, but there are no explicit approximation bounds or stability results showing that the discrete problem inherits uniqueness or that the solution stays close when the empirical transport map approximates the population one. In regimes with limited samples or when the map is only approximately deterministic, multiple near-optimizers could appear. The assumption that the noise process is exactly representable as a transport map is also worth checking against real detector data.

This work is for researchers in optimal transport who want to apply it to unfolding tasks, and for experimental physicists looking for alternatives to Richardson-Lucy. A reader who cares about avoiding binning errors and has access to samples will find the algorithm and the experiments useful.

I would send it to peer review. The theoretical claims are grounded in standard OT, the algorithm is practical, and the experiments are on relevant problems, even though the empirical stability could use more attention.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an optimal-transport formulation for data unfolding that replaces the KL divergence of classical Richardson-Lucy deconvolution with a Wasserstein loss. Under a transport-map noise model it derives sharp existence and uniqueness conditions for the optimizers (resolving open questions of Li et al.), constructs a generalized Sinkhorn algorithm that is proved convergent on empirical samples, and reports numerical experiments on 1-D and 2-D jet-mass unfolding problems showing improved robustness relative to binned RL.

Significance. If the population-level uniqueness carries over to the empirical setting and the algorithm converges to a meaningful approximation of the continuous optimizer, the work supplies a theoretically grounded, binning-free alternative for unfolding that scales with sample size rather than ambient dimension. The resolution of the uniqueness question and the provision of a provably convergent empirical algorithm would be useful contributions to the optimal-transport treatment of inverse problems in experimental physics.

major comments (2)

[§3] §3 (existence/uniqueness theorems): the sharp conditions are stated for the population measures under the exact transport-map noise model. The generalized Sinkhorn algorithm and all numerical results are defined on finite empirical samples of both the noise model and the observed data; no quantitative stability or approximation bound is supplied showing that uniqueness (or even existence) persists or that the discrete optimizer converges to the population one in Wasserstein distance as the sample size grows. This gap directly affects the claim that the algorithm computes approximate optimizers of the continuous problem.
[§4] §4 (convergence analysis of the generalized Sinkhorn scheme): convergence is asserted for the empirical problem, yet the proof sketch does not quantify the dependence of the iteration count or the attained accuracy on the sample size or on the Wasserstein distance between the empirical and population measures. Without such rates the numerical experiments cannot be used to support the practical superiority claim when sample sizes are modest.
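The sample-size concern raised in these comments can be made concrete with a toy experiment. The sketch below is not from the paper and uses hypothetical Gaussian data: it measures the Wasserstein-1 distance between two independent samples drawn from the same distribution, which would be zero at the population level. The helper name `w1_sorted`, the sample sizes, and the number of trials are assumptions.

```python
import numpy as np

def w1_sorted(x, y):
    """W1 between equal-size 1-D samples; the optimal matching pairs sorted values."""
    return float(np.mean(np.abs(np.sort(x) - np.sort(y))))

rng = np.random.default_rng(0)
sizes = [100, 1_000, 10_000]
n_trials = 20

mean_w1 = {}
for n in sizes:
    # Both samples come from the same standard normal, so any distance is sampling noise.
    values = [w1_sorted(rng.normal(size=n), rng.normal(size=n))
              for _ in range(n_trials)]
    mean_w1[n] = float(np.mean(values))

for n in sizes:
    print(n, round(mean_w1[n], 4))
```

The distance shrinks as the sample size grows but does not vanish at any finite size, which is the kind of empirical-to-population gap that a quantitative stability bound would need to control.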

minor comments (2)

[Abstract / §1] The citation to Li et al. in the abstract and introduction should include the precise reference (arXiv number or journal) so readers can locate the open questions being answered.
[§5] Figure captions in the numerical section should explicitly list the sample sizes used for both the noise model and the data, as well as the binning parameters employed in the RL baseline, to permit direct reproduction.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the careful reading and constructive feedback on our manuscript. The comments correctly identify gaps between the population-level theory and the empirical algorithm. We address each major comment below and indicate planned revisions.

Referee: [§3] §3 (existence/uniqueness theorems): the sharp conditions are stated for the population measures under the exact transport-map noise model. The generalized Sinkhorn algorithm and all numerical results are defined on finite empirical samples of both the noise model and the observed data; no quantitative stability or approximation bound is supplied showing that uniqueness (or even existence) persists or that the discrete optimizer converges to the population one in Wasserstein distance as the sample size grows. This gap directly affects the claim that the algorithm computes approximate optimizers of the continuous problem.

Authors: We agree that the existence and uniqueness theorems are formulated at the population level for the exact transport-map noise model, while the generalized Sinkhorn algorithm and experiments operate on finite empirical samples. The manuscript does not supply quantitative stability bounds or convergence rates in Wasserstein distance between the empirical and population optimizers. This is a genuine limitation of the current analysis. In the revised version we will add a dedicated paragraph in Section 3 acknowledging the gap and listing it as an important direction for future work. We maintain that the empirical formulation is the practically relevant one, as the algorithm is proved to converge to the unique empirical optimizer and the numerical results on jet-mass data illustrate its robustness relative to binned Richardson-Lucy.
Revision: partial
Referee: [§4] §4 (convergence analysis of the generalized Sinkhorn scheme): convergence is asserted for the empirical problem, yet the proof sketch does not quantify the dependence of the iteration count or the attained accuracy on the sample size or on the Wasserstein distance between the empirical and population measures. Without such rates the numerical experiments cannot be used to support the practical superiority claim when sample sizes are modest.

Authors: The convergence proof establishes that the generalized Sinkhorn iterates converge to the unique minimizer of the finite-sample problem, but it does not provide explicit rates that depend on sample size or on the Wasserstein distance to the population measures. We acknowledge that this prevents a fully rigorous transfer of the theoretical guarantees to the continuous problem for modest sample sizes. In the revision we will expand the discussion in Section 4 to include this caveat and will add a short numerical study showing how the attained accuracy behaves with increasing sample size on the 1-D jet-mass example. These additions will clarify the scope of the current claims without altering the core contribution.
Revision: partial

standing simulated objections not resolved

Quantitative stability or approximation bounds relating the empirical optimizer to the population optimizer under the Wasserstein unfolding loss.

Circularity Check

0 steps flagged

No circularity: uniqueness and convergence results rest on standard OT theory and sample-based iteration without reduction to internal fits

The paper derives existence/uniqueness conditions for the Wasserstein-unfolding problem directly from optimal transport theory applied to transport-map noise models, citing Li et al. (external) for the open question it resolves. The generalized Sinkhorn algorithm is proved convergent on the empirical measures using standard entropic OT arguments that do not rely on any parameter fitted inside the paper. No equation equates a reported performance metric or uniqueness statement to a quantity defined by the same paper's own optimization; the derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach relies on standard results from optimal transport theory; no free parameters are introduced to fit the central claims, and no new entities are postulated.

axioms (1)

domain assumption: The noise model admits a representation as a transport map between probability measures.
Invoked to obtain the sharp uniqueness conditions for optimizers.

pith-pipeline@v0.9.0 · 5527 in / 1303 out tokens · 40114 ms · 2026-05-15T06:37:12.614839+00:00 · methodology
