arxiv: 2603.15867 · v2 · submitted 2026-03-16 · 💻 cs.LG

Evaluating Black-Box Vulnerabilities with Wasserstein-Constrained Data Perturbations

Adriana Laurindo Monteiro , Jean-Michel Loubes

Pith reviewed 2026-05-15 09:52 UTC · model grok-4.3

keywords black-box explainability · optimal transport · Wasserstein distance · data perturbations · distributionally robust optimization · model robustness · vulnerability diagnosis

The pith

A framework uses Wasserstein-constrained perturbations to diagnose vulnerabilities in black-box machine learning models while preserving feature statistics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a global explainability method that generates data perturbations under optimal transport constraints to test how models respond to shifts in key statistics such as brightness or age distribution. By combining distributionally robust optimization with Wasserstein distance bounds, the approach creates realistic changes that keep semantic structure intact across tabular and image data. This yields a model-agnostic diagnostic tool with theoretical backing that goes beyond accuracy metrics to reveal robustness issues. If the method holds, practitioners gain interpretable ways to audit models against controlled real-world variations without manual data crafting. The work validates the idea on actual datasets to show its practical diagnostic value.

Core claim

By solving a distributionally robust optimization problem with Wasserstein constraints on selected feature statistics, the method produces perturbations that remain distributionally close yet expose model weaknesses, delivering a unified diagnostic bench for tabular and image domains that includes provable guarantees on the perturbation process.
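For orientation, the core claim can be written in generic Wasserstein-DRO notation. This is a textbook-style sketch, not the paper's own formulation: the symbols $P_n$ (empirical data distribution), $T$ (a map extracting the chosen feature statistic), $T_{\#}Q$ (the distribution of $T(X)$ under $Q$), $\ell$ (a loss), $f$ (the black-box model), and $\varepsilon$ (the perturbation budget) are all assumptions introduced here.

```latex
% Wasserstein-p distance between distributions P and Q,
% where \Pi(P, Q) is the set of couplings with marginals P and Q.
W_p(P, Q) = \left( \inf_{\pi \in \Pi(P, Q)} \int \lVert x - y \rVert^p \, d\pi(x, y) \right)^{1/p}

% Worst-case expected loss over distributions whose selected statistic
% stays within budget \varepsilon of the observed data.
\sup_{Q \,:\, W_p\left(T_{\#} Q,\; T_{\#} P_n\right) \le \varepsilon}
  \; \mathbb{E}_{(X, Y) \sim Q} \big[ \ell(f(X), Y) \big]
```

In this reading the constraint acts only on the pushed-forward statistic $T_{\#}Q$, which is why the question of what controls the rest of the distribution matters for the semantic-preservation claim.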

What carries the argument

The Wasserstein-constrained perturbation generator, which applies optimal transport to enforce limits on feature-level statistics and produces semantically preserved data shifts for vulnerability testing.
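To make the budget check concrete, here is a minimal sketch of measuring how far a perturbed feature column has moved. It assumes a single scalar feature, equal sample sizes, and uniform weights, in which case the 1-Wasserstein distance reduces to matching sorted values. The function names and the toy "age" column are hypothetical and are not taken from the paper.

```python
def wasserstein_1d(xs, ys):
    """Empirical 1-Wasserstein distance between two equal-size 1-D samples.

    With uniform weights and equal sample sizes, the optimal transport plan
    in one dimension pairs sorted values, so W1 is the mean absolute
    difference between order statistics.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    if not xs:
        raise ValueError("samples must be non-empty")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def within_budget(original, perturbed, epsilon):
    """Check whether a perturbed feature column stays inside the W1 ball."""
    return wasserstein_1d(original, perturbed) <= epsilon


# Hypothetical "age" column and a perturbation that adds 2 years to everyone.
ages = [23, 35, 41, 52, 67]
shifted = [a + 2 for a in ages]

print(wasserstein_1d(ages, shifted))      # 2.0
print(within_budget(ages, shifted, 2.5))  # True
print(within_budget(ages, shifted, 1.0))  # False
```

The sorted-matching shortcut only holds in one dimension; multivariate statistics would need a general optimal transport solver.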

If this is right

Models can be probed for sensitivity to specific shifts like lighting changes in images or demographic adjustments in tabular records.
The diagnostics complement accuracy, fairness, and standard robustness checks with interpretable outputs.
Theoretical guarantees bound the distance of perturbations, ensuring controlled evaluation across data types.
The same machinery applies uniformly to both tabular datasets and image inputs.
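The kind of probe described in the list above can be sketched on a toy tabular case. Everything here is an illustrative assumption: the threshold classifier stands in for an arbitrary black box, the data are made up, the perturbation is a plain uniform shift rather than the paper's optimized perturbation, and labels are held fixed under the shift.

```python
def black_box(age):
    """Stand-in for an opaque classifier; only its predictions are used."""
    return 1 if age >= 40 else 0


def accuracy(model, ages, labels):
    """Fraction of records on which the model's prediction matches the label."""
    hits = sum(1 for a, y in zip(ages, labels) if model(a) == y)
    return hits / len(ages)


def shift_sweep(model, ages, labels, deltas):
    """Accuracy after adding each delta to every age.

    A uniform shift by delta moves the empirical age distribution by exactly
    |delta| in 1-Wasserstein distance, so delta doubles as the budget used.
    """
    return {d: accuracy(model, [a + d for a in ages], labels) for d in deltas}


# Hypothetical records: ages and ground-truth labels (assumed unchanged by the shift).
ages = [22, 31, 38, 39, 44, 58]
labels = [0, 0, 0, 0, 1, 1]

results = shift_sweep(black_box, ages, labels, [0, 1, 2, 5, 10])
for delta, acc in results.items():
    print(f"shift={delta:>2}  W1={abs(delta):>2}  accuracy={acc:.2f}")
```

Reading accuracy against the budget shows where the model starts to degrade; in this toy case the drop begins at a shift of one year because several records sit just below the decision threshold.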

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could integrate with automated retraining loops to target fixes for detected weak points.
Testing with alternative transport costs might uncover different classes of real-world drift vulnerabilities.
Applying the framework to sequential data or reinforcement learning policies would extend its reach to dynamic settings.

Load-bearing premise

Limiting perturbations to bounds on a chosen set of feature statistics via Wasserstein distance suffices to produce changes that stay realistic and representative for diagnosing model vulnerabilities.

What would settle it

A set of generated perturbed samples judged by domain experts as semantically altered or unrealistic, or failure of the method to flag known failure modes in standard benchmark models under the same constraints.

Original abstract

The growing use of Machine Learning (ML) tools comes with critical challenges, such as limited model explainability. We propose a global explainability framework that leverages Optimal Transport and Distributionally Robust Optimization to analyze how ML algorithms respond to constrained data perturbations. Our approach enforces constraints on feature-level statistics (e.g., brightness, age distribution), generating realistic perturbations that preserve semantic structure. We provide a model-agnostic diagnostic bench that applies to both tabular and image domains with solid theoretical guarantees. We validate the approach on real-world datasets providing interpretable robustness diagnostics that complement standard evaluation and fairness auditing tools.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Wasserstein-constrained perturbations give a model-agnostic robustness diagnostic, but the claim that marginal feature controls preserve semantic structure needs checking, especially for images.

The main contribution is a framework that perturbs data under Wasserstein constraints on selected statistics such as brightness or age, then uses DRO to measure model performance under those shifts. This gives a model-agnostic way to diagnose vulnerabilities and adds to explainability. What stands out is the attempt to make perturbations realistic by preserving some structure through optimal transport. The paper claims the method works on both tabular data and images, which is a plus for broader applicability. The validation on real-world datasets suggests there is some empirical backing for the diagnostics. The approach builds on established ideas in OT and DRO, but the integration for global vulnerability assessment looks like a fresh angle. It could be useful for auditing pipelines where you want to see sensitivity to specific feature changes without retraining.

On the downside, the core claim that these perturbations stay semantically meaningful rests on controlling only marginal feature statistics. For images, a Wasserstein constraint on low-order moments often allows samples that look wrong or change identity, even if the statistics match. The stress test highlights this, and the abstract does not spell out extra mechanisms to fix it. Theoretical guarantees are mentioned for the DRO part, but whether they extend to semantic fidelity is unclear without the full math. The experiments are described as providing interpretable results, but without details on how the constraints were chosen or how realism was measured, it is hard to judge whether the method holds up.

This paper is aimed at people working on ML evaluation, robustness testing, and fairness auditing. A reader interested in new diagnostic tools for black-box models would get something from it, especially one who already knows OT and DRO. It deserves a serious referee because the idea has potential and the claims are specific enough to check. The math and experiments need a close look, but the paper is not obviously flawed on the surface.

Referee Report

2 major / 2 minor

Summary. The paper claims to introduce a global explainability framework for black-box ML models that leverages Optimal Transport and Distributionally Robust Optimization to generate data perturbations constrained via the Wasserstein distance on selected feature-level statistics (e.g., brightness for images or age distributions for tabular data). It asserts that these perturbations are realistic and preserve semantic structure, yielding a model-agnostic diagnostic benchmark applicable to tabular and image domains, supported by theoretical guarantees, and validated on real-world datasets to produce interpretable robustness diagnostics that complement standard evaluation and fairness auditing.

Significance. If the claims on semantic preservation and theoretical grounding hold, the work would provide a useful model-agnostic tool for assessing ML vulnerabilities under constrained distributional shifts, extending beyond standard adversarial or random perturbations. The integration of OT and DRO for global analysis could strengthen robustness and fairness auditing practices across domains.

major comments (2)

[Abstract] The central claim that Wasserstein constraints on marginal feature statistics (brightness, age, etc.) suffice to generate perturbations that 'preserve semantic structure' is load-bearing for the diagnostic bench's validity in image domains. The skeptic note correctly identifies that the Wasserstein ball on low-order moments is known to admit semantically invalid points that alter higher-order structure or object identity; the manuscript does not provide a concrete argument or bound showing how the selected marginal constraints close this gap, undermining the model-agnostic robustness diagnostics.
[Theoretical guarantees section] The assertion of 'solid theoretical guarantees' for distributional robustness does not automatically address semantic fidelity. A specific derivation or proposition is needed showing that the Wasserstein ball radius and chosen statistics prevent drift in higher-order moments or semantic content; without it, the guarantees remain limited to marginal control and do not fully support the vulnerability-diagnosis use case.
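The concern in these two comments can be illustrated with a deliberately extreme toy case. The sketch below is not the paper's experiment: it assumes a hypothetical 8x8 grayscale image, uses the per-pixel brightness histogram as the controlled statistic, and uses a simple neighbor-difference score (`neighbor_roughness`, a name made up here) as a crude proxy for spatial structure. Shuffling pixels leaves the brightness distribution exactly unchanged while destroying the layout.

```python
import random


def mean_brightness(pixels):
    """Average pixel intensity of a flattened grayscale image."""
    return sum(pixels) / len(pixels)


def wasserstein_1d(xs, ys):
    """Equal-size empirical W1 in one dimension: match sorted values."""
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def neighbor_roughness(pixels, width):
    """Mean absolute difference between horizontally adjacent pixels."""
    diffs = [
        abs(pixels[i] - pixels[i + 1])
        for i in range(len(pixels) - 1)
        if (i + 1) % width != 0  # skip pairs that straddle two rows
    ]
    return sum(diffs) / len(diffs)


# Hypothetical 8x8 grayscale image: a smooth left-to-right gradient.
width = 8
image = [32 * col for _row in range(width) for col in range(width)]

# Permute pixel positions; the multiset of brightness values is untouched.
rng = random.Random(0)
scrambled = image[:]
rng.shuffle(scrambled)

print(mean_brightness(image) == mean_brightness(scrambled))  # True
print(wasserstein_1d(image, scrambled))                       # 0.0
print(neighbor_roughness(image, width))                       # 32.0
print(neighbor_roughness(scrambled, width))  # structure proxy after shuffling
```

A zero Wasserstein distance on the brightness statistic therefore says nothing about spatial arrangement, which is the gap a proposition on semantic fidelity would need to close.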

minor comments (2)

[Abstract] The abstract would benefit from naming the specific real-world datasets used for validation to allow immediate assessment of domain coverage.
Notation for the Wasserstein distance, constraint sets, and DRO formulation should be introduced with explicit definitions and a running example early in the paper for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the scope and limitations of our claims regarding semantic preservation and theoretical guarantees. We address each major point below and commit to revisions that strengthen the manuscript without overstating our current results.

Referee: [Abstract] The central claim that Wasserstein constraints on marginal feature statistics (brightness, age, etc.) suffice to generate perturbations that 'preserve semantic structure' is load-bearing for the diagnostic bench's validity in image domains. The skeptic note correctly identifies that the Wasserstein ball on low-order moments is known to admit semantically invalid points that alter higher-order structure or object identity; the manuscript does not provide a concrete argument or bound showing how the selected marginal constraints close this gap, undermining the model-agnostic robustness diagnostics.

Authors: We agree that the Wasserstein ball on marginal statistics alone does not automatically preclude semantically invalid points, as higher-order structure can drift. Our defense rests on the domain-specific selection of statistics (e.g., brightness histograms for images) that are chosen precisely because they correlate strongly with semantic content in the target domains, combined with the optimal transport plan that minimizes displacement cost. However, we acknowledge the manuscript lacks an explicit bound or proposition linking these choices to semantic fidelity. In the revision we will add a new paragraph in the abstract and a supporting discussion in Section 3 that cites relevant OT literature on moment control and includes a simple sufficient condition under which the chosen marginals limit identity-altering drift, together with additional perceptual similarity metrics in the experiments. Revision: yes.
Referee: [Theoretical guarantees section] The assertion of 'solid theoretical guarantees' for distributional robustness does not automatically address semantic fidelity. A specific derivation or proposition is needed showing that the Wasserstein ball radius and chosen statistics prevent drift in higher-order moments or semantic content; without it, the guarantees remain limited to marginal control and do not fully support the vulnerability-diagnosis use case.

Authors: We concur that the existing theoretical results focus on DRO robustness under Wasserstein marginal constraints and do not yet derive explicit bounds on higher-order moment drift or semantic content. The current guarantees establish that the perturbed distribution remains within the Wasserstein ball of the original, thereby preserving the selected statistics, but they stop short of semantic-level control. In the revised manuscript we will insert a new proposition in the theoretical guarantees section that provides a sufficient condition on the ball radius and statistic choice to bound the expected change in selected higher-order moments (e.g., via Lipschitz continuity arguments on the transport map). This will be accompanied by a short proof sketch and a note on its empirical validation in the image and tabular experiments. Revision: yes.
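One standard way a Lipschitz argument of this kind can bound moment drift is Kantorovich-Rubinstein duality. The derivation below is a generic sketch under stated assumptions, not the proposition the authors promise: $g$ is any $L$-Lipschitz statistic, $P$ and $Q$ are the original and perturbed distributions, $\varepsilon$ is the ball radius, and the bounded-domain assumption $x \in [0, M]$ is introduced here for the moment example.

```latex
% Kantorovich-Rubinstein: a Lipschitz statistic cannot move faster than W_1.
\left| \mathbb{E}_{Q}[g(X)] - \mathbb{E}_{P}[g(X)] \right|
  \;\le\; L \, W_1(P, Q)
  \;\le\; L \, W_p(P, Q)
  \;\le\; L \, \varepsilon
  \qquad (p \ge 1)

% Example: on a bounded domain x \in [0, M], the k-th power g(x) = x^k
% has derivative k x^{k-1} \le k M^{k-1}, so it is Lipschitz with L = k M^{k-1}:
\left| \mathbb{E}_{Q}[X^k] - \mathbb{E}_{P}[X^k] \right| \;\le\; k \, M^{k-1} \, \varepsilon
```

The bound applies only to statistics of the variable on which the Wasserstein ball is defined; it does not by itself control features that the constraint never sees.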

Circularity Check

0 steps flagged

No circularity detected; framework rests on independent OT and DRO foundations

The provided abstract and context describe a model-agnostic framework that applies established Optimal Transport and Distributionally Robust Optimization tools to enforce Wasserstein constraints on selected feature statistics for generating perturbations. No derivation chain is shown that reduces any central prediction or guarantee to a fitted parameter or self-citation by construction; the approach is presented as building on prior independent literature with theoretical guarantees that do not appear tautological within the paper's own equations. The reader's assessment of score 2.0 is consistent with this, as any self-citation would be non-load-bearing and the core claims retain external grounding.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard properties of the Wasserstein metric and DRO without introducing new free parameters, axioms beyond established math, or invented entities in the abstract description.

axioms (2)

[standard math] Wasserstein distance defines a valid metric for measuring and constraining data perturbations while preserving selected statistics.
Invoked to generate realistic perturbations that respect feature-level constraints such as brightness or age distribution.
[standard math] Distributionally robust optimization can be used to analyze worst-case model responses within the constrained perturbation set.
Leveraged to provide the diagnostic analysis of ML algorithm behavior.

pith-pipeline@v0.9.0 · 5393 in / 1421 out tokens · 53253 ms · 2026-05-15T09:52:37.336831+00:00 · methodology