Pith · machine review for the scientific record

arXiv: 2604.18540 · v1 · submitted 2026-04-20 · 🧮 math.AP · cs.LG · math.FA · math.OC

Recognition: unknown

Duality for the Adversarial Total Variation

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 03:39 UTC · model grok-4.3

classification 🧮 math.AP · cs.LG · math.FA · math.OC
keywords adversarial training · nonlocal total variation · duality · subdifferential · nonlocal gradient · integration by parts · metric spaces · regularized risk minimization

The pith

Duality techniques yield a dual representation of the nonlocal total variation from adversarial training together with an integration-by-parts formula.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper starts from the observation that adversarial training of binary classifiers is equivalent to a regularized risk minimization problem whose regularizer is a nonlocal total variation. It then derives a dual representation of this total variation and an accompanying integration-by-parts identity that involves a nonlocal gradient and a nonlocal divergence. These identities are proved both for continuous functions vanishing at infinity on proper metric spaces and for essentially bounded functions on Euclidean domains. Under additional conditions the dual representation is used to characterize the subdifferential of the nonlocal total variation in each setting. A reader would care because an explicit dual form and subdifferential description give concrete tools for analyzing the optimization landscape of adversarial training.
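The reformulation the review starts from can be sketched in symbols (notation is ours, drawn from the adversarial-perimeter literature the paper builds on, not necessarily the paper's own):

```latex
% Hedged sketch with our notation, not the paper's.
% Adversarial risk of a decision region A with perturbation budget eps:
R_\varepsilon(A) \;=\; \mathbb{E}_{(x,y)\sim\mu}\Big[\sup_{\|x'-x\|\le\varepsilon}
  \mathbf{1}\{\mathbf{1}_A(x') \ne y\}\Big].
% Reformulation as regularized risk minimization: the clean risk plus eps
% times a nonlocal total variation of the indicator function:
R_\varepsilon(A) \;=\; R_0(A) \;+\; \varepsilon\,\mathrm{TV}_\varepsilon(\mathbf{1}_A).
```

Here $R_0$ is the clean (unperturbed) risk and $\mathrm{TV}_\varepsilon$ is the nonlocal total variation acting on the indicator of the decision region; the duality results target this regularizer.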

Core claim

We establish a characterization of the subdifferential of this total variation using duality techniques. To achieve this, we derive a dual representation of the nonlocal total variation and a related integration-by-parts formula, involving a nonlocal gradient and divergence. We provide such duality statements both in the space of continuous functions vanishing at infinity on proper metric spaces and for the space of essentially bounded functions on Euclidean domains. Furthermore, under some additional conditions we provide characterizations of the subdifferential in these settings.
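For orientation, the classical local counterpart of such a duality is standard, and the paper's contribution is a nonlocal analogue. A hedged schematic of both (our gloss, with $G$ a nonlocal gradient and $D$ a nonlocal divergence; the paper's precise admissibility conditions differ):

```latex
% Classical dual representation of the local total variation:
\mathrm{TV}(u) \;=\; \sup\Big\{ \int_\Omega u \,\operatorname{div}\varphi \,dx
  \;:\; \varphi \in C_c^1(\Omega;\mathbb{R}^d),\ \|\varphi\|_\infty \le 1 \Big\}.
% Schematic nonlocal analogue: replace div by a nonlocal divergence D,
% paired with a nonlocal gradient G via integration by parts:
\mathrm{TV}_\varepsilon(u) \;=\; \sup_{\varphi\ \text{admissible}} \langle u, D\varphi\rangle,
\qquad
\langle G u, \varphi\rangle \;=\; -\,\langle u, D\varphi\rangle .
```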

What carries the argument

The dual representation of the nonlocal total variation, obtained via a pairing between a nonlocal gradient and a nonlocal divergence, that enables the integration-by-parts formula and subsequent subdifferential characterization.
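A finite-dimensional analogue of this pairing can be checked in a few lines (a toy discretization of ours, not the paper's operators): take a weighted nonlocal gradient $(Gu)_{ij} = w_{ij}(u_j - u_i)$ and define the nonlocal divergence as its negative adjoint; the discrete integration-by-parts identity then holds exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
w = rng.random((n, n))             # nonlocal interaction weights w_ij >= 0
u = rng.standard_normal(n)         # "function" values on n points
phi = rng.standard_normal((n, n))  # dual/test field phi_ij

# Nonlocal gradient: (Gu)_ij = w_ij * (u_j - u_i)
Gu = w * (u[None, :] - u[:, None])

# Nonlocal divergence chosen as the negative adjoint of G:
# (D phi)_i = sum_j (w_ij * phi_ij - w_ji * phi_ji)
Dphi = (w * phi).sum(axis=1) - (w * phi).sum(axis=0)

# Discrete integration by parts: <Gu, phi> = -<u, D phi>
lhs = np.sum(Gu * phi)
rhs = -np.dot(u, Dphi)
assert np.isclose(lhs, rhs)
```

The identity holds for any weights and any `u`, `phi`, which is exactly the adjointness structure the continuous integration-by-parts formula encodes.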

If this is right

  • The nonlocal total variation admits an explicit dual form in both the continuous vanishing-at-infinity setting on metric spaces and the L^∞ setting on Euclidean domains.
  • An integration-by-parts formula holds that links the nonlocal gradient and divergence operators.
  • Under further conditions the subdifferential of the nonlocal total variation can be described explicitly in each of the two function spaces.
  • These duality results apply directly to the regularizer appearing in the reformulated adversarial-training problem.
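The subdifferential bullet can be made concrete via a standard fact about convex, positively one-homogeneous functionals (generic convex analysis, not this paper's specific statement):

```latex
% For J convex and one-homogeneous (such as the nonlocal TV):
\eta \in \partial J(u)
\;\iff\;
\langle \eta, u\rangle = J(u)
\ \text{ and }\
\langle \eta, v\rangle \le J(v)\quad \forall v.
% Combined with a dual representation J(u) = sup_phi <u, D phi>, a natural
% candidate is eta = D(phi^*) for a dual field phi^* attaining the sup at u
% (our gloss of the strategy; the paper's conditions make this precise).
```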

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The dual representation may simplify the design of proximal operators or dual-based algorithms for minimizing the adversarial risk functional.
  • The same nonlocal gradient and divergence pairing could be useful for studying other nonlocal variational problems that arise in imaging or graph-based learning.
  • If the additional conditions can be relaxed, the subdifferential description might extend to more general data distributions or to multi-class classifiers.
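The first bullet can be illustrated with a toy dual-based prox solver on a finite point cloud (entirely our sketch: the discretization, weights, and step size are assumptions, not the paper's algorithm). With $J(u) = \sum_{ij} w_{ij}|u_j - u_i|$, the prox of $\tau J$ can be computed Chambolle-style by projected gradient descent on a dual variable:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
w = (rng.random((n, n)) < 0.5) * rng.random((n, n))  # toy nonlocal weights w_ij >= 0
f = rng.standard_normal(n)                           # signal to denoise
tau = 0.3                                            # regularization strength

def K(u):
    # nonlocal gradient: (Ku)_ij = w_ij * (u_j - u_i)
    return w * (u[None, :] - u[:, None])

def KT(phi):
    # adjoint of K (minus the nonlocal divergence): <Ku, phi> = <u, KT(phi)>
    return (w * phi).sum(axis=0) - (w * phi).sum(axis=1)

# prox_{tau*J}(f) with J(u) = ||K u||_1, via projected gradient on the dual:
#   min_{|phi_ij| <= tau} 0.5 * ||KT(phi) - f||^2,   then u = f - KT(phi)
phi = np.zeros((n, n))
sigma = 1.0 / (4.0 * (w ** 2).sum())  # conservative step, since ||K||^2 <= 4*||w||_F^2
for _ in range(2000):
    phi = np.clip(phi - sigma * K(KT(phi) - f), -tau, tau)
u = f - KT(phi)

# the prox should not increase the primal objective relative to doing nothing
F = lambda v: 0.5 * np.sum((v - f) ** 2) + tau * np.abs(K(v)).sum()
assert F(u) <= F(f) + 1e-8
```

The only nontrivial ingredients are the operator `K` and its adjoint `KT`, i.e., exactly the nonlocal gradient/divergence pairing from the integration-by-parts formula; everything else is standard convex optimization.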

Load-bearing premise

The subdifferential characterizations are stated only under additional conditions whose precise form and necessity are not fully detailed.

What would settle it

A concrete function in one of the function spaces for which the proposed dual representation fails to recover the value of the nonlocal total variation or for which the predicted subdifferential element does not satisfy the defining inequality.

Figures

Figures reproduced from arXiv: 2604.18540 by Leon Bungert, Lucas Schmitt.

Figure 1: Schematic visualization of the construction by squeezing and mollifying.
read the original abstract

Adversarial training of binary classifiers can be reformulated as regularized risk minimization involving a nonlocal total variation. Building on this perspective, we establish a characterization of the subdifferential of this total variation using duality techniques. To achieve this, we derive a dual representation of the nonlocal total variation and a related integration of parts formula, involving a nonlocal gradient and divergence. We provide such duality statements both in the space of continuous functions vanishing at infinity on proper metric spaces and for the space of essentially bounded functions on Euclidean domains. Furthermore, under some additional conditions we provide characterizations of the subdifferential in these settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript reformulates adversarial training of binary classifiers as regularized risk minimization with a nonlocal total variation functional. It derives a dual representation of this nonlocal total variation together with a nonlocal integration-by-parts formula involving a nonlocal gradient and divergence. These duality statements are established in two settings: continuous functions vanishing at infinity on proper metric spaces, and essentially bounded functions on Euclidean domains. Under additional conditions, the authors provide characterizations of the subdifferential of the nonlocal total variation in both settings.

Significance. If the derivations hold, the work supplies a convex-analytic toolkit for a functional that appears in adversarial robustness analysis. The dual representation and integration-by-parts formula are obtained via standard techniques applied to a newly introduced nonlocal operator, which is a methodological strength. The extension to proper metric spaces broadens the setting beyond Euclidean domains. The explicit flagging of extra hypotheses for the subdifferential results is transparent, though it indicates that the strongest claims require those hypotheses.

major comments (2)
  1. Abstract and §1: The subdifferential characterizations are asserted only 'under some additional conditions' whose precise hypotheses, necessity, and restrictiveness are not stated in the abstract or introduction. Because these conditions are load-bearing for the applicability of the main results, they must be formulated explicitly at the outset so that readers can immediately assess the scope.
  2. The nonlocal integration-by-parts formula (used to obtain the dual representation) is central to all subsequent claims. The precise assumptions on the test functions and on the integrability of the nonlocal gradient/divergence pair should be isolated in a dedicated lemma or proposition so that the formula can be verified independently of the subdifferential application.
minor comments (3)
  1. The notation for the nonlocal gradient and divergence operators should be introduced with a self-contained definition (including domain and codomain) before the integration-by-parts formula is stated.
  2. In the Euclidean L^∞ setting, it would be helpful to include a short remark comparing the resulting dual representation with the metric-space version, highlighting any simplifications or additional technicalities that arise.
  3. A brief discussion of whether the additional conditions for the subdifferential can be relaxed or are sharp would strengthen the presentation, even if only as a remark.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and the positive evaluation of the manuscript. The suggestions for improving clarity are well taken, and we will revise the paper accordingly.

read point-by-point responses
  1. Referee: Abstract and §1: The subdifferential characterizations are asserted only 'under some additional conditions' whose precise hypotheses, necessity, and restrictiveness are not stated in the abstract or introduction. Because these conditions are load-bearing for the applicability of the main results, they must be formulated explicitly at the outset so that readers can immediately assess the scope.

    Authors: We agree that the precise hypotheses under which the subdifferential characterizations hold should be stated explicitly already in the abstract and introduction. In the revised manuscript we will formulate these conditions at the outset, indicating their necessity and the settings in which they apply, so that readers can immediately gauge the scope of the results. revision: yes

  2. Referee: The nonlocal integration-by-parts formula (used to obtain the dual representation) is central to all subsequent claims. The precise assumptions on the test functions and on the integrability of the nonlocal gradient/divergence pair should be isolated in a dedicated lemma or proposition so that the formula can be verified independently of the subdifferential application.

    Authors: We concur that the nonlocal integration-by-parts formula is foundational and that its hypotheses deserve to be stated independently. In the revision we will extract the precise assumptions on the test functions and the required integrability of the nonlocal gradient/divergence pair into a dedicated lemma, allowing the formula to be verified on its own before its use in the duality and subdifferential arguments. revision: yes

Circularity Check

0 steps flagged

No significant circularity; standard duality derivations are self-contained

full rationale

The paper applies established duality techniques from convex analysis to derive a dual representation of the nonlocal total variation and a nonlocal integration-by-parts formula, then uses these to characterize the subdifferential under explicitly stated additional conditions. These steps are presented as direct mathematical consequences in two function-space settings (C_0 on proper metric spaces and L^∞ on Euclidean domains) without any reduction to fitted parameters, self-definitional loops, or load-bearing self-citations. The abstract and summary flag the extra hypotheses openly rather than smuggling them in, confirming the derivation chain remains independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper rests on standard functional-analysis background and introduces the dual representation as its main new object; no free parameters or invented entities are indicated.

axioms (2)
  • standard math Standard properties of proper metric spaces and the space of continuous functions vanishing at infinity.
    Used to set the functional setting for the first duality result.
  • domain assumption Existence and basic properties of nonlocal gradient and divergence operators.
    Invoked to state the integration-by-parts formula.

pith-pipeline@v0.9.0 · 5391 in / 1166 out tokens · 39433 ms · 2026-05-10T03:39:17.573628+00:00 · methodology


Reference graph

Works this paper leans on

37 extracted references · 3 canonical work pages · 2 internal anchors
