pith. sign in

arxiv: 2510.10988 · v2 · pith:ZGMGXXKQnew · submitted 2025-10-13 · 📊 stat.ML · cs.LG

Adversarial Robustness in One-Stage Learning-to-Defer

Pith reviewed 2026-05-21 20:47 UTC · model grok-4.3

classification 📊 stat.ML cs.LG
keywords adversarial robustnesslearning to deferone-stage trainingsurrogate lossesconsistency guaranteesclassificationregressionhybrid decision making
0
0 comments X

The pith

A new framework secures one-stage learning-to-defer against adversarial attacks on both predictions and deferral decisions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces the first framework for adversarial robustness in one-stage learning-to-defer, where a predictor and deferral mechanism train jointly rather than in separate stages. It formalizes attacks that can flip both outputs, introduces cost-sensitive adversarial surrogate losses for training, and proves consistency guarantees of H, (R, F), and Bayes type for classification and regression. Experiments on standard benchmarks show the methods raise robustness to untargeted and targeted attacks while keeping accuracy on clean inputs comparable to non-robust baselines. A sympathetic reader would care because prior robustness work left the joint-training case open, so attacks could silently change which expert or model handles an input.

Core claim

We introduce the first framework for adversarial robustness in one-stage L2D, covering both classification and regression. Our approach formalizes attacks, proposes cost-sensitive adversarial surrogate losses, and establishes theoretical guarantees including H, (R, F), and Bayes consistency. Experiments on benchmark datasets confirm that our methods improve robustness against untargeted and targeted attacks while preserving clean performance.

What carries the argument

Cost-sensitive adversarial surrogate losses that jointly optimize the predictor and deferral rule under formal attack models.

If this is right

  • Robustness to untargeted and targeted attacks improves in one-stage L2D without degrading clean accuracy.
  • The same loss construction applies to both classification and regression deferral problems.
  • Theoretical guarantees cover H-consistency, (R,F)-consistency, and Bayes consistency.
  • The framework closes the gap left by prior two-stage robustness analyses.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same cost-sensitive construction might extend to settings with multiple experts or sequential deferral decisions.
  • Real-world hybrid systems that route safety-critical inputs could adopt the joint-training recipe to limit attack surface.
  • Future work could test whether the surrogate losses remain effective when the attack budget varies across different input regions.

Load-bearing premise

The cost-sensitive adversarial surrogate losses can be jointly optimized in the one-stage setting to achieve the stated consistency guarantees.

What would settle it

An explicit counter-example input distribution where the proposed surrogate losses produce a deferral rule that is neither H-consistent nor (R,F)-consistent under the formalized attack model.

read the original abstract

Learning-to-Defer (L2D) enables hybrid decision-making by routing inputs either to a predictor or to external experts. While promising, L2D is highly vulnerable to adversarial perturbations, which can not only flip predictions but also manipulate deferral decisions. Prior robustness analyses focus solely on two-stage settings, leaving open the end-to-end (one-stage) case where predictor and allocation are trained jointly. We introduce the first framework for adversarial robustness in one-stage L2D, covering both classification and regression. Our approach formalizes attacks, proposes cost-sensitive adversarial surrogate losses, and establishes theoretical guarantees including $\mathcal{H}$, $(\mathcal{R }, \mathcal{F})$, and Bayes consistency. Experiments on benchmark datasets confirm that our methods improve robustness against untargeted and targeted attacks while preserving clean performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces the first framework for adversarial robustness in one-stage Learning-to-Defer (L2D), covering both classification and regression. It formalizes attacks on the joint predictor-deferral decisions, proposes cost-sensitive adversarial surrogate losses, and claims theoretical guarantees of H-consistency, (R, F)-consistency, and Bayes consistency. Experiments on benchmark datasets are reported to show improved robustness to untargeted and targeted attacks while preserving clean performance.

Significance. If the consistency guarantees are shown to hold under joint one-stage optimization, the work would establish a foundational approach for robust end-to-end L2D systems, extending prior two-stage analyses and providing practical surrogate losses for hybrid decision-making under adversarial conditions.

major comments (2)
  1. [§4.2, Theorem 3] §4.2, Theorem 3 (H-consistency): the proof appears to extend the two-stage surrogate-loss calibration directly to the joint setting, but the one-stage formulation couples the predictor and deferral parameters through a shared network and single loss; without an explicit re-derivation showing that the adversarial perturbation set and cost matrix preserve the required fixed-point property under joint gradients, the guarantee does not automatically transfer.
  2. [§4.3] §4.3, (R, F)-consistency claim: the argument relies on the cost-sensitive adversarial loss maintaining Bayes consistency when optimized jointly, yet the manuscript provides no separate analysis of how the perturbation ball interacts with the coupled objective; this is load-bearing for the overall theoretical contribution.
minor comments (2)
  1. [§5.1] §5.1: the description of the attack generation procedure (PGD steps, epsilon values) could be expanded with explicit pseudocode or parameter tables for reproducibility.
  2. [Table 2] Table 2: the clean vs. adversarial accuracy columns would benefit from standard-error bars or multiple random seeds to support the reported improvements.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on the theoretical contributions of our manuscript. We address each major comment point by point below, providing clarifications and committing to revisions that strengthen the presentation of the consistency results without altering the core claims.

read point-by-point responses
  1. Referee: [§4.2, Theorem 3] §4.2, Theorem 3 (H-consistency): the proof appears to extend the two-stage surrogate-loss calibration directly to the joint setting, but the one-stage formulation couples the predictor and deferral parameters through a shared network and single loss; without an explicit re-derivation showing that the adversarial perturbation set and cost matrix preserve the required fixed-point property under joint gradients, the guarantee does not automatically transfer.

    Authors: We appreciate the referee's careful scrutiny of the proof strategy. Theorem 3 establishes H-consistency with respect to the joint hypothesis class that encompasses both the predictor and deferral functions under simultaneous optimization. The adversarial perturbation set is defined over the combined output space of predictions and deferral decisions, and the cost matrix enters the surrogate loss in a manner that preserves the calibration property for the joint objective. Nevertheless, we agree that an explicit re-derivation would improve clarity and rigor. In the revised manuscript we will insert a dedicated supporting lemma immediately preceding Theorem 3 that re-derives the fixed-point property under joint gradient flow, explicitly accounting for the shared network parameters and the interaction between the perturbation ball and the cost-sensitive loss. revision: yes

  2. Referee: [§4.3] §4.3, (R, F)-consistency claim: the argument relies on the cost-sensitive adversarial loss maintaining Bayes consistency when optimized jointly, yet the manuscript provides no separate analysis of how the perturbation ball interacts with the coupled objective; this is load-bearing for the overall theoretical contribution.

    Authors: We thank the referee for identifying this point. The (R, F)-consistency argument proceeds by showing that any minimizer of the joint adversarial surrogate loss yields the Bayes-optimal combined decision rule under the given cost structure. The perturbation ball is incorporated by taking the supremum over perturbations inside the ball for each input, which is already reflected in the definition of the adversarial risk. We acknowledge, however, that a more granular analysis of how the radius of the ball couples with the shared parameters would make the load-bearing step fully transparent. In the revision we will add a short subsection (or appendix paragraph) that isolates this interaction, deriving an explicit bound on the consistency gap in terms of the perturbation radius and the joint optimization. revision: yes

Circularity Check

0 steps flagged

No circularity: new one-stage framework and consistency claims derived independently

full rationale

The paper introduces a novel framework for adversarial robustness in one-stage L2D, formalizes attacks on both classification and regression, proposes cost-sensitive adversarial surrogate losses, and establishes H, (R,F), and Bayes consistency guarantees. No quoted equations or sections reduce these guarantees by construction to fitted parameters, internal definitions, or unverified self-citations; the one-stage joint optimization is presented as a direct extension with its own theoretical analysis rather than a renaming or load-bearing reuse of prior two-stage results. The derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review prevents identification of specific free parameters, axioms, or invented entities; the work appears to rely on standard supervised learning assumptions such as differentiability of losses and existence of experts.

pith-pipeline@v0.9.0 · 5673 in / 1141 out tokens · 43558 ms · 2026-05-21T20:47:06.752053+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

21 extracted references · 21 canonical work pages · 4 internal anchors

  1. [1]

    Convexity, classification, and risk bounds

    doi: 10.1198/016214505000000907. Nina L Corvelo Benz and Manuel Gomez Rodriguez. Counterfactual inference of second opinions. InUncertainty in Artificial Intelligence, pages 453–463. PMLR,

  2. [2]

    Skin Lesion Analysis Toward Melanoma Detection 2018: A Challenge Hosted by the International Skin Imaging Collaboration (ISIC)

    14 Adversarial Robustness in One-Stage Learning-to-Defer Noel Codella, Veronica Rotemberg, Philipp Tschandl, M Emre Celebi, Stephen Dusza, David Gutman, Brian Helba, Aadi Kalloo, Konstantinos Liopyris, Michael Marchetti, et al. Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the international skin imaging collaboration (isic).ar...

  3. [3]

    Aritra Ghosh, Himanshu Kumar, and P. S. Sastry. Robust loss functions under label noise for deep neural networks. InProceedings of the Thirty-First AAAI Conference on Artificial Intelligence, AAAI’17, page 1919–1925. AAAI Press,

  4. [4]

    Explaining and Harnessing Adversarial Examples

    Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples.arXiv preprint arXiv:1412.6572,

  5. [5]

    Uncovering the limits of adversarial training against norm-bounded adversarial examples,

    Sven Gowal, Chongli Qin, Jonathan Uesato, Timothy A. Mann, and Pushmeet Kohli. Uncov- ering the limits of adversarial training against norm-bounded adversarial examples.ArXiv, abs/2010.03593,

  6. [6]

    Adversarial Examples for Evaluating Reading Comprehension Systems

    doi: 10.24963/ijcai.2022/344. URL https://doi.org/10.24963/ijcai.2022/344. Main Track. Robin Jia and Percy Liang. Adversarial examples for evaluating reading comprehension systems.arXiv preprint arXiv:1707.07328,

  7. [7]

    Learning-to-defer for sequential medical decision-making under uncertainty.arXiv preprint arXiv:2109.06312,

    Shalmali Joshi, Sonali Parbhoo, and Finale Doshi-Velez. Learning-to-defer for sequential medical decision-making under uncertainty.arXiv preprint arXiv:2109.06312,

  8. [8]

    Vijay Keswani, Matthew Lease, and Krishnaram Kenthapadi

    URL https://proceedings.neurips.cc/paper_files/paper/2021/file/ 234b941e88b755b7a72a1c1dd5022f30-Paper.pdf. Vijay Keswani, Matthew Lease, and Krishnaram Kenthapadi. Towards unbiased and accurate deferral to multiple experts. InProceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, AIES ’21, page 154–165, New York, NY, USA,

  9. [9]

    ISBN 9781450384735

    Association 15 Montreuil Carlier, Yu, Ng, Ooi for Computing Machinery. ISBN 9781450384735. doi: 10.1145/3461702.3462516. URL https://doi.org/10.1145/3461702.3462516. Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization.arXiv preprint arXiv:1412.6980,

  10. [10]

    Towards Deep Learning Models Resistant to Adversarial Attacks

    Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks.ArXiv, abs/1706.06083,

  11. [11]

    Anqi Mao, Christopher Mohri, Mehryar Mohri, and Yutao Zhong

    URLhttps://api.semanticscholar.org/CorpusID:3488815. Anqi Mao, Christopher Mohri, Mehryar Mohri, and Yutao Zhong. Two-stage learning to defer with multiple experts. InThirty-seventh Conference on Neural Information Processing Systems, 2023a. URLhttps://openreview.net/forum?id=GIlsH0T4b2. Anqi Mao, Mehryar Mohri, and Yutao Zhong. Cross-entropy loss functio...

  12. [12]

    Two-stage learning-to-defer for multi-task learning, 2024

    URL https://openreview.net/forum?id= 2KlxjR6lsd. Yannis Montreuil, Shu Heng Yeo, Axel Carlier, Lai Xing Ng, and Wei Tsang Ooi. Two-stage learning-to-defer for multi-task learning.arXiv preprint arXiv:2410.15729,

  13. [13]

    Adversarial robustness in two-stage learning-to-defer: Algorithms and guarantees, 2025

    Yannis Montreuil, Axel Carlier, Lai Xing Ng, and Wei Tsang Ooi. Adversarial robustness in two-stage learning-to-defer: Algorithms and guarantees.arXiv preprint arXiv:2502.01027, 2025a. 16 Adversarial Robustness in One-Stage Learning-to-Defer Yannis Montreuil, Axel Carlier, Lai Xing Ng, and Wei Tsang Ooi. Why ask one when you can ask k? two-stage learning-...

  14. [14]

    R A Ohn Aldrich

    URL https://proceedings.neurips.cc/paper_files/paper/2022/file/ bc8f76d9caadd48f77025b1c889d2e2d-Paper-Conference.pdf. R A Ohn Aldrich. Fisher and the making of maximum likelihood 1912-1922.Statistical Science, 12(3):162–179,

  15. [15]

    Peter Putten

    URL https://openreview.net/forum?id= mkkFubLdNW. Peter Putten. Insurance Company Benchmark (COIL 2000). UCI Machine Learning Repository,

  16. [16]

    Michael Redmond

    DOI: https://doi.org/10.24432/C5630S. Michael Redmond. Communities and Crime. UCI Machine Learning Repository,

  17. [17]

    Ingo Steinwart

    DOI: https://doi.org/10.24432/C53W3X. Ingo Steinwart. How to compare different loss functions and their risks.Constructive Approximation, 26:225–287,

  18. [18]

    Statistical behavior and consistency of classification methods based on convex risk minimization

    doi: 10.1214/aos/1079120130. Zhilu Zhang and Mert Sabuncu. Generalized cross entropy loss for training deep neural networks with noisy labels.Advances in neural information processing systems, 31,

  19. [19]

    18 Adversarial Robustness in One-Stage Learning-to-Defer Appendix A. Appendix A.1 Important Definitions, Lemmas, and Theorems Definition 19(Symmetric Hypothesis Class).Let A denote the set of possible actions (predictions and deferrals), and let Q be a class of hypotheses q : X → A . We say that Q is symmetricif it is closed under permutations of A, i.e.,...

  20. [20]

    A.2 Proof of Lemma 8 Lemma 8(Smooth Adversarial Surrogate Losses).Let x∈ X be a clean input, and let ρ > 0and κ > 0be hyperparameters. The smooth adversarial surrogate losses are defined as eΦu cls,s(h, x, j) = Φu cls(h(x)/ρ, j) +κsup x′ j ∈Bp(x,γ) ∆h(x′ j, j)− ∆h(x, j) 2, 19 Montreuil Carlier, Yu, Ng, Ooi ProofFix a target classj∈ A. Define Φρ,u cls (h(x...

  21. [21]

    The model is set as a linear layer

    for 25 epochs. The model is set as a linear layer. As experts, we employ four regression MLP, each focusing on different customer segments (demographics, product ownership, high-value customers) and generate predictions using rules and noise; their accuracies are reported in Appendix A.8.2. The consultation costs are set as follows: β1 = 0 for the main pr...