Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization
Pith reviewed 2026-05-13 09:14 UTC · model grok-4.3
The pith
Regularization enables group DRO to achieve high worst-group accuracy on overparameterized neural networks
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Naively applying group DRO to overparameterized networks yields models with vanishing worst-case training loss yet poor test-time worst-group performance; adding stronger regularization restores high worst-group accuracy on held-out data from the same groups.
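The step from vanishing average training loss to vanishing worst-case training loss can be written out in one line. The notation here is an assumption introduced for illustration: $n$ training points split into groups $g \in \mathcal{G}$ of size $n_g$, a nonnegative per-example loss, and $\hat{L}_g(\theta)$ the mean training loss within group $g$.

```latex
\hat{L}_{\mathrm{avg}}(\theta)
  = \sum_{g \in \mathcal{G}} \frac{n_g}{n}\, \hat{L}_g(\theta)
  \;\ge\; \frac{n_{g'}}{n}\, \hat{L}_{g'}(\theta)
  \quad \text{for every } g' \in \mathcal{G},
\qquad\text{hence}\qquad
\max_{g \in \mathcal{G}} \hat{L}_g(\theta)
  \;\le\; \frac{n}{\min_{g} n_g}\, \hat{L}_{\mathrm{avg}}(\theta).
```

With group sizes fixed, driving the average training loss to zero forces the worst-group training loss to zero as well, so the two objectives cannot be told apart on the training set once the model interpolates.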
What carries the argument
Coupling group DRO with stronger-than-typical L2 regularization or early stopping to prevent overfitting on minority groups while minimizing worst-case loss.
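As a sketch of what the L2 variant optimizes, the regularized objective can be written as follows. The symbols are assumptions for illustration: $\hat{P}_g$ is the empirical distribution of group $g$, $\ell$ the per-example loss, and $\lambda$ the L2 coefficient. The $\lambda/2$ scaling is a common convention, not something this page specifies.

```latex
\hat{\theta}_{\mathrm{DRO}}
  \in \arg\min_{\theta}\;
  \Big\{
    \max_{g \in \mathcal{G}}\;
    \mathbb{E}_{(x,y) \sim \hat{P}_g}\big[\ell(\theta; (x,y))\big]
    \;+\; \frac{\lambda}{2}\, \lVert \theta \rVert_2^2
  \Big\}
```

Setting $\lambda$ larger than one would choose for average accuracy is what "stronger-than-typical" refers to; early stopping plays the same role by limiting how far training proceeds rather than by adding a penalty term.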
Load-bearing premise
The failure of naive group DRO comes from poor generalization on groups rather than optimization difficulty, and the pre-defined training groups match the groups that matter at test time.
What would settle it
An experiment in which increasing regularization leaves worst-group accuracy unchanged or in which naive group DRO already reaches high worst-group accuracy without extra regularization on the same datasets.
Original abstract
Overparameterized neural networks can be highly accurate on average on an i.i.d. test set yet consistently fail on atypical groups of the data (e.g., by learning spurious correlations that hold on average but not in such groups). Distributionally robust optimization (DRO) allows us to learn models that instead minimize the worst-case training loss over a set of pre-defined groups. However, we find that naively applying group DRO to overparameterized neural networks fails: these models can perfectly fit the training data, and any model with vanishing average training loss also already has vanishing worst-case training loss. Instead, the poor worst-case performance arises from poor generalization on some groups. By coupling group DRO models with increased regularization (a stronger-than-typical L2 penalty or early stopping), we achieve substantially higher worst-group accuracies, with 10-40 percentage point improvements on a natural language inference task and two image tasks, while maintaining high average accuracies. Our results suggest that regularization is important for worst-group generalization in the overparameterized regime, even if it is not needed for average generalization. Finally, we introduce a stochastic optimization algorithm, with convergence guarantees, to efficiently train group DRO models.
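The abstract mentions a stochastic algorithm for group DRO but this page does not spell out its update rule. The sketch below shows one plausible form under stated assumptions: group weights `q` are updated by exponentiated-gradient ascent on the sampled loss, and the parameters take an SGD step scaled by the sampled group's weight, with an optional L2 term. The logistic model, the function names, the step sizes, and uniform group sampling are all illustrative choices, not details confirmed by the source.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_loss_and_grad(theta, x, y):
    """Logistic loss and its gradient for one example, with y in {0, 1}."""
    z = sum(t * xi for t, xi in zip(theta, x))
    # log(1 + exp(z)) - y * z, written in an overflow-safe form.
    loss = max(z, 0.0) - y * z + math.log1p(math.exp(-abs(z)))
    p = sigmoid(z)
    grad = [(p - y) * xi for xi in x]
    return loss, grad


def group_dro_sgd(data, n_groups, dim, steps=2000, lr=0.1, lr_q=0.01,
                  l2=0.0, seed=0):
    """Hypothetical stochastic group DRO loop.

    data: list of (x, y, g) with g in range(n_groups); every group must be
    non-empty. Returns the parameters and the final group weights.
    """
    rng = random.Random(seed)
    theta = [0.0] * dim
    q = [1.0 / n_groups] * n_groups
    by_group = [[ex for ex in data if ex[2] == g] for g in range(n_groups)]
    for _ in range(steps):
        g = rng.randrange(n_groups)        # sample a group uniformly
        x, y, _ = rng.choice(by_group[g])  # then one example from that group
        loss, grad = logistic_loss_and_grad(theta, x, y)
        # Ascent step on the group weights: upweight the group that was
        # just observed in proportion to its loss, then renormalize.
        q[g] *= math.exp(lr_q * loss)
        total = sum(q)
        q = [qi / total for qi in q]
        # Descent step on the parameters, scaled by the group's weight,
        # plus the gradient of the (l2 / 2) * ||theta||^2 penalty.
        theta = [t - lr * (q[g] * gi + l2 * t) for t, gi in zip(theta, grad)]
    return theta, q
```

Raising `l2` in this loop is the code-level counterpart of the "stronger-than-typical L2 penalty", and capping `steps` corresponds to early stopping.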
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that naive application of group DRO to overparameterized neural networks fails to improve worst-group test accuracy because models achieve vanishing worst-case training loss (any low-average-loss model already has low worst-case loss on the training groups), with failures instead arising from poor generalization on some groups. Coupling group DRO with stronger regularization (increased L2 penalty or early stopping) yields 10-40 percentage point gains in worst-group accuracy on an NLI task and two image tasks while preserving high average accuracy. The authors also introduce a stochastic optimization algorithm for group DRO with convergence guarantees.
Significance. If the empirical results hold and the gains are attributable to generalization rather than optimization, the work is significant for demonstrating that regularization remains crucial for worst-group generalization even in the overparameterized regime where it is often unnecessary for average generalization. The practical improvements and the proposed algorithm with guarantees represent concrete contributions to distributionally robust learning.
major comments (2)
- [Abstract and §3 (method)] The assertion that overparameterized models achieve vanishing worst-case training loss under naive group DRO (any model with vanishing average training loss already has vanishing worst-case loss) is load-bearing for the narrative that failures are due to generalization rather than optimization. Given the non-convex, non-smooth min-max objective, the manuscript should explicitly report the achieved worst-group training losses (e.g., in §4 or Table 1) to confirm the optimizer reaches this regime on the reported tasks.
- [§5 (experiments)] The 10-40 pp worst-group improvements rely on tuning regularization strength (L2 coefficient or early-stopping epoch), listed as a free parameter. The central claim would be strengthened by showing that these gains are robust across a range of regularization values and that the optimal regularization for worst-group accuracy differs systematically from that for average accuracy (e.g., via additional curves in §5).
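Both major comments ask for per-group bookkeeping: the worst-group training loss in the first case, and average versus worst-group accuracy across regularization strengths in the second. A minimal sketch of that bookkeeping follows. The helper names and the layout of the `sweep` dictionary are hypothetical, and the toy inputs are placeholders rather than results from the paper.

```python
from collections import defaultdict


def per_group_means(values, groups):
    """Mean of `values` within each group id."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


def summarize(values, groups, worst=max):
    """Return (overall mean, worst per-group mean).

    Use worst=max when `values` are losses and worst=min when they are
    0/1 correctness flags (accuracies).
    """
    means = per_group_means(values, groups)
    average = sum(values) / len(values)
    return average, worst(means.values())


def best_setting(sweep, key):
    """Pick the regularization strength that maximizes `key`.

    sweep: {l2_coefficient: {"avg_acc": float, "worst_acc": float}}
    """
    return max(sweep, key=lambda lam: sweep[lam][key])


# Toy usage with placeholder correctness flags for two groups.
correct = [1, 1, 1, 1, 0, 1]
group_ids = ["a", "a", "a", "a", "b", "b"]
avg_acc, worst_acc = summarize(correct, group_ids, worst=min)
```

Calling `best_setting(sweep, "avg_acc")` and `best_setting(sweep, "worst_acc")` on the same sweep gives a direct check of whether the two criteria select different regularization strengths.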
minor comments (2)
- [§4 (algorithm)] The convergence guarantees for the proposed stochastic algorithm are stated but the precise assumptions (e.g., on the loss smoothness or step-size schedule) and any empirical verification of convergence rates could be expanded for clarity.
- [Figures in §5] Figure captions and legends should explicitly note the number of random seeds or runs used to generate error bars when comparing average vs. worst-group accuracies.
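For the error-bar comment, one simple convention is to report the mean and standard error over independent seeds. The sketch below assumes one accuracy value per seed; the function name is hypothetical, and standard error is only one of several reasonable choices (standard deviation or a confidence interval would also work if the caption says which is used).

```python
import math
import statistics


def mean_and_stderr(per_seed_values):
    """Mean and standard error of a metric measured once per random seed."""
    n = len(per_seed_values)
    mean = statistics.fmean(per_seed_values)
    if n < 2:
        # A single run gives no estimate of spread.
        return mean, float("nan")
    return mean, statistics.stdev(per_seed_values) / math.sqrt(n)
```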
Simulated Author's Rebuttal
We thank the referee for the constructive comments. We address each major comment below and will revise the manuscript to incorporate the suggested changes.
Point-by-point responses
- Referee: [Abstract and §3 (method)] The assertion that overparameterized models achieve vanishing worst-case training loss under naive group DRO (any model with vanishing average training loss already has vanishing worst-case loss) is load-bearing for the narrative that failures are due to generalization rather than optimization. Given the non-convex, non-smooth min-max objective, the manuscript should explicitly report the achieved worst-group training losses (e.g., in §4 or Table 1) to confirm the optimizer reaches this regime on the reported tasks.
  Authors: We agree that explicitly reporting the achieved worst-group training losses would strengthen the claim that the optimizer reaches the regime where average and worst-case training losses both vanish. In the revised manuscript we will add these values to Table 1 and the corresponding discussion in §4, confirming that worst-group training loss approaches zero under naive group DRO on the reported tasks.
  Revision: yes
- Referee: [§5 (experiments)] The 10-40 pp worst-group improvements rely on tuning regularization strength (L2 coefficient or early-stopping epoch), listed as a free parameter. The central claim would be strengthened by showing that these gains are robust across a range of regularization values and that the optimal regularization for worst-group accuracy differs systematically from that for average accuracy (e.g., via additional curves in §5).
  Authors: We appreciate the suggestion to demonstrate robustness across regularization values. In the revised manuscript we will add plots in §5 showing worst-group and average accuracy as functions of the L2 coefficient and of the early-stopping epoch for both group DRO and ERM. These curves will illustrate that the improvements are robust over a range of regularization strengths and that the regularization level optimal for worst-group accuracy is systematically stronger than the level optimal for average accuracy.
  Revision: yes
Circularity Check
No significant circularity; empirical results and algorithm are self-contained
Full rationale
The paper's core argument rests on direct experimental observations that overparameterized models achieve vanishing worst-case training loss under naive group DRO (any low-average-loss model has low worst-case loss) and that stronger regularization yields 10-40 point worst-group gains. This is presented as an empirical finding on concrete tasks rather than a derivation that reduces by construction to fitted parameters or self-citations. The introduced stochastic optimizer is accompanied by stated convergence guarantees, supplying independent mathematical content. No load-bearing step invokes a uniqueness theorem from the authors' prior work, renames a known pattern, or defines a prediction in terms of its own inputs. The results remain externally falsifiable via replication on the reported NLI and image datasets.
Axiom & Free-Parameter Ledger
free parameters (1)
- regularization strength (L2 coefficient or early-stopping epoch)
axioms (2)
- domain assumption: The training groups are known and fixed in advance.
- standard math: Standard neural network training dynamics apply.
Forward citations
Cited by 26 Pith papers
-
Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling
DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.
-
Structure from Strategic Interaction & Uncertainty: Risk Sensitive Games for Robust Preference Learning
Risk-sensitive preference games retain monotonicity via translation-invariant risk measures, enabling convergent self-play algorithms with stability bounds and empirical robustness across data strata.
-
Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study
A large-scale benchmark finds that recent multimodal domain generalization methods give only marginal gains over a plain ERM baseline, with no method winning consistently and all degrading sharply under corruption or ...
-
eXplaining to Learn (eX2L): Regularization Using Contrastive Visual Explanation Pairs for Distribution Shifts
eX2L improves robustness to distribution shifts by penalizing similarity between Grad-CAM maps of a label classifier and a confounder classifier, reaching new SOTA average and worst-group accuracy on the Spawrious benchmark.
-
Right Regions, Wrong Labels: Semantic Label Flips in Segmentation under Correlation Shift
Semantic segmentation models produce label flips within foreground regions under correlation shift, quantified by a new Flip diagnostic and an entropy-based flip-risk score.
-
Learning from Synthetic Data via Provenance-Based Input Gradient Guidance
A framework that applies provenance-based guidance to input gradients during synthetic data training to promote learning from target regions only.
-
Birds of a Feather Flock Together: Background-Invariant Representations via Linear Structure in VLMs
Exploiting linear structure in VLM embeddings, a synthetic-data pre-training method yields background-invariant representations that exceed 90% worst-group accuracy on Waterbirds even under 100% spurious correlation w...
-
DuetFair: Coupling Inter- and Intra-Subgroup Robustness for Fair Medical Image Segmentation
DuetFair couples inter-subgroup adaptation with intra-subgroup robustness via FairDRO (dMoE plus subgroup-conditioned DRO) to boost worst-case and equity-scaled performance on medical segmentation benchmarks.
-
Structure from Strategic Interaction & Uncertainty: Risk Sensitive Games for Robust Preference Learning
Risk-sensitive preference games using convex risk measures produce policies that are robust across data strata and match or exceed standard Nash learning performance without added cost.
-
The Trap of Trajectory: Towards Understanding and Mitigating Spurious Correlations in Agentic Memory
Agentic memory improves clean reasoning but worsens performance when spurious patterns are present in stored trajectories; CAMEL calibration reduces this reliance while preserving clean performance.
-
Robust Conditional Conformal Prediction via Branched Normalizing Flow
Branched Normalizing Flow improves conditional coverage robustness of conformal prediction under distribution shift by normalizing test inputs to the calibration distribution and mapping prediction sets back.
-
Cheeger--Hodge Contrastive Learning for Structurally Robust Graph Representation Learning
CHCL aligns a Cheeger-Hodge joint signature across graph augmentations to produce embeddings that remain stable under local structural changes.
-
Correcting Performance Estimation Bias in Imbalanced Classification with Minority Subconcepts
The authors introduce predicted-weighted balanced accuracy (pBA), a utility-weighted evaluation metric that uses predicted subconcept posteriors to reduce bias from within-class heterogeneity in imbalanced data.
-
MGDA-Decoupled: Geometry-Aware Multi-Objective Optimisation for DPO-based LLM Alignment
MGDA-Decoupled applies geometry-based multi-objective optimization within the DPO framework to find shared descent directions that account for each objective's convergence dynamics, yielding higher win rates on UltraFeedback.
-
CrossPan: A Comprehensive Benchmark for Cross-Sequence Pancreas MRI Segmentation and Generalization
CrossPan benchmark shows cross-sequence MRI domain shifts cause pancreas segmentation models to fail catastrophically, establishing sequence generalization as the primary barrier to clinical deployment over center var...
-
CrossFlowDG: Bridging the Modality Gap with Cross-modal Flow Matching for Domain Generalization
CrossFlowDG bridges the modality gap in domain generalization by learning a continuous transformation that moves image embeddings to matching text embeddings using noise-free cross-modal flow matching.
-
Adversarial Label Invariant Graph Data Augmentations for Out-of-Distribution Generalization
RIA uses adversarial exploration of counterfactual graph environments via label-invariant augmentations to improve OoD generalization in graph classification tasks.
-
Inside-Out: Measuring Generalization in Vision Transformers Through Inner Workings
Circuit-based metrics from Vision Transformer internals provide better label-free proxies for generalization under distribution shift than existing methods like model confidence.
-
Visual prompting reimagined: The power of the Activation Prompts
Activation prompts on intermediate layers outperform input-level visual prompting and parameter-efficient fine-tuning in accuracy and efficiency across 29 datasets.
-
Robust Learning of Heterogeneous Dynamic Systems
A distributionally robust ODE learning framework for heterogeneous systems that uses worst-case optimization over convex derivative combinations to produce a stabilized weighted estimator with theoretical guarantees.
-
Beyond Instance-Level Self-Supervision in 3D Multi-Modal Medical Imaging
A self-supervised approach uses consistent spatial relationships of anatomical structures across patients to improve 3D multi-modal medical image representations, yielding modest gains on segmentation and classificati...
-
Agentic AIs Are the Missing Paradigm for Out-of-Distribution Generalization in Foundation Models
Agentic AI systems are required to overcome the parameter coverage ceiling that prevents foundation models from handling certain out-of-distribution cases.
-
A Toolkit for Detecting Spurious Correlations in Speech Datasets
A toolkit flags spurious correlations in speech datasets by checking if non-speech regions predict the target class better than chance.
-
Labeled TrustSet Guided: Batch Active Learning with Reinforcement Learning
BRAL-T uses TrustSet-guided reinforcement learning for batch active learning and reports state-of-the-art results on 10 image classification benchmarks plus 2 fine-tuning tasks.
-
Robust Deepfake Detection, NTIRE 2026 Challenge: Report
The NTIRE 2026 challenge finds that large foundation models combined with ensembles and degradation-aware training produce the most robust deepfake detectors.
-
Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers
A tutorial framing deep learning as a complement to optimization for sequential decision-making under uncertainty, with applications in supply chains, healthcare, and energy.