Mixed citations

Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization

Shiori Sagawa, Pang Wei Koh, Tatsunori B. Hashimoto, Percy Liang · 2019 · cs.LG · arXiv 1911.08731

Mixed citation behavior. Most common role is background (67%).

40 Pith papers citing it

Background 67% of classified citations

abstract

Overparameterized neural networks can be highly accurate on average on an i.i.d. test set yet consistently fail on atypical groups of the data (e.g., by learning spurious correlations that hold on average but not in such groups). Distributionally robust optimization (DRO) allows us to learn models that instead minimize the worst-case training loss over a set of pre-defined groups. However, we find that naively applying group DRO to overparameterized neural networks fails: these models can perfectly fit the training data, and any model with vanishing average training loss also already has vanishing worst-case training loss. Instead, the poor worst-case performance arises from poor generalization on some groups. By coupling group DRO models with increased regularization (a stronger-than-typical L2 penalty or early stopping), we achieve substantially higher worst-group accuracies, with 10-40 percentage point improvements on a natural language inference task and two image tasks, while maintaining high average accuracies. Our results suggest that regularization is important for worst-group generalization in the overparameterized regime, even if it is not needed for average generalization. Finally, we introduce a stochastic optimization algorithm, with convergence guarantees, to efficiently train group DRO models.
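
The contrast the abstract draws, minimizing the worst-case loss over pre-defined groups instead of the average loss, can be illustrated with a small sketch. It assumes a toy setting that is not from the paper: a single scalar parameter `theta`, squared loss, and two groups (nine majority points with target 0, one minority point with target 2). The group DRO branch raises the weights `q` of high-loss groups with an exponentiated-gradient step and then takes a gradient step on the `q`-weighted loss. This is a full-batch simplification, not a reproduction of the paper's stochastic algorithm, and the names `per_group_mean`, `worst_group_loss`, `train`, `lr`, and `eta_q` are hypothetical.

```python
import math


def per_group_mean(values, groups, num_groups):
    """Average `values` within each group index; groups with no samples get 0.0."""
    totals = [0.0] * num_groups
    counts = [0] * num_groups
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def worst_group_loss(theta, targets, groups, num_groups):
    """Largest per-group mean squared loss of the scalar model `theta`."""
    losses = [(theta - t) ** 2 for t in targets]
    return max(per_group_mean(losses, groups, num_groups))


def train(targets, groups, num_groups, robust, steps=2000, lr=0.05, eta_q=0.01):
    """Fit a scalar `theta` under squared loss (theta - t)^2 (hypothetical toy).

    robust=False: ERM, gradient descent on the average loss.
    robust=True:  group DRO sketch; each step multiplies the weight of every
                  group by exp(eta_q * group loss), renormalizes q onto the
                  probability simplex, then descends the q-weighted loss.
    """
    theta = 0.0
    q = [1.0 / num_groups] * num_groups  # start from uniform group weights
    for _ in range(steps):
        losses = [(theta - t) ** 2 for t in targets]
        grads = [2.0 * (theta - t) for t in targets]  # d/dtheta of each loss
        if robust:
            group_loss = per_group_mean(losses, groups, num_groups)
            # Exponentiated-gradient ascent on q, then renormalize to sum to 1.
            q = [w * math.exp(eta_q * l) for w, l in zip(q, group_loss)]
            total = sum(q)
            q = [w / total for w in q]
            group_grad = per_group_mean(grads, groups, num_groups)
            grad = sum(w * g for w, g in zip(q, group_grad))
        else:
            grad = sum(grads) / len(grads)
        theta -= lr * grad
    return theta


# Usage on the toy data: 9 majority-group points, 1 minority-group point.
targets = [0.0] * 9 + [2.0]
groups = [0] * 9 + [1]
for robust in (False, True):
    theta = train(targets, groups, 2, robust=robust)
    print("group DRO" if robust else "ERM", round(theta, 3),
          round(worst_group_loss(theta, targets, groups, 2), 3))
```

Because this scalar model cannot fit both groups at once, ERM settles at theta = 0.2 (worst-group loss 3.24), while the group DRO branch moves toward theta = 1, where both group losses equal 1. In the overparameterized regime the abstract describes, both objectives can drive training loss to zero, which is why the authors pair group DRO with a stronger L2 penalty or early stopping.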

citation-role summary

background: 6 · method: 2 · dataset: 1

citation-polarity summary

background: 6 · use method: 2 · use dataset: 1

representative citing papers

A²: Smaller Self-Supervised ViTs Localize Better than Larger Ones

cs.CV · 2026-06-02 · unverdicted · novelty 7.0

Smaller self-supervised ViTs localize objects better via attention than larger ViTs, enabling A² to decouple localization from feature extraction for competitive performance on distribution-shifted benchmarks.

D³: Dynamic Directional Graph-Constrained Data Scheduling for LLM Training

cs.CL · 2026-05-29 · unverdicted · novelty 7.0

D³ introduces a dynamic directional graph-constrained framework that models sample interactions via loss dependencies to derive an optimized training sequence for LLMs.

Shortcuts in the Tail: Debiasing via Post-Hoc Spectral Compression of Fine-Tuning Updates

cs.LG · 2026-05-29 · unverdicted · novelty 7.0

Post-hoc truncation of the tail of the SVD of ΔW reduces spurious-group gaps by up to 5× with <2 pp accuracy loss across 0.5B–7B models and four benchmarks.

CB-SLICE: Concept-Based Interpretable Error Slice Discovery

cs.LG · 2026-05-28 · unverdicted · novelty 7.0

CB-SLICE uses concept mispredictions from CBMs to discover and explain error slices, claiming better performance than existing methods on benchmarks for bias detection.

Continual Learning of Domain-Invariant Representations

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

Introduces replay-based continual learning with sequential invariance alignment to learn domain-invariant representations, outperforming baselines on generalization to unseen domains across six datasets in vision, medicine, manufacturing, and ecology.

Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.

Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study

cs.CV · 2026-05-07 · unverdicted · novelty 7.0

A large-scale benchmark finds that recent multimodal domain generalization methods give only marginal gains over a plain ERM baseline, with no method winning consistently and all degrading sharply under corruption or missing modalities.

eXplaining to Learn (eX2L): Regularization Using Contrastive Visual Explanation Pairs for Distribution Shifts

cs.CV · 2026-05-07 · unverdicted · novelty 7.0

eX2L improves robustness to distribution shifts by penalizing similarity between Grad-CAM maps of a label classifier and a confounder classifier, reaching new SOTA average and worst-group accuracy on the Spawrious benchmark.

Learning from Synthetic Data via Provenance-Based Input Gradient Guidance

cs.CV · 2026-04-03 · unverdicted · novelty 7.0

A framework that applies provenance-based guidance to input gradients during synthetic data training to promote learning from target regions only.

BRIDGE the Gap: Mitigating Bias Amplification in Automated Scoring of English Language Learners via Inter-group Data Augmentation

cs.CL · 2026-02-27 · unverdicted · novelty 7.0

BRIDGE reduces bias against high-scoring ELL students in automated scoring by generating synthetic samples via inter-group content pasting and quality discrimination, achieving fairness gains comparable to additional real data.

Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training

cs.LG · 2025-07-21 · unverdicted · novelty 7.0

An RL agent learns domain re-weighting policies from evaluation feedback to improve balanced performance in continual pre-training of LLMs across source and target domains.

The Pragmatic Frames of Spurious Correlations in Machine Learning: Interpreting How and Why They Matter

cs.LG · 2024-11-07 · unverdicted · novelty 7.0

ML researchers assess spurious correlations via four pragmatic frames (relevance, generalizability, human-likeness, harmfulness) rather than a fixed statistical definition.

Distributionally Robust Set Representation Learning Under Inference-Time Element Corruption

cs.LG · 2026-05-28 · unverdicted · novelty 6.0

SW-DRSO optimizes a tractable surrogate of worst-case expected loss over plausible inference-time corruptions using a barycentric adversary approximated via simplex weights.

Why SGD is not Brownian Motion: A New Perspective on Stochastic Dynamics

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

SGD is reformulated via a master equation from discrete updates, producing a discrete Fokker-Planck equation that predicts non-stationary variance growth proportional to learning rate in flat Hessian directions.

S2Aligner: Pair-Efficient and Transferable Pre-Training for Sparse Text-Attributed Graphs

cs.LG · 2026-05-18 · unverdicted · novelty 6.0 · 2 refs

S2Aligner decouples semantic and structural components in LLM-as-Aligner pre-training for sparse TAGs and uses structure-oriented reconstruction plus domain risk balancing to improve transferability and reduce generalization gaps.

Birds of a Feather Flock Together: Background-Invariant Representations via Linear Structure in VLMs

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

Exploiting linear structure in VLM embeddings, a synthetic-data pre-training method yields background-invariant representations that exceed 90% worst-group accuracy on Waterbirds even under 100% spurious correlation with no minority examples in training.

DuetFair: Coupling Inter- and Intra-Subgroup Robustness for Fair Medical Image Segmentation

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

DuetFair couples inter-subgroup adaptation with intra-subgroup robustness via FairDRO (dMoE plus subgroup-conditioned DRO) to boost worst-case and equity-scaled performance on medical segmentation benchmarks.

Structure from Strategic Interaction & Uncertainty: Risk Sensitive Games for Robust Preference Learning

cs.GT · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

Risk-sensitive preference games using convex risk measures produce policies that are robust across data strata and match or exceed standard Nash learning performance without added cost.

The Trap of Trajectory: Towards Understanding and Mitigating Spurious Correlations in Agentic Memory

cs.LG · 2026-05-10 · unverdicted · novelty 6.0

Agentic memory improves clean reasoning but worsens performance when spurious patterns are present in stored trajectories; CAMEL calibration reduces this reliance while preserving clean performance.

Robust Conditional Conformal Prediction via Branched Normalizing Flow

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

Branched Normalizing Flow improves conditional coverage robustness of conformal prediction under distribution shift by normalizing test inputs to the calibration distribution and mapping prediction sets back.

Cheeger–Hodge Contrastive Learning for Structurally Robust Graph Representation Learning

cs.LG · 2026-04-29 · unverdicted · novelty 6.0

CHCL aligns a Cheeger-Hodge joint signature across graph augmentations to produce embeddings that remain stable under local structural changes.

Correcting Performance Estimation Bias in Imbalanced Classification with Minority Subconcepts

cs.LG · 2026-04-28 · unverdicted · novelty 6.0

The authors introduce predicted-weighted balanced accuracy (pBA), a utility-weighted evaluation metric that uses predicted subconcept posteriors to reduce bias from within-class heterogeneity in imbalanced data.

MGDA-Decoupled: Geometry-Aware Multi-Objective Optimisation for DPO-based LLM Alignment

cs.LG · 2026-04-22 · unverdicted · novelty 6.0

MGDA-Decoupled applies geometry-based multi-objective optimization within the DPO framework to find shared descent directions that account for each objective's convergence dynamics, yielding higher win rates on UltraFeedback.

CrossPan: A Comprehensive Benchmark for Cross-Sequence Pancreas MRI Segmentation and Generalization

cs.CV · 2026-04-20 · unverdicted · novelty 6.0

CrossPan benchmark shows cross-sequence MRI domain shifts cause pancreas segmentation models to fail catastrophically, establishing sequence generalization as the primary barrier to clinical deployment over center variability or architecture choices.

citing papers explorer

Showing 40 of 40 citing papers.

A²: Smaller Self-Supervised ViTs Localize Better than Larger Ones cs.CV · 2026-06-02 · unverdicted · none · ref 30 · internal anchor
Smaller self-supervised ViTs localize objects better via attention than larger ViTs, enabling A² to decouple localization from feature extraction for competitive performance on distribution-shifted benchmarks.
D³: Dynamic Directional Graph-Constrained Data Scheduling for LLM Training cs.CL · 2026-05-29 · unverdicted · none · ref 22 · internal anchor
D³ introduces a dynamic directional graph-constrained framework that models sample interactions via loss dependencies to derive an optimized training sequence for LLMs.
Shortcuts in the Tail: Debiasing via Post-Hoc Spectral Compression of Fine-Tuning Updates cs.LG · 2026-05-29 · unverdicted · none · ref 9 · internal anchor
Post-hoc truncation of the tail of the SVD of ΔW reduces spurious-group gaps by up to 5× with <2 pp accuracy loss across 0.5B–7B models and four benchmarks.
CB-SLICE: Concept-Based Interpretable Error Slice Discovery cs.LG · 2026-05-28 · unverdicted · none · ref 8 · internal anchor
CB-SLICE uses concept mispredictions from CBMs to discover and explain error slices, claiming better performance than existing methods on benchmarks for bias detection.
Continual Learning of Domain-Invariant Representations cs.LG · 2026-05-15 · unverdicted · none · ref 14 · internal anchor
Introduces replay-based continual learning with sequential invariance alignment to learn domain-invariant representations, outperforming baselines on generalization to unseen domains across six datasets in vision, medicine, manufacturing, and ecology.
Distributionally Robust Multi-Task Reinforcement Learning via Adaptive Task Sampling cs.LG · 2026-05-14 · unverdicted · none · ref 5 · internal anchor
DRATS derives a minimax objective from a feasibility formulation of MTRL to adaptively sample tasks with the largest return gaps, leading to better worst-task performance on MetaWorld benchmarks.
Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study cs.CV · 2026-05-07 · unverdicted · none · ref 38 · internal anchor
A large-scale benchmark finds that recent multimodal domain generalization methods give only marginal gains over a plain ERM baseline, with no method winning consistently and all degrading sharply under corruption or missing modalities.
eXplaining to Learn (eX2L): Regularization Using Contrastive Visual Explanation Pairs for Distribution Shifts cs.CV · 2026-05-07 · unverdicted · none · ref 24 · internal anchor
eX2L improves robustness to distribution shifts by penalizing similarity between Grad-CAM maps of a label classifier and a confounder classifier, reaching new SOTA average and worst-group accuracy on the Spawrious benchmark.
Learning from Synthetic Data via Provenance-Based Input Gradient Guidance cs.CV · 2026-04-03 · unverdicted · none · ref 31 · internal anchor
A framework that applies provenance-based guidance to input gradients during synthetic data training to promote learning from target regions only.
BRIDGE the Gap: Mitigating Bias Amplification in Automated Scoring of English Language Learners via Inter-group Data Augmentation cs.CL · 2026-02-27 · unverdicted · none · ref 22 · internal anchor
BRIDGE reduces bias against high-scoring ELL students in automated scoring by generating synthetic samples via inter-group content pasting and quality discrimination, achieving fairness gains comparable to additional real data.
Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training cs.LG · 2025-07-21 · unverdicted · none · ref 27 · internal anchor
An RL agent learns domain re-weighting policies from evaluation feedback to improve balanced performance in continual pre-training of LLMs across source and target domains.
The Pragmatic Frames of Spurious Correlations in Machine Learning: Interpreting How and Why They Matter cs.LG · 2024-11-07 · unverdicted · none · ref 6 · internal anchor
ML researchers assess spurious correlations via four pragmatic frames (relevance, generalizability, human-likeness, harmfulness) rather than a fixed statistical definition.
Distributionally Robust Set Representation Learning Under Inference-Time Element Corruption cs.LG · 2026-05-28 · unverdicted · none · ref 11 · internal anchor
SW-DRSO optimizes a tractable surrogate of worst-case expected loss over plausible inference-time corruptions using a barycentric adversary approximated via simplex weights.
Why SGD is not Brownian Motion: A New Perspective on Stochastic Dynamics cs.LG · 2026-05-21 · unverdicted · none · ref 121 · internal anchor
SGD is reformulated via a master equation from discrete updates, producing a discrete Fokker-Planck equation that predicts non-stationary variance growth proportional to learning rate in flat Hessian directions.
S2Aligner: Pair-Efficient and Transferable Pre-Training for Sparse Text-Attributed Graphs cs.LG · 2026-05-18 · unverdicted · none · ref 28 · 2 links · internal anchor
S2Aligner decouples semantic and structural components in LLM-as-Aligner pre-training for sparse TAGs and uses structure-oriented reconstruction plus domain risk balancing to improve transferability and reduce generalization gaps.
Birds of a Feather Flock Together: Background-Invariant Representations via Linear Structure in VLMs cs.CV · 2026-05-11 · unverdicted · none · ref 34 · internal anchor
Exploiting linear structure in VLM embeddings, a synthetic-data pre-training method yields background-invariant representations that exceed 90% worst-group accuracy on Waterbirds even under 100% spurious correlation with no minority examples in training.
DuetFair: Coupling Inter- and Intra-Subgroup Robustness for Fair Medical Image Segmentation cs.CV · 2026-05-11 · unverdicted · none · ref 10 · internal anchor
DuetFair couples inter-subgroup adaptation with intra-subgroup robustness via FairDRO (dMoE plus subgroup-conditioned DRO) to boost worst-case and equity-scaled performance on medical segmentation benchmarks.
Structure from Strategic Interaction & Uncertainty: Risk Sensitive Games for Robust Preference Learning cs.GT · 2026-05-11 · unverdicted · none · ref 37 · 2 links · internal anchor
Risk-sensitive preference games using convex risk measures produce policies that are robust across data strata and match or exceed standard Nash learning performance without added cost.
The Trap of Trajectory: Towards Understanding and Mitigating Spurious Correlations in Agentic Memory cs.LG · 2026-05-10 · unverdicted · none · ref 32 · internal anchor
Agentic memory improves clean reasoning but worsens performance when spurious patterns are present in stored trajectories; CAMEL calibration reduces this reliance while preserving clean performance.
Robust Conditional Conformal Prediction via Branched Normalizing Flow cs.LG · 2026-05-03 · unverdicted · none · ref 27 · internal anchor
Branched Normalizing Flow improves conditional coverage robustness of conformal prediction under distribution shift by normalizing test inputs to the calibration distribution and mapping prediction sets back.
Cheeger–Hodge Contrastive Learning for Structurally Robust Graph Representation Learning cs.LG · 2026-04-29 · unverdicted · none · ref 75 · internal anchor
CHCL aligns a Cheeger-Hodge joint signature across graph augmentations to produce embeddings that remain stable under local structural changes.
Correcting Performance Estimation Bias in Imbalanced Classification with Minority Subconcepts cs.LG · 2026-04-28 · unverdicted · none · ref 22 · internal anchor
The authors introduce predicted-weighted balanced accuracy (pBA), a utility-weighted evaluation metric that uses predicted subconcept posteriors to reduce bias from within-class heterogeneity in imbalanced data.
MGDA-Decoupled: Geometry-Aware Multi-Objective Optimisation for DPO-based LLM Alignment cs.LG · 2026-04-22 · unverdicted · none · ref 65 · internal anchor
MGDA-Decoupled applies geometry-based multi-objective optimization within the DPO framework to find shared descent directions that account for each objective's convergence dynamics, yielding higher win rates on UltraFeedback.
CrossPan: A Comprehensive Benchmark for Cross-Sequence Pancreas MRI Segmentation and Generalization cs.CV · 2026-04-20 · unverdicted · none · ref 7 · internal anchor
CrossPan benchmark shows cross-sequence MRI domain shifts cause pancreas segmentation models to fail catastrophically, establishing sequence generalization as the primary barrier to clinical deployment over center variability or architecture choices.
CrossFlowDG: Bridging the Modality Gap with Cross-modal Flow Matching for Domain Generalization cs.CV · 2026-04-18 · unverdicted · none · ref 36 · internal anchor
CrossFlowDG bridges the modality gap in domain generalization by learning a continuous transformation that moves image embeddings to matching text embeddings using noise-free cross-modal flow matching.
Adversarial Label Invariant Graph Data Augmentations for Out-of-Distribution Generalization cs.LG · 2026-04-09 · unverdicted · none · ref 16 · internal anchor
RIA uses adversarial exploration of counterfactual graph environments via label-invariant augmentations to improve OoD generalization in graph classification tasks.
Inside-Out: Measuring Generalization in Vision Transformers Through Inner Workings cs.LG · 2026-04-09 · unverdicted · none · ref 61 · internal anchor
Circuit-based metrics from Vision Transformer internals provide better label-free proxies for generalization under distribution shift than existing methods like model confidence.
Visual prompting reimagined: The power of the Activation Prompts cs.CV · 2026-04-07 · unverdicted · none · ref 74 · internal anchor
Activation prompts on intermediate layers outperform input-level visual prompting and parameter-efficient fine-tuning in accuracy and efficiency across 29 datasets.
Robust Learning of Heterogeneous Dynamic Systems stat.ME · 2026-04-07 · unverdicted · none · ref 4 · internal anchor
A distributionally robust ODE learning framework for heterogeneous systems that uses worst-case optimization over convex derivative combinations to produce a stabilized weighted estimator with theoretical guarantees.
Mitigating Shortcut Learning via Feature Disentanglement in Medical Imaging: A Benchmark Study cs.CV · 2026-02-17 · unverdicted · none · ref 54 · internal anchor
Benchmark shows that combining data rebalancing with feature disentanglement mitigates shortcut learning more effectively than rebalancing alone in medical imaging models.
Robust Policy Optimization to Prevent Catastrophic Forgetting cs.LG · 2026-02-09 · unverdicted · none · ref 43 · internal anchor
FRPO applies a max-min robust optimization over KL-bounded policy neighborhoods during RLHF to reduce catastrophic forgetting of safety and accuracy under subsequent SFT or RL fine-tuning.
Privacy Beyond Pixels: Latent Anonymization for Privacy-Preserving Video Understanding cs.CV · 2025-11-11 · conditional · none · ref 8 · internal anchor
A plug-and-play Anonymizing Adapter Module removes private information from video latent features using self-supervised privacy objectives and consistency losses while retaining utility on action recognition, temporal detection, and anomaly tasks.
Minimax Regret Estimation for Generalizing Heterogeneous Treatment Effects with Multisite Data stat.ME · 2024-12-15 · unverdicted · none · ref 57 · internal anchor
Proposes a minimax-regret framework for learning generalizable CATE models from multisite data by minimizing worst-case regret over convex combinations of site-specific CATEs.
Beyond Instance-Level Self-Supervision in 3D Multi-Modal Medical Imaging cs.CV · 2026-05-14 · unverdicted · none · ref 180 · internal anchor
A self-supervised approach uses consistent spatial relationships of anatomical structures across patients to improve 3D multi-modal medical image representations, yielding modest gains on segmentation and classification tasks.
Agentic AIs Are the Missing Paradigm for Out-of-Distribution Generalization in Foundation Models cs.LG · 2026-05-07 · unverdicted · none · ref 20 · internal anchor
Agentic AI systems are required to overcome the parameter coverage ceiling that prevents foundation models from handling certain out-of-distribution cases.
A Toolkit for Detecting Spurious Correlations in Speech Datasets cs.SD · 2026-04-29 · unverdicted · none · ref 12 · internal anchor
A toolkit flags spurious correlations in speech datasets by checking if non-speech regions predict the target class better than chance.
Labeled TrustSet Guided: Batch Active Learning with Reinforcement Learning cs.LG · 2026-04-14 · unverdicted · none · ref 9 · internal anchor
BRAL-T uses TrustSet-guided reinforcement learning for batch active learning and reports state-of-the-art results on 10 image classification benchmarks plus 2 fine-tuning tasks.
Robust Deepfake Detection, NTIRE 2026 Challenge: Report cs.CV · 2026-04-27 · unverdicted · none · ref 63 · internal anchor
The NTIRE 2026 challenge finds that large foundation models combined with ensembles and degradation-aware training produce the most robust deepfake detectors.
Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers math.OC · 2026-04-13 · unverdicted · none · ref 110 · internal anchor
A tutorial framing deep learning as a complement to optimization for sequential decision-making under uncertainty, with applications in supply chains, healthcare, and energy.
Right Regions, Wrong Labels: Semantic Label Flips in Segmentation under Correlation Shift cs.CV · 2026-04-14 · unreviewed · ref 5 · internal anchor