Invariant Risk Minimization

2019 · stat.ML · arXiv 1907.02893

Canonical reference. 71% of citing Pith papers cite this work as background.

75 Pith papers citing it

Background 71% of classified citations

abstract

We introduce Invariant Risk Minimization (IRM), a learning paradigm to estimate invariant correlations across multiple training distributions. To achieve this goal, IRM learns a data representation such that the optimal classifier, on top of that data representation, matches for all training distributions. Through theory and experiments, we show how the invariances learned by IRM relate to the causal structures governing the data and enable out-of-distribution generalization.
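The abstract's central condition, that the same classifier on top of the representation is optimal in every training environment, can be illustrated with a small numeric sketch. It uses the gradient-norm penalty of the paper's practical IRMv1 variant: fix a scalar classifier w = 1.0 on top of a candidate representation and measure, per environment, the squared gradient of the risk with respect to w. Everything else here is an assumption made for illustration: the noise-free toy data, the squared loss, and the names `make_env`, `phi_causal`, and `phi_spurious` are hypothetical and do not come from the paper.

```python
# Minimal sketch of an IRMv1-style invariance penalty on toy data.
# Assumptions (not from the source): squared loss, scalar representations,
# noise-free data, and all function names below.

def risk_grad_at_one(features, targets):
    """Derivative of mean((w * f - y)^2) with respect to w, evaluated at w = 1."""
    n = len(features)
    return sum(2.0 * (f - y) * f for f, y in zip(features, targets)) / n


def invariance_penalty(phi, envs):
    """Sum over environments of the squared risk gradient at w = 1."""
    total = 0.0
    for xs, ys in envs:
        feats = [phi(x) for x in xs]
        total += risk_grad_at_one(feats, ys) ** 2
    return total


def make_env(spurious_scale):
    """Toy environment: y = 2 * x1 always; x2 = spurious_scale * y varies by environment."""
    xs, ys = [], []
    for x1 in (-2.0, -1.0, 1.0, 2.0):
        y = 2.0 * x1
        xs.append((x1, spurious_scale * y))
        ys.append(y)
    return xs, ys


def phi_causal(x):
    # Representation built on the feature whose relation to y is stable.
    return 2.0 * x[0]


def phi_spurious(x):
    # Representation built on the feature whose relation to y changes.
    return x[1]


envs = [make_env(1.0), make_env(0.5)]
penalty_causal = invariance_penalty(phi_causal, envs)
penalty_spurious = invariance_penalty(phi_spurious, envs)

print(penalty_causal)    # 0.0: w = 1 is optimal in both environments
print(penalty_spurious)  # 25.0: w = 1 is optimal in only one environment
```

In this sketch the penalty vanishes only for the representation whose optimal classifier coincides across environments, which is the property the abstract describes. A full training objective would add this penalty to the summed environment risks, weighted by a hyperparameter.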

citation-role summary

background 12 · method 1 · other 1

citation-polarity summary

background 10 · unclear 3 · use method 1

claims ledger

background Θ ⊆ R^d are convex and compact, and let θ∗ ∈ Θ be a minimizer of the worst-group objective R(θ). Then there exists a distribution Q∗ ∈ Q such that θ∗ ∈ arg min_θ E_{z∼Q∗}[ℓ(θ; z)]. However, this equivalence breaks down when the loss ℓ is non-convex: Counterexample 1. Consider a uniform data distribution P supported on two points Z = {z_1, z_2}, and let ℓ(θ; z) be as in Figure 4, with Θ = [0, 1]. The DRO solution θ∗ achieves a worst-case loss of R(θ∗) = 0.6. Now consider any weights (w_1, w_2) ∈ ∆2 and w.l.o.g. let w
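The kind of counterexample this excerpt describes can be checked numerically. The sketch below is an illustration under stated assumptions, not a reproduction of the cited paper: Figure 4 is not available here, so the two piecewise-linear concave losses are hypothetical stand-ins chosen so that the worst-case minimizer attains 0.6 on Θ = [0, 1], and the worst-case objective is taken to be the maximum of the two per-point losses.

```python
# Hypothetical losses on Theta = [0, 1]; NOT the losses from the cited Figure 4.
# Assumption: the worst-case objective is max over the two support points.

def loss_z1(theta):
    # Concave, piecewise linear: slope 1.2 up to 0.5, then slope 0.8.
    if theta <= 0.5:
        return 1.2 * theta
    return 0.2 + 0.8 * theta


def loss_z2(theta):
    # Mirror image of loss_z1 around theta = 0.5.
    return loss_z1(1.0 - theta)


def worst_case(theta):
    return max(loss_z1(theta), loss_z2(theta))


thetas = [i / 100 for i in range(101)]

# Minimizer of the worst-case objective on the grid.
theta_dro = min(thetas, key=worst_case)


def weighted_risk(w1, theta):
    return w1 * loss_z1(theta) + (1.0 - w1) * loss_z2(theta)


# For each weight vector (w1, 1 - w1), compare the weighted risk at theta_dro
# with the best weighted risk achievable anywhere on the grid.
gaps = []
for k in range(21):
    w1 = k / 20
    best = min(weighted_risk(w1, t) for t in thetas)
    gaps.append(weighted_risk(w1, theta_dro) - best)

print(theta_dro, worst_case(theta_dro))
print(min(gaps))
```

With these stand-in losses, every weighted risk is concave in θ and is therefore minimized at an endpoint of [0, 1]. The worst-case minimizer sits in the interior and so is suboptimal for every weighting tried, which is the failure of equivalence the excerpt attributes to non-convex losses.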

representative citing papers

The Statistical Cost of Adaptation in Multi-Source Transfer Learning

math.ST · 2026-05-10 · unverdicted · novelty 8.0

Multi-source transfer learning incurs an intrinsic adaptation cost that can exceed one, with phase transitions separating regimes where bias-agnostic estimators match oracle performance from those where they cannot.

I-SAFE: Wasserstein Coherence Metrics for Structural Auditing of Scientific AI Models

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

I-SAFE uses Wasserstein Coherence Metrics to audit distributional coherence of scientific AI models under structurally guided perturbations, revealing differences among DTI predictors that accuracy metrics miss.

Cumulative Meta-Learning from Active Learning Queries for Robustness to Spurious Correlations

cs.LG · 2026-05-20 · unverdicted · novelty 7.0

CAML meta-learns a progressively refined inductive bias from active-learning queries to improve robustness to spurious correlations, reporting accuracy gains on minority groups across several benchmarks.

Identifiable Multimodal Causal Representation Learning under Partial Latent Sharing

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

Establishes component-wise identifiability guarantees for partially shared causal latents in multimodal nonlinear mixing and introduces a differentiable Wasserstein-based module for recovery.

Prediction-Intervention Games and Invariant Sets

stat.ML · 2026-05-16 · unverdicted · novelty 7.0

In prediction-intervention games, stable-blanket predictors are at least as good as causal-parent predictors for two classes of follower objectives and can be worst-case optimal under additional conditions.

Continual Learning of Domain-Invariant Representations

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

Introduces replay-based continual learning with sequential invariance alignment to learn domain-invariant representations, outperforming baselines on generalization to unseen domains across six datasets in vision, medicine, manufacturing, and ecology.

TILT: Target-induced loss tilting under covariate shift

cs.LG · 2026-05-14 · conditional · novelty 7.0

TILT adds a target-data penalty on an auxiliary predictor component to induce effective importance weighting for unsupervised domain adaptation under covariate shift.

Separating Shortcut Transition from Cross-Family OOD Failure in a Minimal Model

cs.LG · 2026-05-13 · conditional · novelty 7.0

A minimal model analytically separates shortcut attraction during training from the switch to a shortcut rule and from cross-family out-of-distribution failure.

Spectral Gradient Surgery for Domain-Generalizable Dataset Distillation

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

Spectral Gradient Surgery disentangles class-discriminative and domain-specific signals in distribution-matching distilled datasets by analyzing gradient agreement in the spectral domain, yielding better out-of-distribution performance.

Privacy-Aware Video Anomaly Detection through Orthogonal Subspace Projection

cs.CV · 2026-05-09 · unverdicted · novelty 7.0

A new orthogonal projection module for video anomaly detection suppresses facial attributes via weak face-presence signals and cosine alignment while preserving anomaly-relevant features like pose and motion.

Flatness and Gradient Alignment Are Both Necessary: Spectral-Aware Gradient-Aligned Exploration for Multi-Distribution Learning

cs.LG · 2026-05-08 · unverdicted · novelty 7.0

Excess risk decomposes into independent alignment (trace of inverse average Hessian times gradient covariance) and curvature terms, so both flatness and gradient alignment are required; SAGE achieves this and sets new SOTA on DomainBed.

Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study

cs.CV · 2026-05-07 · unverdicted · novelty 7.0

A large-scale benchmark finds that recent multimodal domain generalization methods give only marginal gains over a plain ERM baseline, with no method winning consistently and all degrading sharply under corruption or missing modalities.

eXplaining to Learn (eX2L): Regularization Using Contrastive Visual Explanation Pairs for Distribution Shifts

cs.CV · 2026-05-07 · unverdicted · novelty 7.0

eX2L improves robustness to distribution shifts by penalizing similarity between Grad-CAM maps of a label classifier and a confounder classifier, reaching new SOTA average and worst-group accuracy on the Spawrious benchmark.

Domain Generalization through Spatial Relation Induction over Visual Primitives

cs.CV · 2026-05-07 · unverdicted · novelty 7.0

PARSE improves domain generalization accuracy by factoring recognition into visual primitives and their spatial relational compositions learned end-to-end with differentiable predicates.

ISAAC: Auditing Causal Reasoning in Deep Models for Drug-Target Interaction

cs.LG · 2026-05-03 · unverdicted · novelty 7.0

ISAAC auditing applied to three DTI models on the Davis benchmark finds 25% relative differences in causal reasoning scores despite nearly identical AUROC values.

Robust and Clinically Reliable EEG Biomarkers: A Cross Population Framework for Generalizable Parkinson's Disease Detection

cs.LG · 2026-04-27 · conditional · novelty 7.0

A cross-population framework for EEG Parkinson's detection using exhaustive 75 directional evaluations and nested validation shows asymmetric transfer and accuracy up to 94.1% when training diversity increases, supported by mixture risk theory.

Synthetic Designed Experiments for Diagnosing Vision Model Failure

cs.CV · 2026-03-30 · unverdicted · novelty 7.0

SDRS uses designed experiments and ANOVA decomposition on synthetic data to identify Type I coverage gaps and Type II spurious dependencies in vision models, then generates targeted data to improve performance.

The Pragmatic Frames of Spurious Correlations in Machine Learning: Interpreting How and Why They Matter

cs.LG · 2024-11-07 · unverdicted · novelty 7.0

ML researchers assess spurious correlations via four pragmatic frames (relevance, generalizability, human-likeness, harmfulness) rather than a fixed statistical definition.

Towards Context-Invariant Safety Alignment for Large Language Models

cs.CL · 2026-05-20 · unverdicted · novelty 6.0

Introduces AIR, an asymmetric regularization that anchors open-ended safety prompts to verifiable ones via stop-gradient, improving invariance and accuracy when combined with group preference optimization.

S2Aligner: Pair-Efficient and Transferable Pre-Training for Sparse Text-Attributed Graphs

cs.LG · 2026-05-18 · unverdicted · novelty 6.0 · 2 refs

S2Aligner decouples semantic and structural components in LLM-as-Aligner pre-training for sparse TAGs and uses structure-oriented reconstruction plus domain risk balancing to improve transferability and reduce generalization gaps.

When Molecular Similarity Works: Property Cliffs Reveal Hidden Errors

cs.LG · 2026-05-17 · unverdicted · novelty 6.0

CliffSplit exposes at least 15% higher errors in cliff-heavy regions of QM9 while CliffLoss narrows the cliff-to-smooth error gap by up to 30% and improves overall MAE by 9.7% across several molecular tasks and backbones.

Rethinking Molecular OOD Generalization via Target-Aware Source Selection

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

SCOPE-BENCH shows state-of-the-art molecular models suffer up to 8x higher errors under extreme OOD, while POMA reduces mean absolute error by up to 11.2% via target-aware source selection and dual-scale adaptation.

Understanding Generalization through Decision Pattern Shift

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

DPS quantifies deviation of per-sample decision patterns from class averages and shows linear correlation with generalization gaps while unifying degradation scenarios into a continuous trajectory.

DeconDTN-Toolkit: A Library for Evaluation and Enhancement of Robustness to Provenance Shift

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

DeconDTN-Toolkit simulates provenance shifts to expose ERM vulnerabilities and provides tools plus a robust OOD indicator for mitigating confounding by data provenance.

citing papers explorer

Showing 50 of 75 citing papers.

The Statistical Cost of Adaptation in Multi-Source Transfer Learning math.ST · 2026-05-10 · unverdicted · none · ref 210 · internal anchor
Multi-source transfer learning incurs an intrinsic adaptation cost that can exceed one, with phase transitions separating regimes where bias-agnostic estimators match oracle performance from those where they cannot.
I-SAFE: Wasserstein Coherence Metrics for Structural Auditing of Scientific AI Models cs.LG · 2026-05-20 · unverdicted · none · ref 3 · internal anchor
I-SAFE uses Wasserstein Coherence Metrics to audit distributional coherence of scientific AI models under structurally guided perturbations, revealing differences among DTI predictors that accuracy metrics miss.
Cumulative Meta-Learning from Active Learning Queries for Robustness to Spurious Correlations cs.LG · 2026-05-20 · unverdicted · none · ref 69 · internal anchor
CAML meta-learns a progressively refined inductive bias from active-learning queries to improve robustness to spurious correlations, reporting accuracy gains on minority groups across several benchmarks.
Identifiable Multimodal Causal Representation Learning under Partial Latent Sharing cs.LG · 2026-05-18 · unverdicted · none · ref 1 · internal anchor
Establishes component-wise identifiability guarantees for partially shared causal latents in multimodal nonlinear mixing and introduces a differentiable Wasserstein-based module for recovery.
Prediction-Intervention Games and Invariant Sets stat.ML · 2026-05-16 · unverdicted · none · ref 1 · internal anchor
In prediction-intervention games, stable-blanket predictors are at least as good as causal-parent predictors for two classes of follower objectives and can be worst-case optimal under additional conditions.
Continual Learning of Domain-Invariant Representations cs.LG · 2026-05-15 · unverdicted · none · ref 1 · internal anchor
Introduces replay-based continual learning with sequential invariance alignment to learn domain-invariant representations, outperforming baselines on generalization to unseen domains across six datasets in vision, medicine, manufacturing, and ecology.
TILT: Target-induced loss tilting under covariate shift cs.LG · 2026-05-14 · conditional · none · ref 118 · internal anchor
TILT adds a target-data penalty on an auxiliary predictor component to induce effective importance weighting for unsupervised domain adaptation under covariate shift.
Separating Shortcut Transition from Cross-Family OOD Failure in a Minimal Model cs.LG · 2026-05-13 · conditional · none · ref 1 · internal anchor
A minimal model analytically separates shortcut attraction during training from the switch to a shortcut rule and from cross-family out-of-distribution failure.
Spectral Gradient Surgery for Domain-Generalizable Dataset Distillation cs.LG · 2026-05-13 · unverdicted · none · ref 28 · internal anchor
Spectral Gradient Surgery disentangles class-discriminative and domain-specific signals in distribution-matching distilled datasets by analyzing gradient agreement in the spectral domain, yielding better out-of-distribution performance.
Privacy-Aware Video Anomaly Detection through Orthogonal Subspace Projection cs.CV · 2026-05-09 · unverdicted · none · ref 28 · internal anchor
A new orthogonal projection module for video anomaly detection suppresses facial attributes via weak face-presence signals and cosine alignment while preserving anomaly-relevant features like pose and motion.
Flatness and Gradient Alignment Are Both Necessary: Spectral-Aware Gradient-Aligned Exploration for Multi-Distribution Learning cs.LG · 2026-05-08 · unverdicted · none · ref 30 · internal anchor
Excess risk decomposes into independent alignment (trace of inverse average Hessian times gradient covariance) and curvature terms, so both flatness and gradient alignment are required; SAGE achieves this and sets new SOTA on DomainBed.
Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study cs.CV · 2026-05-07 · unverdicted · none · ref 1 · internal anchor
A large-scale benchmark finds that recent multimodal domain generalization methods give only marginal gains over a plain ERM baseline, with no method winning consistently and all degrading sharply under corruption or missing modalities.
eXplaining to Learn (eX2L): Regularization Using Contrastive Visual Explanation Pairs for Distribution Shifts cs.CV · 2026-05-07 · unverdicted · none · ref 3 · internal anchor
eX2L improves robustness to distribution shifts by penalizing similarity between Grad-CAM maps of a label classifier and a confounder classifier, reaching new SOTA average and worst-group accuracy on the Spawrious benchmark.
Domain Generalization through Spatial Relation Induction over Visual Primitives cs.CV · 2026-05-07 · unverdicted · none · ref 1 · internal anchor
PARSE improves domain generalization accuracy by factoring recognition into visual primitives and their spatial relational compositions learned end-to-end with differentiable predicates.
ISAAC: Auditing Causal Reasoning in Deep Models for Drug-Target Interaction cs.LG · 2026-05-03 · unverdicted · none · ref 4 · internal anchor
ISAAC auditing applied to three DTI models on the Davis benchmark finds 25% relative differences in causal reasoning scores despite nearly identical AUROC values.
Robust and Clinically Reliable EEG Biomarkers: A Cross Population Framework for Generalizable Parkinson's Disease Detection cs.LG · 2026-04-27 · conditional · none · ref 54 · internal anchor
A cross-population framework for EEG Parkinson's detection using exhaustive 75 directional evaluations and nested validation shows asymmetric transfer and accuracy up to 94.1% when training diversity increases, supported by mixture risk theory.
Synthetic Designed Experiments for Diagnosing Vision Model Failure cs.CV · 2026-03-30 · unverdicted · none · ref 2 · internal anchor
SDRS uses designed experiments and ANOVA decomposition on synthetic data to identify Type I coverage gaps and Type II spurious dependencies in vision models, then generates targeted data to improve performance.
The Pragmatic Frames of Spurious Correlations in Machine Learning: Interpreting How and Why They Matter cs.LG · 2024-11-07 · unverdicted · none · ref 1 · internal anchor
ML researchers assess spurious correlations via four pragmatic frames (relevance, generalizability, human-likeness, harmfulness) rather than a fixed statistical definition.
Towards Context-Invariant Safety Alignment for Large Language Models cs.CL · 2026-05-20 · unverdicted · none · ref 69 · internal anchor
Introduces AIR, an asymmetric regularization that anchors open-ended safety prompts to verifiable ones via stop-gradient, improving invariance and accuracy when combined with group preference optimization.
S2Aligner: Pair-Efficient and Transferable Pre-Training for Sparse Text-Attributed Graphs cs.LG · 2026-05-18 · unverdicted · none · ref 1 · 2 links · internal anchor
S2Aligner decouples semantic and structural components in LLM-as-Aligner pre-training for sparse TAGs and uses structure-oriented reconstruction plus domain risk balancing to improve transferability and reduce generalization gaps.
When Molecular Similarity Works: Property Cliffs Reveal Hidden Errors cs.LG · 2026-05-17 · unverdicted · none · ref 2 · internal anchor
CliffSplit exposes at least 15% higher errors in cliff-heavy regions of QM9 while CliffLoss narrows the cliff-to-smooth error gap by up to 30% and improves overall MAE by 9.7% across several molecular tasks and backbones.
Rethinking Molecular OOD Generalization via Target-Aware Source Selection cs.LG · 2026-05-13 · unverdicted · none · ref 1 · internal anchor
SCOPE-BENCH shows state-of-the-art molecular models suffer up to 8x higher errors under extreme OOD, while POMA reduces mean absolute error by up to 11.2% via target-aware source selection and dual-scale adaptation.
Understanding Generalization through Decision Pattern Shift cs.LG · 2026-05-13 · unverdicted · none · ref 63 · internal anchor
DPS quantifies deviation of per-sample decision patterns from class averages and shows linear correlation with generalization gaps while unifying degradation scenarios into a continuous trajectory.
DeconDTN-Toolkit: A Library for Evaluation and Enhancement of Robustness to Provenance Shift cs.LG · 2026-05-11 · unverdicted · none · ref 55 · internal anchor
DeconDTN-Toolkit simulates provenance shifts to expose ERM vulnerabilities and provides tools plus a robust OOD indicator for mitigating confounding by data provenance.
Intervention-Based Time Series Causal Discovery via Simulator-Generated Interventional Distributions cs.LG · 2026-05-11 · unverdicted · none · ref 139 · internal anchor
SVAR-FM uses simulator clamping to produce interventional distributions and flow matching to identify time series causal structures, with an error bound that predicts sign reversal of causal effects below a simulator accuracy threshold.
The Trap of Trajectory: Towards Understanding and Mitigating Spurious Correlations in Agentic Memory cs.LG · 2026-05-10 · unverdicted · none · ref 2 · internal anchor
Agentic memory improves clean reasoning but worsens performance when spurious patterns are present in stored trajectories; CAMEL calibration reduces this reliance while preserving clean performance.
CauSim: Scaling Causal Reasoning with Increasingly Complex Causal Simulators cs.AI · 2026-05-09 · unverdicted · none · ref 52 · internal anchor
CauSim turns scarce causal reasoning labels into scalable supervised data by having LLMs incrementally construct complex executable structural causal models.
Same Brain, Different Prediction: How Preprocessing Choices Undermine EEG Decoding Reliability cs.LG · 2026-05-08 · unverdicted · none · ref 25 · internal anchor
EEG model predictions on the same brain signals flip for up to 42% of trials under different preprocessing choices, with new tools introduced to measure and mitigate the resulting instability.
Anatomy of a failure: When, how, and why deep vision fails in scientific domains cs.CV · 2026-05-05 · unverdicted · none · ref 103 · internal anchor
Deep learning on information-rich scientific images collapses to one-dimensional predictions due to a mismatch between data priors and the model's simplicity bias, even after robustification techniques.
Learning to Theorize the World from Observation cs.LG · 2026-05-05 · unverdicted · none · ref 231 · internal anchor
NEO induces compositional latent programs as world theories from observations and executes them to enable explanation-driven generalization.
Attribution-Guided Masking for Robust Cross-Domain Sentiment Classification cs.LG · 2026-05-04 · unverdicted · none · ref 1 · internal anchor
AGM adds a gradient-based masking loss during fine-tuning to suppress reliance on spurious tokens, achieving competitive zero-shot transfer on sentiment tasks while providing token-level interpretability.
Deciphering Shortcut Learning from an Evolutionary Game Theory Perspective cs.AI · 2026-05-04 · unverdicted · none · ref 15 · internal anchor
Evolutionary game theory shows gradient descent and stochastic gradient descent drive neural networks to distinct stable states favoring shortcut or core subnetworks, with data and optimization noise shaping shortcut bias formation.
Cheeger--Hodge Contrastive Learning for Structurally Robust Graph Representation Learning cs.LG · 2026-04-29 · unverdicted · none · ref 73 · internal anchor
CHCL aligns a Cheeger-Hodge joint signature across graph augmentations to produce embeddings that remain stable under local structural changes.
Robust Representation Learning through Explicit Environment Modeling stat.ML · 2026-04-28 · unverdicted · none · ref 2 · internal anchor
Explicitly modeling and marginalizing environment variation via generalized random-intercept models produces representations that support robust average prediction across unseen environments and outperform invariant-learning methods in challenging distribution-shift settings.
Bayesian Environment Invariant Regression stat.ME · 2026-04-28 · unverdicted · none · ref 2 · internal anchor
A Bayesian spike-and-slab model separates invariant regression mechanisms from environment-specific associations, with proven selection consistency and posterior contraction under a working model.
Deep sprite-based image models: An analysis cs.CV · 2026-04-21 · unverdicted · none · ref 1 · internal anchor
A deep sprite-based image decomposition method matches SOTA unsupervised class-aware segmentation on CLEVR, scales linearly with objects, explicitly identifies categories, and fully models images interpretably.
Adversarial Label Invariant Graph Data Augmentations for Out-of-Distribution Generalization cs.LG · 2026-04-09 · unverdicted · none · ref 3 · internal anchor
RIA uses adversarial exploration of counterfactual graph environments via label-invariant augmentations to improve OoD generalization in graph classification tasks.
Learning Stable Predictors from Weak Supervision under Distribution Shift cs.LG · 2026-04-05 · conditional · none · ref 1 · 2 links · internal anchor
Weak supervision supports in-domain prediction of guide efficacy in CRISPR-Cas13d data but collapses under temporal shifts due to changing feature-label associations, while cross-cell-line transfer remains partial.
Mitigating Shortcut Learning via Feature Disentanglement in Medical Imaging: A Benchmark Study cs.CV · 2026-02-17 · unverdicted · none · ref 4 · internal anchor
Benchmark shows that combining data rebalancing with feature disentanglement mitigates shortcut learning more effectively than rebalancing alone in medical imaging models.
Tracing Moral Foundations in Large Language Models cs.CL · 2026-01-09 · unverdicted · none · ref 1 · 2 links · internal anchor
LLMs encode moral foundations in human-aligned, layered representations that arise from pretraining and can be steered via dense vectors or sparse SAE features.
Invariant Feature Extraction Through Conditional Independence and the Optimal Transport Barycenter Problem: the Gaussian case math.ST · 2025-12-24 · unverdicted · none · ref 4 · internal anchor
In the Gaussian case, invariant features predicting Y independent of confounders Z are given by the top d eigenvectors of a matrix derived from the optimal transport barycenter of Z given Y.
The Impact of Off-Policy Training Data on Probe Generalisation cs.AI · 2025-11-21 · unverdicted · none · ref 2 · internal anchor
Off-policy training data for LLM behavior probes causes significant generalization failures especially for intent-based behaviors like deception, and performance on coerced incentivised data correlates with real on-policy success.
Downgrade to Upgrade: Optimizer Simplification Enhances Robustness in LLM Unlearning cs.LG · 2025-10-01 · conditional · none · ref 20 · internal anchor
Downgrading optimizers to lower-information variants during LLM unlearning yields more robust forgetting on MUSE and WMDP benchmarks by converging to harder-to-perturb loss basins.
Franca: Nested Matryoshka Clustering for Scalable Visual Representation Learning cs.CV · 2025-07-18 · conditional · none · ref 45 · internal anchor
Franca introduces nested Matryoshka clustering and positional disentanglement in a transparent SSL pipeline to deliver open-source vision models competitive with closed proprietary systems.
Doubly robust identification of treatment effects from multiple environments stat.ML · 2025-03-18 · unverdicted · none · ref 3 · internal anchor
RAMEN identifies treatment effects from multiple environments in a doubly robust manner by leveraging data heterogeneity without requiring the causal graph.
TabICL: A Tabular Foundation Model for In-Context Learning on Large Data cs.LG · 2025-02-08 · unverdicted · none · ref 14 · internal anchor
TabICL scales in-context learning to large tabular data via column-then-row attention for row embeddings followed by a transformer, matching TabPFNv2 speed and performance while outperforming it and CatBoost on datasets over 10K samples.
Discrepancies are Virtue: Weak-to-Strong Generalization through Lens of Intrinsic Dimension cs.LG · 2025-02-07 · unverdicted · none · ref 2 · internal anchor
In ridgeless regression with low intrinsic dimension, discrepancy between weak and strong models reduces W2S generalization variance by dim(V_s)/N in the discrepant subspace while inheriting it in the overlap.
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks cs.LG · 2023-10-05 · accept · none · ref 62 · internal anchor
SmoothLLM mitigates jailbreaking attacks on LLMs by randomly perturbing multiple copies of a prompt at the character level and aggregating the outputs to detect adversarial inputs.
On the Opportunities and Risks of Foundation Models cs.LG · 2021-08-16 · accept · none · ref 1 · internal anchor
Foundation models are large adaptable AI systems with emergent capabilities that offer broad opportunities but carry risks from homogenization, opacity, and inherited defects across downstream applications.
Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization cs.LG · 2019-11-20 · conditional · none · ref 1 · internal anchor
Increased regularization is required for group DRO to achieve good worst-group generalization in overparameterized neural networks.
