Invariant Risk Minimization
Recognition: 2 theorem links · Lean theorem
Pith reviewed 2026-05-12 06:25 UTC · model grok-4.3
The pith
Invariant Risk Minimization finds a data representation where the same classifier is optimal for every training distribution.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Invariant Risk Minimization (IRM) learns a representation such that the optimal linear classifier on top of that representation is identical across all training environments. This is achieved by jointly minimizing the average risk while adding a penalty that forces the gradient of each environment's risk with respect to the classifier parameters to vanish at the shared optimum. The resulting invariant features correspond to the causal factors that govern the label in the underlying data-generating process, enabling generalization to environments not seen during training.
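In symbols, the claim above corresponds to a bilevel program over representations Φ and classifiers w, together with the practical relaxation (commonly called IRMv1) that the paper optimizes; here ℰ_tr is the set of training environments and R^e the risk in environment e:

```latex
% IRM as a bilevel program: one classifier w must be optimal in every environment.
\min_{\Phi,\, w}\ \sum_{e \in \mathcal{E}_{\mathrm{tr}}} R^e(w \circ \Phi)
\quad \text{subject to} \quad
w \in \operatorname*{arg\,min}_{\bar{w}}\ R^e(\bar{w} \circ \Phi)
\quad \text{for all } e \in \mathcal{E}_{\mathrm{tr}}.

% Practical relaxation (IRMv1): fix a scalar dummy classifier at w = 1.0
% and penalize the squared gradient of each environment's risk.
\min_{\Phi}\ \sum_{e \in \mathcal{E}_{\mathrm{tr}}}
R^e(\Phi) \;+\; \lambda \left\| \nabla_{w \mid w = 1.0}\, R^e(w \cdot \Phi) \right\|^2.
```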
What carries the argument
The IRM penalty term that requires the gradient of the risk with respect to a fixed classifier to be zero in every environment, thereby enforcing that the same predictor is optimal everywhere.
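A minimal numerical sketch of that penalty, assuming squared loss and a scalar dummy classifier fixed at w = 1.0; the function name `irm_penalty` is illustrative, not from the paper:

```python
import numpy as np

def irm_penalty(phi, y):
    """Squared norm of the risk gradient w.r.t. a scalar classifier at w = 1.0.

    For squared loss, R(w) = mean((w * phi - y) ** 2), so the gradient at
    w = 1 is 2 * mean(phi * (phi - y)); the penalty is its square.
    """
    grad = 2.0 * np.mean(phi * (phi - y))
    return grad ** 2

# A feature that already equals the regression target has zero penalty;
# a rescaled copy does not, because w = 1 is no longer optimal for it.
y = np.array([1.0, -2.0, 0.5, 3.0])
print(irm_penalty(y, y))        # 0.0
print(irm_penalty(0.5 * y, y))  # > 0
```

The penalty vanishes exactly when the fixed classifier w = 1 is a stationary point of that environment's risk, which is the invariance condition the paper enforces.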
Load-bearing premise
The observed environments must share the same causal mechanisms that determine the label while differing only in the distributions of non-causal variables.
What would settle it
A controlled experiment on synthetic data with known causal graph where IRM is shown to recover exactly the causal features (or fails to do so) when the environments are generated by intervening only on non-causal variables.
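Such a check can be sketched in a few lines under assumed parameters: a linear SCM x_causal → y → x_spurious where environments intervene only on the spurious mechanism. All names (`make_env`, `optimal_w`) and coefficients here are illustrative, not from the paper:

```python
import numpy as np

def optimal_w(x, y):
    """Least-squares coefficient for regressing y on a single feature x."""
    return float(np.dot(x, y) / np.dot(x, x))

def make_env(beta, n, rng):
    """Linear SCM x_causal -> y -> x_spur; environments differ only in the
    non-causal mechanism, via the intervened coefficient beta."""
    x_causal = rng.normal(size=n)
    y = x_causal + 0.5 * rng.normal(size=n)
    x_spur = beta * y + 0.5 * rng.normal(size=n)
    return x_causal, x_spur, y

rng = np.random.default_rng(0)
envs = [make_env(beta, 200_000, rng) for beta in (1.0, 3.0)]

w_causal = [optimal_w(xc, y) for xc, _, y in envs]
w_spur = [optimal_w(xs, y) for _, xs, y in envs]

# The causal feature admits one shared optimal classifier (w near 1 in both
# environments); the spurious feature's optimal classifier shifts with beta.
print(w_causal)
print(w_spur)
```

On this toy SCM, the causal feature passes the invariance test across environments while the spurious one fails it, which is exactly the behavior a settling experiment would probe at scale.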
Original abstract
We introduce Invariant Risk Minimization (IRM), a learning paradigm to estimate invariant correlations across multiple training distributions. To achieve this goal, IRM learns a data representation such that the optimal classifier, on top of that data representation, matches for all training distributions. Through theory and experiments, we show how the invariances learned by IRM relate to the causal structures governing the data and enable out-of-distribution generalization.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Invariant Risk Minimization (IRM), a learning paradigm that estimates a data representation such that the optimal classifier on top of this representation is the same across multiple training distributions. It claims through theory and experiments that the learned invariances correspond to causal structures governing the data and enable out-of-distribution generalization.
Significance. If the central claims hold, this work offers a principled objective for learning predictors that exploit invariance across environments to achieve robust OOD performance, with a direct link to identifying causal features. This would bridge empirical risk minimization with causal inference in non-i.i.d. settings.
major comments (2)
- [§4] The theoretical equivalence showing that IRM recovers causal parents is derived only for linear structural causal models with additive noise and a fixed number of environments; the proof relies on linearity of the representation and identifiability of the shared optimal w. No uniqueness result is given for non-linear feature maps or general non-linear SCMs, so the broader claim that invariances learned by IRM relate to causal structures does not follow in full generality.
- [Eq. (3)] The practical IRM objective (with the gradient penalty at w=1) enforces only a first-order stationarity condition under the linear classifier assumption. The manuscript does not show that this approximation identifies causal features or guarantees OOD generalization when the representation or SCM is non-linear, which is load-bearing for the central claim.
minor comments (2)
- [Abstract] The abstract and introduction could more explicitly qualify the scope of the theoretical results to linear cases to avoid overstatement of the causal connection.
- Experimental sections would benefit from additional details on environment construction and sensitivity to the penalty hyperparameter to aid reproducibility.
Simulated Author's Rebuttal
We thank the referee for the careful reading and insightful comments on our manuscript. We address each major comment below and will incorporate clarifications to better delineate the scope of our theoretical and practical results.
Point-by-point responses
-
Referee: [§4] The theoretical equivalence showing that IRM recovers causal parents is derived only for linear structural causal models with additive noise and a fixed number of environments; the proof relies on linearity of the representation and identifiability of the shared optimal w. No uniqueness result is given for non-linear feature maps or general non-linear SCMs, so the broader claim that invariances learned by IRM relate to causal structures does not follow in full generality.
Authors: We agree that the equivalence result in Section 4 is derived under the specific assumptions of linear structural causal models with additive noise and a fixed number of environments, relying on the linearity of the representation and the identifiability of the shared optimal classifier weights. The manuscript does not provide a uniqueness result for non-linear feature maps or general non-linear SCMs. The broader statements linking invariances to causal structures are presented as holding under these assumptions, with supporting experimental evidence in more general settings. We will revise Section 4, the abstract, and related discussion to explicitly state the assumptions and note that extensions to non-linear cases remain an open direction. revision: partial
-
Referee: [Eq. (3)] The practical IRM objective (with the gradient penalty at w=1) enforces only a first-order stationarity condition under the linear classifier assumption. The manuscript does not show that this approximation identifies causal features or guarantees OOD generalization when the representation or SCM is non-linear, which is load-bearing for the central claim.
Authors: The practical objective in Equation (3) uses a gradient penalty (evaluated at w=1) to enforce the invariance condition, which is exact under the linear classifier assumption but reduces to a first-order stationarity condition more generally. We do not provide a proof that this approximation identifies causal features or guarantees OOD generalization for non-linear representations or SCMs. The formulation is motivated by the linear theory, and our experiments demonstrate improved OOD performance in non-linear regimes. We will add a clarifying discussion of the approximation's nature and limitations in the revised manuscript. revision: partial
Circularity Check
No significant circularity in IRM derivation chain
Full rationale
The core IRM definition (a representation Φ such that argmin_w R^e(w ∘ Φ) is identical across environments e) is stated directly from the multi-environment setup and does not reduce to any fitted target quantity or self-referential loop. Section 4 derives the link to causal parents only under explicit linear SCM + additive noise assumptions; this is a one-directional implication proved from the SCM, not a tautology or renaming of the input risks. The practical objective (Eq. 3 with gradient penalty) is an explicit relaxation of the definition, not a statistical fit called a prediction. No load-bearing self-citation or ansatz smuggling is present; the derivation remains self-contained against the stated assumptions.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Multiple training distributions share the same causal mechanisms but differ in non-causal aspects.
Forward citations
Cited by 47 Pith papers
-
The Statistical Cost of Adaptation in Multi-Source Transfer Learning
Multi-source transfer learning incurs an intrinsic adaptation cost that can exceed one, with phase transitions separating regimes where bias-agnostic estimators match oracle performance from those where they cannot.
-
TILT: Target-induced loss tilting under covariate shift
TILT adds a target-data penalty on an auxiliary predictor component to induce effective importance weighting for unsupervised domain adaptation under covariate shift.
-
Separating Shortcut Transition from Cross-Family OOD Failure in a Minimal Model
A minimal model analytically separates shortcut attraction during training from the switch to a shortcut rule and from cross-family out-of-distribution failure.
-
Privacy-Aware Video Anomaly Detection through Orthogonal Subspace Projection
A new orthogonal projection module for video anomaly detection suppresses facial attributes via weak face-presence signals and cosine alignment while preserving anomaly-relevant features like pose and motion.
-
Flatness and Gradient Alignment Are Both Necessary: Spectral-Aware Gradient-Aligned Exploration for Multi-Distribution Learning
Excess risk decomposes into independent alignment (trace of inverse average Hessian times gradient covariance) and curvature terms, so both flatness and gradient alignment are required; SAGE achieves this and sets new...
-
Are We Making Progress in Multimodal Domain Generalization? A Comprehensive Benchmark Study
A large-scale benchmark finds that recent multimodal domain generalization methods give only marginal gains over a plain ERM baseline, with no method winning consistently and all degrading sharply under corruption or ...
-
eXplaining to Learn (eX2L): Regularization Using Contrastive Visual Explanation Pairs for Distribution Shifts
eX2L improves robustness to distribution shifts by penalizing similarity between Grad-CAM maps of a label classifier and a confounder classifier, reaching new SOTA average and worst-group accuracy on the Spawrious benchmark.
-
Domain Generalization through Spatial Relation Induction over Visual Primitives
PARSE improves domain generalization accuracy by factoring recognition into visual primitives and their spatial relational compositions learned end-to-end with differentiable predicates.
-
ScriptHOI: Learning Scripted State Transitions for Open-Vocabulary Human-Object Interaction Detection
ScriptHOI decomposes HOI phrases into state slots and uses script coverage, conflict, interval partial-label learning, and counterfactual contrast to improve rare and unseen interaction detection while cutting afforda...
-
ISAAC: Auditing Causal Reasoning in Deep Models for Drug-Target Interaction
ISAAC auditing applied to three DTI models on the Davis benchmark finds 25% relative differences in causal reasoning scores despite nearly identical AUROC values.
-
Robust and Clinically Reliable EEG Biomarkers: A Cross Population Framework for Generalizable Parkinson's Disease Detection
A cross-population framework for EEG Parkinson's detection using exhaustive 75 directional evaluations and nested validation shows asymmetric transfer and accuracy up to 94.1% when training diversity increases, suppor...
-
Synthetic Designed Experiments for Diagnosing Vision Model Failure
SDRS uses designed experiments and ANOVA decomposition on synthetic data to identify Type I coverage gaps and Type II spurious dependencies in vision models, then generates targeted data to improve performance.
-
Rethinking Molecular OOD Generalization via Target-Aware Source Selection
SCOPE-BENCH shows state-of-the-art molecular models suffer up to 8x higher errors under extreme OOD, while POMA reduces mean absolute error by up to 11.2% via target-aware source selection and dual-scale adaptation.
-
Understanding Generalization through Decision Pattern Shift
DPS quantifies deviation of per-sample decision patterns from class averages and shows linear correlation with generalization gaps while unifying degradation scenarios into a continuous trajectory.
-
DeconDTN-Toolkit: A Library for Evaluation and Enhancement of Robustness to Provenance Shift
DeconDTN-Toolkit simulates provenance shifts to expose ERM vulnerabilities and provides tools plus a robust OOD indicator for mitigating confounding by data provenance.
-
Spurious Correlation Learning in Preference Optimization: Mechanisms, Consequences, and Mitigation via Tie Training
Standard preference learning induces spurious feature reliance via mean bias and correlation leakage, creating irreducible distribution shift vulnerabilities that tie training mitigates without degrading causal learning.
-
Intervention-Based Time Series Causal Discovery via Simulator-Generated Interventional Distributions
SVAR-FM uses simulator clamping to produce interventional distributions and flow matching to identify time series causal structures, with an error bound that predicts sign reversal of causal effects below a simulator ...
-
The Trap of Trajectory: Towards Understanding and Mitigating Spurious Correlations in Agentic Memory
Agentic memory improves clean reasoning but worsens performance when spurious patterns are present in stored trajectories; CAMEL calibration reduces this reliance while preserving clean performance.
-
CauSim: Scaling Causal Reasoning with Increasingly Complex Causal Simulators
CauSim turns scarce causal reasoning labels into scalable supervised data by having LLMs incrementally construct complex executable structural causal models.
-
TopoGeoScore: A Self-Supervised Source-Only Geometric Framework for OOD Checkpoint Selection
TopoGeoScore combines a torsion-inspired Laplacian log-determinant, Ollivier-Ricci curvature, and higher-order topological summaries from source embeddings, with weights learned via self-supervised invariance to geome...
-
Same Brain, Different Prediction: How Preprocessing Choices Undermine EEG Decoding Reliability
EEG model predictions on the same brain signals flip for up to 42% of trials under different preprocessing choices, with new tools introduced to measure and mitigate the resulting instability.
-
ScriptHOI: Learning Scripted State Transitions for Open-Vocabulary Human-Object Interaction Detection
ScriptHOI improves rare and unseen HOI recognition by decomposing phrases into state slots, using visual tokenization and slot-wise matching for script coverage and conflict to calibrate predictions and constrain trai...
-
Anatomy of a failure: When, how, and why deep vision fails in scientific domains
Deep learning on information-rich scientific images collapses to one-dimensional predictions due to a mismatch between data priors and the model's simplicity bias, even after robustification techniques.
-
Learning to Theorize the World from Observation
NEO induces compositional latent programs as world theories from observations and executes them to enable explanation-driven generalization.
-
Attribution-Guided Masking for Robust Cross-Domain Sentiment Classification
AGM adds a gradient-based masking loss during fine-tuning to suppress reliance on spurious tokens, achieving competitive zero-shot transfer on sentiment tasks while providing token-level interpretability.
-
Deciphering Shortcut Learning from an Evolutionary Game Theory Perspective
Evolutionary game theory shows gradient descent and stochastic gradient descent drive neural networks to distinct stable states favoring shortcut or core subnetworks, with data and optimization noise shaping shortcut ...
-
Cheeger--Hodge Contrastive Learning for Structurally Robust Graph Representation Learning
CHCL aligns a Cheeger-Hodge joint signature across graph augmentations to produce embeddings that remain stable under local structural changes.
-
Robust Representation Learning through Explicit Environment Modeling
Explicitly modeling and marginalizing environment variation via generalized random-intercept models produces representations that support robust average prediction across unseen environments and outperform invariant-l...
-
Bayesian Environment Invariant Regression
A Bayesian spike-and-slab model separates invariant regression mechanisms from environment-specific associations, with proven selection consistency and posterior contraction under a working model.
-
Deep sprite-based image models: An analysis
A deep sprite-based image decomposition method matches SOTA unsupervised class-aware segmentation on CLEVR, scales linearly with objects, explicitly identifies categories, and fully models images interpretably.
-
Adversarial Label Invariant Graph Data Augmentations for Out-of-Distribution Generalization
RIA uses adversarial exploration of counterfactual graph environments via label-invariant augmentations to improve OoD generalization in graph classification tasks.
-
Learning Stable Predictors from Weak Supervision under Distribution Shift
Weak supervision supports in-domain learning for CRISPR transcriptomic perturbations but temporal shifts cause negative R-squared and near-zero correlation across linear and tree models, unlike partial cell-line transfer.
-
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks
SmoothLLM mitigates jailbreaking attacks on LLMs by randomly perturbing multiple copies of a prompt at the character level and aggregating the outputs to detect adversarial inputs.
-
On the Opportunities and Risks of Foundation Models
Foundation models are large adaptable AI systems with emergent capabilities that offer broad opportunities but carry risks from homogenization, opacity, and inherited defects across downstream applications.
-
Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization
Increased regularization is required for group DRO to achieve good worst-group generalization in overparameterized neural networks.
-
Beyond Instance-Level Self-Supervision in 3D Multi-Modal Medical Imaging
A self-supervised approach uses consistent spatial relationships of anatomical structures across patients to improve 3D multi-modal medical image representations, yielding modest gains on segmentation and classificati...
-
Causal Parametric Drift Simulation: A Digital Twin Framework for Classifier Robustness Evaluation
A framework using structural causal models simulates parametric drifts to evaluate classifier robustness more realistically than static tests or noise perturbations.
-
Agentic AIs Are the Missing Paradigm for Out-of-Distribution Generalization in Foundation Models
Agentic AI systems are required to overcome the parameter coverage ceiling that prevents foundation models from handling certain out-of-distribution cases.
-
When Brain Networks Travel: Learning Beyond Site
CORE decouples site confounders in fMRI networks, profiles transient dynamics on a population scaffold using line graphs, and applies subject-adaptive gating to achieve up to 6.7% better cross-site generalization on A...
-
MER-DG: Modality-Entropy Regularization for Multimodal Domain Generalization
MER-DG applies modality-entropy regularization to reduce fusion overfitting in multimodal domain generalization, reporting average gains of 5% over standard fusion and 2% over prior methods on EPIC-Kitchens and HAC be...
-
Dreaming Across Towns: Semantic Rollout and Town-Adversarial Regularization for Zero-Shot Held-Out-Town Fixed-Route Driving in CARLA
Semantic rollout prediction plus town-adversarial regularization on a Dreamer agent raises mean zero-shot success rate for fixed-route driving across held-out CARLA towns under fixed weather and no traffic.
-
Asynchronous Federated Unlearning with Invariance Calibration for Medical Imaging
AFU-IC decouples client unlearning from global federated training in medical imaging and adds server-side invariance calibration to prevent relearning of erased data.
-
Sensitivity Uncertainty Alignment in Large Language Models
SUA measures the gap between how much an LLM's output changes under perturbations and how uncertain the model claims to be, with a training procedure to reduce that gap.
-
Beyond Surface Artifacts: Capturing Shared Latent Forgery Knowledge Across Modalities
Introduces MAF framework and DeepModal-Bench to capture universal cross-modal forgery traces for better generalization in multimodal deepfake detection.
-
Why Invariance is Not Enough for Biomedical Domain Generalization and How to Fix It
MaskGen improves domain generalization for biomedical image segmentation by using source intensities plus domain-stable foundation model representations with minimal added complexity.
-
Investigating Data Interventions for Subgroup Fairness: An ICU Case Study
Data addition from different sources does not reliably boost subgroup fairness in ICU models and often requires post-hoc calibration to work.
-
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
Offline RL promises to extract high-utility policies from static datasets but faces fundamental challenges that current methods only partially address.
Reference graph
Works this paper leans on
- [1]
-
[2]
James Andrew Bagnell. Robust supervised learning. In AAAI, 2005
work page 2005
-
[3]
Peter L. Bartlett, Philip M. Long, Gábor Lugosi, and Alexander Tsigler. Benign Overfitting in Linear Regression. arXiv, 2019
work page 2019
-
[4]
Recognition in terra incognita
Sara Beery, Grant Van Horn, and Pietro Perona. Recognition in terra incognita. In ECCV, 2018
work page 2018
-
[5]
Analysis of representations for domain adaptation
Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira. Analysis of representations for domain adaptation. In NIPS. 2007
work page 2007
-
[6]
Aharon Ben-Tal, Laurent El Ghaoui, and Arkadi Nemirovski. Robust optimization. Princeton University Press, 2009
work page 2009
-
[7]
A meta-transfer objective for learning to disentangle causal mechanisms
Yoshua Bengio, Tristan Deleu, Nasim Rahaman, Rosemary Ke, Sébastien Lachapelle, Olexa Bilaniuk, Anirudh Goyal, and Christopher Pal. A meta-transfer objective for learning to disentangle causal mechanisms. arXiv, 2019
work page 2019
-
[8]
Comparison of classifier methods: a case study in handwritten digit recognition
Léon Bottou, Corinna Cortes, John S. Denker, Harris Drucker, Isabelle Guyon, Lawrence D. Jackel, Yann Le Cun, Urs A. Muller, Eduard Säckinger, Patrice Simard, and Vladimir Vapnik. Comparison of classifier methods: a case study in handwritten digit recognition. In ICPR, 1994
work page 1994
-
[9]
Approximating CNNs with bag-of-local-features models works surprisingly well on ImageNet
Wieland Brendel and Matthias Bethge. Approximating CNNs with bag-of-local-features models works surprisingly well on ImageNet. In ICLR, 2019
work page 2019
-
[10]
Invariant scattering convolution networks
Joan Bruna and Stephane Mallat. Invariant scattering convolution networks. TPAMI, 2013
work page 2013
-
[11]
Intermittent process analysis with scattering moments
Joan Bruna, Stephane Mallat, Emmanuel Bacry, and Jean-François Muzy. Intermittent process analysis with scattering moments. The Annals of Statistics, 2015
work page 2015
-
[12]
Two theorems on invariance and causality
Nancy Cartwright. Two theorems on invariance and causality. Philosophy of Science, 2003
work page 2003
-
[13]
Patricia W. Cheng and Hongjing Lu. Causal invariance as an essential constraint for creating a causal representation of the world. The Oxford handbook of causal reasoning, 2017
work page 2017
-
[14]
Bert: Pre-training of deep bidirectional transformers for language understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. NAACL, 2019
work page 2019
-
[15]
Statistics of robust optimization: A generalized empirical likelihood approach
John Duchi, Peter Glynn, and Hongseok Namkoong. Statistics of robust optimization: A generalized empirical likelihood approach. arXiv, 2016
work page 2016
-
[16]
Domain-adversarial training of neural networks
Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. JMLR, 2016
work page 2016
-
[17]
Robert Geirhos, Patricia Rubisch, Claudio Michaelis, Matthias Bethge, Felix A. Wichmann, and Wieland Brendel. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. ICLR, 2019
work page 2019
-
[18]
Learning causal structures using regression invariance
AmirEmad Ghassami, Saber Salehkaleybar, Negar Kiyavash, and Kun Zhang. Learning causal structures using regression invariance. In NIPS, 2017
work page 2017
-
[19]
Patrick J. Grother. NIST Special Database 19: Handprinted forms and characters database. https://www.nist.gov/srd/nist-special-database-19, 1995. File doc/doc.ps in the 1995 NIST CD ROM.
work page 1995
-
[21]
The probability approach in econometrics
Trygve Haavelmo. The probability approach in econometrics. Econometrica: Journal of the Econometric Society, 1944
work page 1944
-
[22]
Conditional variance penalties and domain shift robustness
Christina Heinze-Deml and Nicolai Meinshausen. Conditional variance penalties and domain shift robustness. arXiv, 2017
work page 2017
-
[23]
Invariant causal prediction for nonlinear models
Christina Heinze-Deml, Jonas Peters, and Nicolai Meinshausen. Invariant causal prediction for nonlinear models. Journal of Causal Inference, 2018
work page 2018
-
[24]
Revisiting visual question answering baselines
Allan Jabri, Armand Joulin, and Laurens Van Der Maaten. Revisiting visual question answering baselines. In ECCV, 2016
work page 2016
-
[25]
Fredrik D. Johansson, David A. Sontag, and Rajesh Ranganath. Support and invertibility in domain-invariant representations. AISTATS, 2019
work page 2019
-
[26]
Generalization in anti-causal learning
Niki Kilbertus, Giambattista Parascandolo, and Bernhard Schölkopf. Generalization in anti-causal learning. arXiv, 2018
work page 2018
-
[27]
Stable prediction across unknown environments
Kun Kuang, Peng Cui, Susan Athey, Ruoxuan Xiong, and Bo Li. Stable prediction across unknown environments. In SIGKDD, 2018
work page 2018
-
[28]
Brenden M. Lake, Tomer D. Ullman, Joshua B. Tenenbaum, and Samuel J. Gershman. Building machines that learn and think like people. Behavioral and Brain Sciences, 2017
work page 2017
-
[29]
John M. Lee. Introduction to Smooth Manifolds. Springer, 2003
work page 2003
- [30]
-
[31]
Deep domain generalization via conditional invariant adversarial networks
Ya Li, Xinmei Tian, Mingming Gong, Yajing Liu, Tongliang Liu, Kun Zhang, and Dacheng Tao. Deep domain generalization via conditional invariant adversarial networks. In ECCV, 2018
work page 2018
-
[32]
David Lopez-Paz. From dependence to causation. PhD thesis, University of Cambridge, 2016
work page 2016
-
[33]
Discovering causal signals in images
David Lopez-Paz, Robert Nishihara, Soumith Chintala, Bernhard Schölkopf, and Léon Bottou. Discovering causal signals in images. In CVPR, 2017
work page 2017
-
[34]
Learning to pivot with adversarial networks
Gilles Louppe, Michael Kagan, and Kyle Cranmer. Learning to pivot with adversarial networks. In NIPS, 2017
work page 2017
-
[35]
Domain adaptation by using causal inference to predict invariant conditional distributions
Sara Magliacane, Thijs van Ommen, Tom Claassen, Stephan Bongers, Philip Versteeg, and Joris M Mooij. Domain adaptation by using causal inference to predict invariant conditional distributions. In NIPS, 2018
work page 2018
-
[36]
Deep learning: A critical appraisal
Gary Marcus. Deep learning: A critical appraisal. arXiv, 2018
work page 2018
-
[37]
Causality from a distributional robustness point of view
Nicolai Meinshausen. Causality from a distributional robustness point of view. In Data Science Workshop (DSW), 2018
work page 2018
-
[38]
Maximin effects in inhomogeneous large-scale data
Nicolai Meinshausen and Peter Bühlmann. Maximin effects in inhomogeneous large-scale data. The Annals of Statistics, 2015
work page 2015
- [39]
-
[40]
Causality: Models, Reasoning, and Inference
Judea Pearl. Causality: Models, Reasoning, and Inference. Cambridge University Press, 2nd edition, 2009
work page 2009
-
[41]
Causal inference using invariant prediction: identification and confidence intervals
Jonas Peters, Peter Bühlmann, and Nicolai Meinshausen. Causal inference using invariant prediction: identification and confidence intervals. JRSS B, 2016
work page 2016
-
[42]
Jonas Peters, Dominik Janzing, and Bernhard Schölkopf. Elements of Causal Inference: Foundations and Learning Algorithms. MIT Press, 2017
work page 2017
-
[43]
Incompleteness, nonlocality and realism
Michael Redhead. Incompleteness, nonlocality and realism: a prolegomenon to the philosophy of quantum mechanics. 1987
work page 1987
-
[44]
Invariant models for causal transfer learning
Mateo Rojas-Carulla, Bernhard Schölkopf, Richard Turner, and Jonas Peters. Invariant models for causal transfer learning. JMLR, 2018
work page 2018
-
[45]
Donald B. Rubin. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 1974
work page 1974
-
[46]
On causal and anticausal learning
Bernhard Schölkopf, Dominik Janzing, Jonas Peters, Eleni Sgouritsa, Kun Zhang, and Joris Mooij. On causal and anticausal learning. In ICML, 2012
work page 2012
-
[47]
Certifying some distributional robustness with principled adversarial training
Aman Sinha, Hongseok Namkoong, and John Duchi. Certifying some distributional robustness with principled adversarial training. ICLR, 2018
work page 2018
-
[48]
Causal necessity: a pragmatic investigation of the necessity of laws
Brian Skyrms. Causal necessity: a pragmatic investigation of the necessity of laws. Yale University Press, 1980
work page 1980
-
[49]
Bob L. Sturm. A simple method to determine if a music information retrieval system is a “horse”. IEEE Transactions on Multimedia, 2014
work page 2014
-
[50]
Antonio Torralba and Alexei Efros. Unbiased look at dataset bias. In CVPR, 2011
work page 2011
-
[51]
Principles of risk minimization for learning theory
Vladimir Vapnik. Principles of risk minimization for learning theory. In NIPS. 1992
work page 1992
-
[52]
Vladimir N. Vapnik. Statistical Learning Theory. John Wiley & Sons, 1998
work page 1998
-
[53]
Do we still need models or just more data and compute?, 2019
Max Welling. Do we still need models or just more data and compute?, 2019
work page 2019
-
[54]
The marginal value of adaptive gradient methods in machine learning
Ashia C. Wilson, Rebecca Roelofs, Mitchell Stern, Nati Srebro, and Benjamin Recht. The marginal value of adaptive gradient methods in machine learning. In NIPS, 2017
work page 2017
-
[55]
Making things happen: A theory of causal explanation
James Woodward. Making things happen: A theory of causal explanation. Oxford University Press, 2005
work page 2005
-
[56]
Sewall Wright. Correlation and causation. Journal of Agricultural Research, 1921
work page 1921
-
[57]
Understanding deep learning requires rethinking generalization
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, and Oriol Vinyals. Understanding deep learning requires rethinking generalization. ICLR, 2016
work page 2016
discussion (0)