How Useful is Causal Invariance for Domain Adaptation in Finite-Sample Settings?

Elias Bareinboim; Fanny Yang; Julia Kostin; Kasra Jalaldoust; Samory Kpotufe

arxiv: 2606.12680 · v1 · pith:Q774Y7OPnew · submitted 2026-06-10 · 💻 cs.LG · stat.ML

How Useful is Causal Invariance for Domain Adaptation in Finite-Sample Settings?

Julia Kostin , Kasra Jalaldoust , Elias Bareinboim , Samory Kpotufe , Fanny Yang This is my paper

Pith reviewed 2026-06-27 10:04 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords causal invariancedomain adaptationfinite-sample analysissupervised domain adaptationlinear regressioninvariant predictorsnegative transfer

0 comments

The pith

Causal invariances improve supervised domain adaptation in finite samples only when target-risk margins are large relative to sample size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether causal knowledge of invariant features can help in supervised domain adaptation with limited target samples. It focuses on linear regression where causal structure defines candidate predictors from different feature subsets. The analysis derives bounds showing that gains depend on how much better the best candidate is on the target risk compared to others, relative to estimation errors from source data. When margins are sufficient, an adaptive method can select the best without negative transfer; when small, no method can do better than target-only learning. This connects the usefulness of causal invariance to structural properties of the shifts.

Core claim

In linear regression with a collection of candidate predictors from invariant or possibly invariant feature subsets specified by causal knowledge, matching upper and lower bounds show that finite-sample performance gains are determined by the target-risk margins separating the candidates and the finite-source estimation error. An adaptive aggregation procedure matches the best candidate and avoids negative transfer when margins are large enough relative to the number of target samples n_Q; when margins are too small, no algorithm can reliably exploit the candidates for faster rates.

What carries the argument

Target-risk margins separating the candidate predictors from invariant feature subsets, which govern whether adaptive aggregation can outperform target-only learning.

If this is right

When target-risk margins exceed a threshold involving source estimation error and n_Q, the adaptive procedure achieves the rate of the best candidate.
The procedure avoids negative transfer, meaning it does not perform worse than using only target samples.
The margins can be connected to the magnitude of structural shifts in linear structural causal models.
When margins are small, invariance provides no finite-sample advantage over target-only regression.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

In practice, one might first estimate or bound these margins before committing to causal candidates for adaptation.
The selection logic may apply to other multi-model settings where candidates differ by risk margins on the target.
Partial causal knowledge yields benefit only when the induced predictors are sufficiently separated in target risk.

Load-bearing premise

Causal knowledge is available to identify a collection of invariant feature subsets for generating candidate predictors in linear regression.

What would settle it

A simulation or real-data check where target-risk margins between candidates fall below a threshold set by source estimation error divided by n_Q, in which case the adaptive procedure shows no improvement over target-only regression.

Figures

Figures reproduced from arXiv: 2606.12680 by Elias Bareinboim, Fanny Yang, Julia Kostin, Kasra Jalaldoust, Samory Kpotufe.

**Figure 2.** Figure 2: (Generally unknown) underlying causal structure of the data shared across source and target domains [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Guarantees for our procedure h˜ (shown in purple) in the case of Corollary 3.2. A small excess risk τ can be achieved given nQ ≳ log |H0 acc|/∆τ samples, provided ∆τ is sufficiently large. In particular, h˜ does not have to select the best model hI ⋆,P to achieve the guarantee. 3.2.4 Comparison with naïve model selection and aggregation We now discuss how some natural approaches, which utilize a collection… view at source ↗

**Figure 4.** Figure 4: ). A more advanced baseline—Step 1 of Algorithm 1 followed by ERM over the accepted models—results in margin error term of order qlog |H0 acc| nQ , a slow rate compared to the margin error incurred in Theorem 3.1 [PITH_FULL_IMAGE:figures/full_fig_p013_4.png] view at source ↗

**Figure 5.** Figure 5: Target risk for the toy example in Equation ( [PITH_FULL_IMAGE:figures/full_fig_p015_5.png] view at source ↗

**Figure 6.** Figure 6: Target error (MAE and MSE, respectively) of the source and target model, causal DG methods, naive [PITH_FULL_IMAGE:figures/full_fig_p017_6.png] view at source ↗

**Figure 7.** Figure 7: The causal graph of the light tunnel variables used in our experiment. [PITH_FULL_IMAGE:figures/full_fig_p028_7.png] view at source ↗

**Figure 8.** Figure 8: (a) Examples of light tunnel image data under various interventions on the camera and tunnel setup. [PITH_FULL_IMAGE:figures/full_fig_p029_8.png] view at source ↗

read the original abstract

Machine learning models often degrade when they are deployed on a target distribution that differs from the source distributions they were trained on. Recent work in causality-based domain generalization has shown how shared causal structure between domains can induce invariant predictors, e.g., models on a subset of features which have stable risk across structured domain shifts. However, the extent to which such population-level causal invariances can lead to gains in finite-sample settings remains underexplored. In particular, in practice we often have access to a few labeled target samples, a setting called supervised domain adaptation (sDA). In this paper, we explore when (full or partial) causal knowledge can provably improve supervised domain adaptation. As a first step, we study linear regression, where full or partial causal knowledge specifies a collection of invariant or possibly invariant feature subsets, each yielding a source-trained candidate predictor. We derive matching upper and lower bounds showing that finite-sample gains are governed by the target-risk margins separating the candidates, together with the finite-source estimation error. When these margins are sufficiently large relative to $n_Q$, an adaptive aggregation procedure can match the best candidate predictor while avoiding negative transfer relative to target-only learning. On the other hand, when the margins are too small, no algorithm can reliably exploit the candidate collection to obtain faster finite-sample rates. We further connect these margins to structural shift magnitude in linear SCMs and validate the theory on real-world causal benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Finite-sample bounds show that causal candidates help in linear supervised domain adaptation only when target-risk margins exceed source estimation error, with matching upper and lower bounds.

read the letter

The main point is that causal invariance gives finite-sample gains in supervised domain adaptation only when the target-risk margins between candidate predictors are large enough relative to source estimation error. When margins are too small, the lower bound shows no algorithm can beat target-only learning.

The paper extends population-level invariance results by working in linear regression where causal knowledge defines a collection of candidate feature subsets. It derives matching upper and lower bounds on the performance of an adaptive aggregator over those candidates. The upper bound shows the aggregator can match the best candidate and avoid negative transfer when the margin condition holds with respect to n_Q. The lower bound establishes the matching impossibility result. They also connect the margins to structural shift size in linear SCMs and include checks on real causal benchmarks.

This is useful because it supplies an explicit, testable criterion for when to use the causal candidates versus ignoring them. The analysis treats the positive and negative cases symmetrically, and the causal-knowledge assumption is stated as an explicit modeling choice.

The main soft spot is the linear regression restriction, which keeps the bounds clean but limits direct applicability to nonlinear models. The practical value also hinges on already having the causal knowledge to name the candidate subsets; if that step is uncertain, the bounds become harder to use. The benchmark section is mentioned but lacks detail on how the observed margins and error bars line up with the theory.

This is for researchers focused on finite-sample behavior in causal domain adaptation. It deserves peer review because the bounds are explicit and the setup is internally consistent.

Referee Report

0 major / 4 minor

Summary. The paper studies the finite-sample utility of causal invariance for supervised domain adaptation (sDA) in linear regression. Full or partial causal knowledge is assumed to define a collection of invariant or possibly-invariant feature subsets; each subset yields a source-trained candidate predictor. The central claim is that matching upper and lower bounds show that any finite-sample gain is governed by the target-risk margins separating the candidates together with source estimation error. When these margins are sufficiently large relative to the number of target samples n_Q, an adaptive aggregation procedure matches the best candidate while avoiding negative transfer relative to target-only learning; when the margins are too small, no algorithm can reliably obtain faster rates by exploiting the collection. The margins are further linked to structural shift magnitude in linear SCMs, and the theory is validated on real-world causal benchmarks.

Significance. If the matching bounds hold, the work supplies a precise, symmetric characterization of when (and why) population-level causal invariances translate into finite-sample gains or fail to do so in sDA. The explicit modeling choice of available causal knowledge, the impossibility result that matches the positive result, and the empirical validation on benchmarks are all strengths. The analysis clarifies the role of target-risk margins in preventing negative transfer and connects theoretical quantities to SCM parameters, which is useful for understanding the practical limits of invariance-based domain-adaptation methods.

minor comments (4)

[§3.2] §3.2, Definition 2: the precise definition of the target-risk margin Δ_jk should be restated in the main text (currently only referenced to the appendix) so that the statements of Theorems 1 and 2 are self-contained.
[Figure 2] Figure 2: the error bars are described as 'standard deviation over 10 runs' but the caption does not indicate whether the plotted points are means or medians; this affects interpretation of the 'avoiding negative transfer' claim.
[§5.1] §5.1: the mapping from SCM parameters (eta, u) to the target-risk margins is stated as 'direct' but the explicit algebraic relation is only sketched; adding one displayed equation would make the structural-shift claim immediately verifiable.
Notation: the symbol n_Q is used for the number of target samples throughout, yet the source sample size is denoted n_S in some places and n in others; a single consistent notation would improve readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained theoretical analysis

full rationale

The paper derives matching upper and lower bounds on finite-sample gains in supervised domain adaptation for linear regression, where gains are controlled by target-risk margins between a fixed collection of source-trained candidate predictors (specified via assumed causal knowledge) versus source estimation error. The adaptive aggregation succeeds only when margins exceed a threshold relative to n_Q; the lower bound shows impossibility otherwise. These bounds are derived from standard concentration and margin arguments on the given candidates; no step reduces a prediction or bound to a fitted quantity from the same data, no self-citation is invoked as a load-bearing uniqueness theorem, and the causal-knowledge assumption is stated explicitly as an input modeling choice rather than derived. The structure is internally consistent with independent content in the bounds.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no explicit free parameters, axioms, or invented entities; the analysis rests on the existence of causal knowledge that identifies invariant feature subsets in linear SCMs.

pith-pipeline@v0.9.1-grok · 5804 in / 1096 out tokens · 21033 ms · 2026-06-27T10:04:44.587709+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

94 extracted references · 7 canonical work pages · 1 internal anchor

[1]

Jonas Peters, Peter Bühlmann, and Nicolai Meinshausen. Causal inference by using invariant predic- tion: identification and confidence intervals.Journal of the Royal Statistical Society Series B: Statistical Methodology, 78(5):947–1012, 2016

2016
[2]

Invariant models for causal transfer learning.The Journal of Machine Learning Research, 19(1):1309–1342, 2018

Mateo Rojas-Carulla, Bernhard Schölkopf, Richard Turner, and Jonas Peters. Invariant models for causal transfer learning.The Journal of Machine Learning Research, 19(1):1309–1342, 2018

2018
[3]

Invariant Risk Minimization

Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. Invariant risk minimization.arXiv preprint arXiv:1907.02893, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1907
[4]

Transportability from multiple environments with limited experiments: Completeness results.Advances in neural information processing systems, 27, 2014

Elias Bareinboim and Judea Pearl. Transportability from multiple environments with limited experiments: Completeness results.Advances in neural information processing systems, 27, 2014

2014
[5]

Invariance, causality and robustness.Statistical Science, 35(3):404–426, 2020

Peter Bühlmann. Invariance, causality and robustness.Statistical Science, 35(3):404–426, 2020

2020
[6]

Transportable representations for domain generalization.Proceedings of the AAAI Conference on Artificial Intelligence, 38(11):12790–12800, Mar

Kasra Jalaldoust and Elias Bareinboim. Transportable representations for domain generalization.Proceedings of the AAAI Conference on Artificial Intelligence, 38(11):12790–12800, Mar. 2024

2024
[7]

Improving predictive inference under covariate shift by weighting the log-likelihood function.Journal of Statistical Planning and Inference, 90(2):227–244, 2000

Hidetoshi Shimodaira. Improving predictive inference under covariate shift by weighting the log-likelihood function.Journal of Statistical Planning and Inference, 90(2):227–244, 2000

2000
[8]

Covariate shift adaptation by importance weighted cross validation.Journal of Machine Learning Research, 8(5), 2007

Masashi Sugiyama, Matthias Krauledat, and Klaus-Robert Müller. Covariate shift adaptation by importance weighted cross validation.Journal of Machine Learning Research, 8(5), 2007

2007
[9]

When training and test sets are different: characterizing learning transfer

Amos Storkey. When training and test sets are different: characterizing learning transfer. 2008

2008
[10]

Detecting and correcting for label shift with black box predictors

Zachary Lipton, Yu-Xiang Wang, and Alexander Smola. Detecting and correcting for label shift with black box predictors. InInternational conference on machine learning, pages 3122–3130. PMLR, 2018

2018
[11]

A unified view of label shift estimation.Advances in Neural Information Processing Systems, 2020

Saurabh Garg, Yifan Wu, Sivaraman Balakrishnan, and Zachary Lipton. A unified view of label shift estimation.Advances in Neural Information Processing Systems, 2020

2020
[12]

Mechanisms and the nature of causation.Erkenntnis, 44(1):49–71, 1996

Stuart S Glennan. Mechanisms and the nature of causation.Erkenntnis, 44(1):49–71, 1996

1996
[13]

Thinking about mechanisms.Philosophy of science, 67(1):1–25, 2000

Peter Machamer, Lindley Darden, and Carl F Craver. Thinking about mechanisms.Philosophy of science, 67(1):1–25, 2000

2000
[14]

Transportability of causal and statistical relations: A formal approach

Judea Pearl and Elias Bareinboim. Transportability of causal and statistical relations: A formal approach. In2011 IEEE 11th International Conference on Data Mining Workshops, pages 540–547, 2011

2011
[15]

From statistical transportability to estimating the effect of stochastic interventions

Juan D Correa and Elias Bareinboim. From statistical transportability to estimating the effect of stochastic interventions. InIJCAI, pages 1661–1667, 2019

2019
[16]

General transportability of soft interventions: Completeness results

Juan Correa and Elias Bareinboim. General transportability of soft interventions: Completeness results. Advances in Neural Information Processing Systems, 33:10902–10912, 2020

2020
[17]

A causal framework for distribution generalization.IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10):6614–6630, 2021

Rune Christiansen, Niklas Pfister, Martin Emil Jakobsen, Nicola Gnecco, and Jonas Peters. A causal framework for distribution generalization.IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10):6614–6630, 2021

2021
[18]

Invariant causal prediction for nonlinear models.Journal of Causal Inference, 6(2):20170016, 2018

Christina Heinze-Deml, Jonas Peters, and Nicolai Meinshausen. Invariant causal prediction for nonlinear models.Journal of Causal Inference, 6(2):20170016, 2018

2018
[19]

Invariant causal prediction for nonlinear models.Journal of Causal Inference, 8(1):350–367, 2020

Biwei Huang, Kun Zhang, and Bernhard Schölkopf. Invariant causal prediction for nonlinear models.Journal of Causal Inference, 8(1):350–367, 2020

2020
[20]

Generalizing to unseen domains: A survey on domain generalization.IEEE Transactions on Knowledge and Data Engineering, 2022

Jindong Wang, Cuiling Lan, Chang Liu, Yidong Ouyang, Tao Qin, Wang Lu, Yiqiang Chen, Wenjun Zeng, and Philip Yu. Generalizing to unseen domains: A survey on domain generalization.IEEE Transactions on Knowledge and Data Engineering, 2022

2022
[21]

On calibration and out-of-domain generalization

Yoav Wald, Amir Feder, Daniel Greenfeld, and Uri Shalit. On calibration and out-of-domain generalization. In A. Beygelzimer, Y. Dauphin, P. Liang, and J. Wortman Vaughan, editors,Advances in Neural Information Processing Systems, 2021. 19

2021
[22]

Domain generalization via invariant feature representation

Krikamol Muandet, David Balduzzi, and Bernhard Schölkopf. Domain generalization via invariant feature representation. InInternational conference on Machine Learning, pages 10–18. PMLR, 2013

2013
[23]

Domain generalization via conditional invariant representations

Ya Li, Mingming Gong, Xinmei Tian, Tongliang Liu, and Dacheng Tao. Domain generalization via conditional invariant representations. InProceedings of the AAAI conference on artificial intelligence, volume 32, 2018

2018
[24]

In search of lost domain generalization

Ishaan Gulrajani and David Lopez-Paz. In search of lost domain generalization. InInternational Conference on Learning Representations, 2021

2021
[25]

Do causal predictors generalize better to new domains? InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

Vivian Yvonne Nastl and Moritz Hardt. Do causal predictors generalize better to new domains? InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

2024
[26]

Shanmukha Ramakrishna Vedantam, David Lopez-Paz, and David J. Schwab. An empirical investigation of domain generalization with empirical risk minimizers. In A. Beygelzimer, Y. Dauphin, P. Liang, and J. Wortman Vaughan, editors,Advances in Neural Information Processing Systems, 2021

2021
[27]

Partial transportability for domain generalization

Kasra Jalaldoust, Alexis Bellot, and Elias Bareinboim. Partial transportability for domain generalization. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

2024
[28]

Achievable distributional robustness when the robust risk is only partially identified.Advances in Neural Information Processing Systems, 37:83915–83950, 2024

Julia Kostin, Nicola Gnecco, and Fanny Yang. Achievable distributional robustness when the robust risk is only partially identified.Advances in Neural Information Processing Systems, 37:83915–83950, 2024

2024
[29]

Anchor regression: Heterogeneous data meet causality.Journal of the Royal Statistical Society Series B: Statistical Methodology, 83(2):215–246, 2021

Dominik Rothenhäusler, Nicolai Meinshausen, Peter Bühlmann, and Jonas Peters. Anchor regression: Heterogeneous data meet causality.Journal of the Royal Statistical Society Series B: Statistical Methodology, 83(2):215–246, 2021

2021
[30]

Causality-oriented robustness: exploiting general additive interventions.arXiv preprint arXiv:2307.10299, 2023

Xinwei Shen, Peter Bühlmann, and Armeen Taeb. Causality-oriented robustness: exploiting general additive interventions.arXiv preprint arXiv:2307.10299, 2023

work page arXiv 2023
[31]

Distributional anchor regression.Statistics and Computing, 32(3), May 2022

Lucas Kook, Beate Sick, and Peter Bühlmann. Distributional anchor regression.Statistics and Computing, 32(3), May 2022

2022
[32]

Distributional robustness of K-class estimators and the PULSE

Martin Emil Jakobsen and Jonas Peters. Distributional robustness of K-class estimators and the PULSE. The Econometrics Journal, 25(2):404–432, 2022

2022
[33]

Stabilizing variable selection and regression.The Annals of Applied Statistics, 15(3):1220–1246, 2021

Niklas Pfister, Evan G Williams, Jonas Peters, Ruedi Aebersold, and Peter Bühlmann. Stabilizing variable selection and regression.The Annals of Applied Statistics, 15(3):1220–1246, 2021

2021
[34]

A survey on domain adaptation theory: learning bounds and theoretical guarantees.arXiv preprint arXiv:2004.11829, 2020

Ievgen Redko, Emilie Morvant, Amaury Habrard, Marc Sebban, and Younès Bennani. A survey on domain adaptation theory: learning bounds and theoretical guarantees.arXiv preprint arXiv:2004.11829, 2020

work page arXiv 2004
[35]

Analysis of representations for domain adaptation

Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira. Analysis of representations for domain adaptation. In B. Schölkopf, J. Platt, and T. Hoffman, editors,Advances in Neural Information Processing Systems, volume 19. MIT Press, 2006

2006
[36]

A theory of learning from different domains.Machine learning, 79:151–175, 2010

Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. A theory of learning from different domains.Machine learning, 79:151–175, 2010

2010
[37]

Learning bounds for importance weighting

Corinna Cortes, Yishay Mansour, and Mehryar Mohri. Learning bounds for importance weighting. In J. Lafferty, C. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, editors,Advances in Neural Information Processing Systems, volume 23. Curran Associates, Inc., 2010

2010
[38]

Domain adaptation with structural correspondence learning

John Blitzer, Ryan McDonald, and Fernando Pereira. Domain adaptation with structural correspondence learning. InProceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 120–128, 2006

2006
[39]

Domain adaptation with multiple sources

Yishay Mansour, Mehryar Mohri, and Afshin Rostamizadeh. Domain adaptation with multiple sources. In Advances in Neural Information Processing Systems, volume 21. Curran Associates, Inc., 2008

2008
[40]

Domain adaptation with coupled subspaces

John Blitzer, Sham Kakade, and Dean Foster. Domain adaptation with coupled subspaces. InProceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, volume 15 ofProceedings of Machine Learning Research, pages 173–181. PMLR, 2011. 20

2011
[41]

Joint transfer and batch-mode active learning

Rita Chattopadhyay, Wei Fan, Ian Davidson, Sethuraman Panchanathan, and Jieping Ye. Joint transfer and batch-mode active learning. In Sanjoy Dasgupta and David McAllester, editors,Proceedings of the 30th International Conference on Machine Learning, volume 28 ofProceedings of Machine Learning Research, pages 253–261, Atlanta, Georgia, USA, 17–19 Jun 2013. PMLR

2013
[42]

A theory of transfer learning with applications to active learning.Machine Learning, 90, 02 2013

Liu Yang, Steve Hanneke, and Jaime Carbonell. A theory of transfer learning with applications to active learning.Machine Learning, 90, 02 2013

2013
[43]

Avishek Saha, Piyush Rai, Hal Daumé, Suresh Venkatasubramanian, and Scott L. DuVall. Active supervised domain adaptation. In Dimitrios Gunopulos, Thomas Hofmann, Donato Malerba, and Michalis Vazirgiannis, editors,Machine Learning and Knowledge Discovery in Databases, pages 97–112, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg

2011
[44]

Curran Associates Inc., Red Hook, NY, USA, 2019

Steve Hanneke and Samory Kpotufe.On the value of target data in transfer learning. Curran Associates Inc., Red Hook, NY, USA, 2019

2019
[45]

Adaptive sample aggregation in transfer learning, 2025

Steve Hanneke and Samory Kpotufe. Adaptive sample aggregation in transfer learning, 2025

2025
[46]

Exploiting task relatedness for multiple task learning

Shai Ben-David and Reba Schuller. Exploiting task relatedness for multiple task learning. InProceedings of the 16th Annual Conference on Learning Theory (COLT), pages 567–580, 2003

2003
[47]

Impossibility theorems for domain adaptation

Shai Ben-David, Tyler Lu, Teresa Luu, and Dávid Pál. Impossibility theorems for domain adaptation. InProceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 129–136. JMLR Workshop and Conference Proceedings, 2010

2010
[48]

On the hardness of domain adaptation and the utility of unlabeled target samples

Shai Ben-David and Ruth Urner. On the hardness of domain adaptation and the utility of unlabeled target samples. In Nader H. Bshouty, Gilles Stoltz, Nicolas Vayatis, and Thomas Zeugmann, editors,Algorithmic Learning Theory, pages 139–153, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg

2012
[49]

Domain adaptation–can quantity compensate for quality?Annals of Mathematics and Artificial Intelligence, 70(3):185–202, 2014

Shai Ben-David and Ruth Urner. Domain adaptation–can quantity compensate for quality?Annals of Mathematics and Artificial Intelligence, 70(3):185–202, 2014

2014
[50]

Domain adaptation with conditional transferable components

Mingming Gong, Kun Zhang, Tongliang Liu, Dacheng Tao, Clark Glymour, and Bernhard Schölkopf. Domain adaptation with conditional transferable components. InInternational Conference on Machine Learning, pages 2839–2848. PMLR, 2016

2016
[51]

Conditional variance penalties and domain shift robustness

Christina Heinze-Deml and Nicolai Meinshausen. Conditional variance penalties and domain shift robustness. Machine Learning, 110(2):303–348, 2021

2021
[52]

Domain adaptation under structural causal models.Journal of Machine Learning Research, 22(261):1–80, 2021

Yuansi Chen and Peter Bühlmann. Domain adaptation under structural causal models.Journal of Machine Learning Research, 22(261):1–80, 2021

2021
[53]

Prominent roles of conditionally invariant components in domain adaptation: Theory and algorithms.arXiv preprint arXiv:2309.10301, 2023

Keru Wu, Yuansi Chen, Wooseok Ha, and Bin Yu. Prominent roles of conditionally invariant components in domain adaptation: Theory and algorithms.arXiv preprint arXiv:2309.10301, 2023

work page arXiv 2023
[54]

Onlearninginvariantrepresentations for domain adaptation

HanZhao, RemiTachetDesCombes, KunZhang, andGeoffreyGordon. Onlearninginvariantrepresentations for domain adaptation. InInternational conference on machine learning, pages 7523–7532. PMLR, 2019

2019
[55]

Support and invertibility in domain-invariant representations

Fredrik D Johansson, David Sontag, and Rajesh Ranganath. Support and invertibility in domain-invariant representations. InThe 22nd International Conference on Artificial Intelligence and Statistics, pages 527–536. PMLR, 2019

2019
[56]

Domain generalization and adaptation in intensive care with anchor regression.arXiv preprint arXiv:2507.21783, 2025

Malte Londschien, Manuel Burger, Gunnar Rätsch, and Peter Bühlmann. Domain generalization and adaptation in intensive care with anchor regression.arXiv preprint arXiv:2507.21783, 2025

work page arXiv 2025
[57]

Optimal rates of aggregation

Alexandre B Tsybakov. Optimal rates of aggregation. InLearning Theory and Kernel Machines: 16th Annual Conference on Learning Theory and 7th Kernel Workshop, COLT/Kernel 2003, Washington, DC, USA, August 24-27, 2003. Proceedings, pages 303–313. Springer, 2003

2003
[58]

Kullback-leibler aggregation and misspecified generalized linear models.The Annals of Statistics, pages 639–665, 2012

Philippe Rigollet. Kullback-leibler aggregation and misspecified generalized linear models.The Annals of Statistics, pages 639–665, 2012

2012
[59]

Model selection for nonparametric regression.Statistica Sinica, pages 475–499, 1999

Yuhong Yang. Model selection for nonparametric regression.Statistica Sinica, pages 475–499, 1999. 21

1999
[60]

Progressive mixture rules are deviation suboptimal.Advances in Neural Information Processing Systems, 20, 2007

Jean-Yves Audibert. Progressive mixture rules are deviation suboptimal.Advances in Neural Information Processing Systems, 20, 2007

2007
[61]

Learning by mirror averaging

Anatoli Juditsky, Philippe Rigollet, and Alexandre B Tsybakov. Learning by mirror averaging. 2008

2008
[62]

Optimal learning with q-aggregation

Guillaume Lecué and Philippe Rigollet. Optimal learning with q-aggregation. 2014

2014
[63]

Proof of the optimality of the empirical star algorithm.Technical note, 2007

Jean-Yves Audibert. Proof of the optimality of the empirical star algorithm.Technical note, 2007

2007
[64]

Cambridge University Press, USA, 2nd edition, 2009

Judea Pearl.Causality: Models, Reasoning and Inference. Cambridge University Press, USA, 2nd edition, 2009

2009
[65]

MIT press, 2001

Peter Spirtes, Clark Glymour, and Richard Scheines.Causation, prediction, and search. MIT press, 2001

2001
[66]

MIT press, 2000

Peter Spirtes, Clark N Glymour, and Richard Scheines.Causation, prediction, and search. MIT press, 2000

2000
[67]

Confidence sets for causal orderings.Journal of the American Statistical Association, 121(553):690–703, 2026

Y Samuel Wang, Mladen Kolar, and Mathias Drton. Confidence sets for causal orderings.Journal of the American Statistical Association, 121(553):690–703, 2026

2026
[68]

Causality pursuit from heterogeneous environments via neural adversarial invariance learning.arXiv preprint arXiv:2405.04715, 2024

Yihong Gu, Cong Fang, Peter Bühlmann, and Jianqing Fan. Causality pursuit from heterogeneous environments via neural adversarial invariance learning.arXiv preprint arXiv:2405.04715, 2024

work page arXiv 2024
[69]

On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias.Artificial Intelligence, 172(16-17):1873–1896, 2008

Jiji Zhang. On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias.Artificial Intelligence, 172(16-17):1873–1896, 2008

2008
[70]

Complete graphical charac- terization and construction of adjustment sets in markov equivalence classes of ancestral graphs.Journal of Machine Learning Research, 18(220):1–62, 2018

Emilija Perković, Johannes Textor, Markus Kalisch, and Marloes H Maathuis. Complete graphical charac- terization and construction of adjustment sets in markov equivalence classes of ancestral graphs.Journal of Machine Learning Research, 18(220):1–62, 2018

2018
[71]

Causal discovery from observational and interventional data across multiple environments.Advances in Neural Information Processing Systems, 36:16942–16956, 2023

Adam Li, Amin Jaber, and Elias Bareinboim. Causal discovery from observational and interventional data across multiple environments.Advances in Neural Information Processing Systems, 36:16942–16956, 2023

2023
[72]

Characterizationandgreedylearningofinterventionalmarkovequivalence classes of directed acyclic graphs.The Journal of Machine Learning Research, 13(1):2409–2464, 2012

AlainHauserandPeterBühlmann. Characterizationandgreedylearningofinterventionalmarkovequivalence classes of directed acyclic graphs.The Journal of Machine Learning Research, 13(1):2409–2464, 2012

2012
[73]

Characterizing and learning equivalence classes of causal dags under interventions

Karren Yang, Abigail Katcoff, and Caroline Uhler. Characterizing and learning equivalence classes of causal dags under interventions. InInternational Conference on Machine Learning, pages 5541–5550. PMLR, 2018

2018
[74]

Random design analysis of ridge regression

Daniel Hsu, Sham M Kakade, and Tong Zhang. Random design analysis of ridge regression. InConference on learning theory, pages 9–1. JMLR Workshop and Conference Proceedings, 2012

2012
[75]

Deviation optimal learning using greedy q-aggregation

Dong Dai, Philippe Rigollet, and Tong Zhang. Deviation optimal learning using greedy q-aggregation. 2012

2012
[76]

Out-of-distribution generalization via risk extrapolation (REx)

David Krueger, Ethan Caballero, Joern-Henrik Jacobsen, Amy Zhang, Jonathan Binas, Dinghuai Zhang, Remi Le Priol, and Aaron Courville. Out-of-distribution generalization via risk extrapolation (REx). In International Conference on Machine Learning, pages 5815–5826. PMLR, 2021

2021
[77]

Hashimoto, and Percy Liang

Shiori Sagawa*, Pang Wei Koh*, Tatsunori B. Hashimoto, and Percy Liang. Distributionally robust neural networks for group shifts: On the importance of regularization for worst-case generalization. InInternational Conference on Learning Representations, 2020

2020
[78]

Gamella, Jonas Peters, and Peter Bühlmann

Juan L. Gamella, Jonas Peters, and Peter Bühlmann. Causal chambers as a real-world physical testbed for AI methodology.Nature Machine Intelligence, 2025

2025
[79]

Mapping information-rich genotype-phenotype landscapes with genome-scale Perturb-seq.Cell, 185(14):2559–2575, 2022

Joseph M Replogle, Reuben A Saunders, Angela N Pogson, Jeffrey A Hussmann, Alexander Lenail, Alina Guna, Lauren Mascibroda, Eric J Wagner, Karen Adelman, Gila Lithwick-Yanai, et al. Mapping information-rich genotype-phenotype landscapes with genome-scale Perturb-seq.Cell, 185(14):2559–2575, 2022

2022
[80]

Hyper-sparse optimal aggregation.The Journal of Machine Learning Research, 12:1813–1833, 2011

Stéphane Gaîffas and Guillaume Lecué. Hyper-sparse optimal aggregation.The Journal of Machine Learning Research, 12:1813–1833, 2011

2011

Showing first 80 references.

[1] [1]

Jonas Peters, Peter Bühlmann, and Nicolai Meinshausen. Causal inference by using invariant predic- tion: identification and confidence intervals.Journal of the Royal Statistical Society Series B: Statistical Methodology, 78(5):947–1012, 2016

2016

[2] [2]

Invariant models for causal transfer learning.The Journal of Machine Learning Research, 19(1):1309–1342, 2018

Mateo Rojas-Carulla, Bernhard Schölkopf, Richard Turner, and Jonas Peters. Invariant models for causal transfer learning.The Journal of Machine Learning Research, 19(1):1309–1342, 2018

2018

[3] [3]

Invariant Risk Minimization

Martin Arjovsky, Léon Bottou, Ishaan Gulrajani, and David Lopez-Paz. Invariant risk minimization.arXiv preprint arXiv:1907.02893, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1907

[4] [4]

Transportability from multiple environments with limited experiments: Completeness results.Advances in neural information processing systems, 27, 2014

Elias Bareinboim and Judea Pearl. Transportability from multiple environments with limited experiments: Completeness results.Advances in neural information processing systems, 27, 2014

2014

[5] [5]

Invariance, causality and robustness.Statistical Science, 35(3):404–426, 2020

Peter Bühlmann. Invariance, causality and robustness.Statistical Science, 35(3):404–426, 2020

2020

[6] [6]

Transportable representations for domain generalization.Proceedings of the AAAI Conference on Artificial Intelligence, 38(11):12790–12800, Mar

Kasra Jalaldoust and Elias Bareinboim. Transportable representations for domain generalization.Proceedings of the AAAI Conference on Artificial Intelligence, 38(11):12790–12800, Mar. 2024

2024

[7] [7]

Improving predictive inference under covariate shift by weighting the log-likelihood function.Journal of Statistical Planning and Inference, 90(2):227–244, 2000

Hidetoshi Shimodaira. Improving predictive inference under covariate shift by weighting the log-likelihood function.Journal of Statistical Planning and Inference, 90(2):227–244, 2000

2000

[8] [8]

Covariate shift adaptation by importance weighted cross validation.Journal of Machine Learning Research, 8(5), 2007

Masashi Sugiyama, Matthias Krauledat, and Klaus-Robert Müller. Covariate shift adaptation by importance weighted cross validation.Journal of Machine Learning Research, 8(5), 2007

2007

[9] [9]

When training and test sets are different: characterizing learning transfer

Amos Storkey. When training and test sets are different: characterizing learning transfer. 2008

2008

[10] [10]

Detecting and correcting for label shift with black box predictors

Zachary Lipton, Yu-Xiang Wang, and Alexander Smola. Detecting and correcting for label shift with black box predictors. InInternational conference on machine learning, pages 3122–3130. PMLR, 2018

2018

[11] [11]

A unified view of label shift estimation.Advances in Neural Information Processing Systems, 2020

Saurabh Garg, Yifan Wu, Sivaraman Balakrishnan, and Zachary Lipton. A unified view of label shift estimation.Advances in Neural Information Processing Systems, 2020

2020

[12] [12]

Mechanisms and the nature of causation.Erkenntnis, 44(1):49–71, 1996

Stuart S Glennan. Mechanisms and the nature of causation.Erkenntnis, 44(1):49–71, 1996

1996

[13] [13]

Thinking about mechanisms.Philosophy of science, 67(1):1–25, 2000

Peter Machamer, Lindley Darden, and Carl F Craver. Thinking about mechanisms.Philosophy of science, 67(1):1–25, 2000

2000

[14] [14]

Transportability of causal and statistical relations: A formal approach

Judea Pearl and Elias Bareinboim. Transportability of causal and statistical relations: A formal approach. In2011 IEEE 11th International Conference on Data Mining Workshops, pages 540–547, 2011

2011

[15] [15]

From statistical transportability to estimating the effect of stochastic interventions

Juan D Correa and Elias Bareinboim. From statistical transportability to estimating the effect of stochastic interventions. InIJCAI, pages 1661–1667, 2019

2019

[16] [16]

General transportability of soft interventions: Completeness results

Juan Correa and Elias Bareinboim. General transportability of soft interventions: Completeness results. Advances in Neural Information Processing Systems, 33:10902–10912, 2020

2020

[17] [17]

A causal framework for distribution generalization.IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10):6614–6630, 2021

Rune Christiansen, Niklas Pfister, Martin Emil Jakobsen, Nicola Gnecco, and Jonas Peters. A causal framework for distribution generalization.IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10):6614–6630, 2021

2021

[18] [18]

Invariant causal prediction for nonlinear models.Journal of Causal Inference, 6(2):20170016, 2018

Christina Heinze-Deml, Jonas Peters, and Nicolai Meinshausen. Invariant causal prediction for nonlinear models.Journal of Causal Inference, 6(2):20170016, 2018

2018

[19] [19]

Invariant causal prediction for nonlinear models.Journal of Causal Inference, 8(1):350–367, 2020

Biwei Huang, Kun Zhang, and Bernhard Schölkopf. Invariant causal prediction for nonlinear models.Journal of Causal Inference, 8(1):350–367, 2020

2020

[20] [20]

Generalizing to unseen domains: A survey on domain generalization.IEEE Transactions on Knowledge and Data Engineering, 2022

Jindong Wang, Cuiling Lan, Chang Liu, Yidong Ouyang, Tao Qin, Wang Lu, Yiqiang Chen, Wenjun Zeng, and Philip Yu. Generalizing to unseen domains: A survey on domain generalization.IEEE Transactions on Knowledge and Data Engineering, 2022

2022

[21] [21]

On calibration and out-of-domain generalization

Yoav Wald, Amir Feder, Daniel Greenfeld, and Uri Shalit. On calibration and out-of-domain generalization. In A. Beygelzimer, Y. Dauphin, P. Liang, and J. Wortman Vaughan, editors,Advances in Neural Information Processing Systems, 2021. 19

2021

[22] [22]

Domain generalization via invariant feature representation

Krikamol Muandet, David Balduzzi, and Bernhard Schölkopf. Domain generalization via invariant feature representation. InInternational conference on Machine Learning, pages 10–18. PMLR, 2013

2013

[23] [23]

Domain generalization via conditional invariant representations

Ya Li, Mingming Gong, Xinmei Tian, Tongliang Liu, and Dacheng Tao. Domain generalization via conditional invariant representations. InProceedings of the AAAI conference on artificial intelligence, volume 32, 2018

2018

[24] [24]

In search of lost domain generalization

Ishaan Gulrajani and David Lopez-Paz. In search of lost domain generalization. InInternational Conference on Learning Representations, 2021

2021

[25] [25]

Do causal predictors generalize better to new domains? InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

Vivian Yvonne Nastl and Moritz Hardt. Do causal predictors generalize better to new domains? InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

2024

[26] [26]

Shanmukha Ramakrishna Vedantam, David Lopez-Paz, and David J. Schwab. An empirical investigation of domain generalization with empirical risk minimizers. In A. Beygelzimer, Y. Dauphin, P. Liang, and J. Wortman Vaughan, editors,Advances in Neural Information Processing Systems, 2021

2021

[27] [27]

Partial transportability for domain generalization

Kasra Jalaldoust, Alexis Bellot, and Elias Bareinboim. Partial transportability for domain generalization. InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024

2024

[28] [28]

Achievable distributional robustness when the robust risk is only partially identified.Advances in Neural Information Processing Systems, 37:83915–83950, 2024

Julia Kostin, Nicola Gnecco, and Fanny Yang. Achievable distributional robustness when the robust risk is only partially identified.Advances in Neural Information Processing Systems, 37:83915–83950, 2024

2024

[29] [29]

Anchor regression: Heterogeneous data meet causality.Journal of the Royal Statistical Society Series B: Statistical Methodology, 83(2):215–246, 2021

Dominik Rothenhäusler, Nicolai Meinshausen, Peter Bühlmann, and Jonas Peters. Anchor regression: Heterogeneous data meet causality.Journal of the Royal Statistical Society Series B: Statistical Methodology, 83(2):215–246, 2021

2021

[30] [30]

Causality-oriented robustness: exploiting general additive interventions.arXiv preprint arXiv:2307.10299, 2023

Xinwei Shen, Peter Bühlmann, and Armeen Taeb. Causality-oriented robustness: exploiting general additive interventions.arXiv preprint arXiv:2307.10299, 2023

work page arXiv 2023

[31] [31]

Distributional anchor regression.Statistics and Computing, 32(3), May 2022

Lucas Kook, Beate Sick, and Peter Bühlmann. Distributional anchor regression.Statistics and Computing, 32(3), May 2022

2022

[32] [32]

Distributional robustness of K-class estimators and the PULSE

Martin Emil Jakobsen and Jonas Peters. Distributional robustness of K-class estimators and the PULSE. The Econometrics Journal, 25(2):404–432, 2022

2022

[33] [33]

Stabilizing variable selection and regression.The Annals of Applied Statistics, 15(3):1220–1246, 2021

Niklas Pfister, Evan G Williams, Jonas Peters, Ruedi Aebersold, and Peter Bühlmann. Stabilizing variable selection and regression.The Annals of Applied Statistics, 15(3):1220–1246, 2021

2021

[34] [34]

A survey on domain adaptation theory: learning bounds and theoretical guarantees.arXiv preprint arXiv:2004.11829, 2020

Ievgen Redko, Emilie Morvant, Amaury Habrard, Marc Sebban, and Younès Bennani. A survey on domain adaptation theory: learning bounds and theoretical guarantees.arXiv preprint arXiv:2004.11829, 2020

work page arXiv 2004

[35] [35]

Analysis of representations for domain adaptation

Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira. Analysis of representations for domain adaptation. In B. Schölkopf, J. Platt, and T. Hoffman, editors,Advances in Neural Information Processing Systems, volume 19. MIT Press, 2006

2006

[36] [36]

A theory of learning from different domains.Machine learning, 79:151–175, 2010

Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. A theory of learning from different domains.Machine learning, 79:151–175, 2010

2010

[37] [37]

Learning bounds for importance weighting

Corinna Cortes, Yishay Mansour, and Mehryar Mohri. Learning bounds for importance weighting. In J. Lafferty, C. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta, editors,Advances in Neural Information Processing Systems, volume 23. Curran Associates, Inc., 2010

2010

[38] [38]

Domain adaptation with structural correspondence learning

John Blitzer, Ryan McDonald, and Fernando Pereira. Domain adaptation with structural correspondence learning. InProceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 120–128, 2006

2006

[39] [39]

Domain adaptation with multiple sources

Yishay Mansour, Mehryar Mohri, and Afshin Rostamizadeh. Domain adaptation with multiple sources. In Advances in Neural Information Processing Systems, volume 21. Curran Associates, Inc., 2008

2008

[40] [40]

Domain adaptation with coupled subspaces

John Blitzer, Sham Kakade, and Dean Foster. Domain adaptation with coupled subspaces. InProceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, volume 15 ofProceedings of Machine Learning Research, pages 173–181. PMLR, 2011. 20

2011

[41] [41]

Joint transfer and batch-mode active learning

Rita Chattopadhyay, Wei Fan, Ian Davidson, Sethuraman Panchanathan, and Jieping Ye. Joint transfer and batch-mode active learning. In Sanjoy Dasgupta and David McAllester, editors,Proceedings of the 30th International Conference on Machine Learning, volume 28 ofProceedings of Machine Learning Research, pages 253–261, Atlanta, Georgia, USA, 17–19 Jun 2013. PMLR

2013

[42] [42]

A theory of transfer learning with applications to active learning.Machine Learning, 90, 02 2013

Liu Yang, Steve Hanneke, and Jaime Carbonell. A theory of transfer learning with applications to active learning.Machine Learning, 90, 02 2013

2013

[43] [43]

Avishek Saha, Piyush Rai, Hal Daumé, Suresh Venkatasubramanian, and Scott L. DuVall. Active supervised domain adaptation. In Dimitrios Gunopulos, Thomas Hofmann, Donato Malerba, and Michalis Vazirgiannis, editors,Machine Learning and Knowledge Discovery in Databases, pages 97–112, Berlin, Heidelberg, 2011. Springer Berlin Heidelberg

2011

[44] [44]

Curran Associates Inc., Red Hook, NY, USA, 2019

Steve Hanneke and Samory Kpotufe.On the value of target data in transfer learning. Curran Associates Inc., Red Hook, NY, USA, 2019

2019

[45] [45]

Adaptive sample aggregation in transfer learning, 2025

Steve Hanneke and Samory Kpotufe. Adaptive sample aggregation in transfer learning, 2025

2025

[46] [46]

Exploiting task relatedness for multiple task learning

Shai Ben-David and Reba Schuller. Exploiting task relatedness for multiple task learning. InProceedings of the 16th Annual Conference on Learning Theory (COLT), pages 567–580, 2003

2003

[47] [47]

Impossibility theorems for domain adaptation

Shai Ben-David, Tyler Lu, Teresa Luu, and Dávid Pál. Impossibility theorems for domain adaptation. InProceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, pages 129–136. JMLR Workshop and Conference Proceedings, 2010

2010

[48] [48]

On the hardness of domain adaptation and the utility of unlabeled target samples

Shai Ben-David and Ruth Urner. On the hardness of domain adaptation and the utility of unlabeled target samples. In Nader H. Bshouty, Gilles Stoltz, Nicolas Vayatis, and Thomas Zeugmann, editors,Algorithmic Learning Theory, pages 139–153, Berlin, Heidelberg, 2012. Springer Berlin Heidelberg

2012

[49] [49]

Domain adaptation–can quantity compensate for quality?Annals of Mathematics and Artificial Intelligence, 70(3):185–202, 2014

Shai Ben-David and Ruth Urner. Domain adaptation–can quantity compensate for quality?Annals of Mathematics and Artificial Intelligence, 70(3):185–202, 2014

2014

[50] [50]

Domain adaptation with conditional transferable components

Mingming Gong, Kun Zhang, Tongliang Liu, Dacheng Tao, Clark Glymour, and Bernhard Schölkopf. Domain adaptation with conditional transferable components. InInternational Conference on Machine Learning, pages 2839–2848. PMLR, 2016

2016

[51] [51]

Conditional variance penalties and domain shift robustness

Christina Heinze-Deml and Nicolai Meinshausen. Conditional variance penalties and domain shift robustness. Machine Learning, 110(2):303–348, 2021

2021

[52] [52]

Domain adaptation under structural causal models.Journal of Machine Learning Research, 22(261):1–80, 2021

Yuansi Chen and Peter Bühlmann. Domain adaptation under structural causal models.Journal of Machine Learning Research, 22(261):1–80, 2021

2021

[53] [53]

Prominent roles of conditionally invariant components in domain adaptation: Theory and algorithms.arXiv preprint arXiv:2309.10301, 2023

Keru Wu, Yuansi Chen, Wooseok Ha, and Bin Yu. Prominent roles of conditionally invariant components in domain adaptation: Theory and algorithms.arXiv preprint arXiv:2309.10301, 2023

work page arXiv 2023

[54] [54]

Onlearninginvariantrepresentations for domain adaptation

HanZhao, RemiTachetDesCombes, KunZhang, andGeoffreyGordon. Onlearninginvariantrepresentations for domain adaptation. InInternational conference on machine learning, pages 7523–7532. PMLR, 2019

2019

[55] [55]

Support and invertibility in domain-invariant representations

Fredrik D Johansson, David Sontag, and Rajesh Ranganath. Support and invertibility in domain-invariant representations. InThe 22nd International Conference on Artificial Intelligence and Statistics, pages 527–536. PMLR, 2019

2019

[56] [56]

Domain generalization and adaptation in intensive care with anchor regression.arXiv preprint arXiv:2507.21783, 2025

Malte Londschien, Manuel Burger, Gunnar Rätsch, and Peter Bühlmann. Domain generalization and adaptation in intensive care with anchor regression.arXiv preprint arXiv:2507.21783, 2025

work page arXiv 2025

[57] [57]

Optimal rates of aggregation

Alexandre B Tsybakov. Optimal rates of aggregation. InLearning Theory and Kernel Machines: 16th Annual Conference on Learning Theory and 7th Kernel Workshop, COLT/Kernel 2003, Washington, DC, USA, August 24-27, 2003. Proceedings, pages 303–313. Springer, 2003

2003

[58] [58]

Kullback-leibler aggregation and misspecified generalized linear models.The Annals of Statistics, pages 639–665, 2012

Philippe Rigollet. Kullback-leibler aggregation and misspecified generalized linear models.The Annals of Statistics, pages 639–665, 2012

2012

[59] [59]

Model selection for nonparametric regression.Statistica Sinica, pages 475–499, 1999

Yuhong Yang. Model selection for nonparametric regression.Statistica Sinica, pages 475–499, 1999. 21

1999

[60] [60]

Progressive mixture rules are deviation suboptimal.Advances in Neural Information Processing Systems, 20, 2007

Jean-Yves Audibert. Progressive mixture rules are deviation suboptimal.Advances in Neural Information Processing Systems, 20, 2007

2007

[61] [61]

Learning by mirror averaging

Anatoli Juditsky, Philippe Rigollet, and Alexandre B Tsybakov. Learning by mirror averaging. 2008

2008

[62] [62]

Optimal learning with q-aggregation

Guillaume Lecué and Philippe Rigollet. Optimal learning with q-aggregation. 2014

2014

[63] [63]

Proof of the optimality of the empirical star algorithm.Technical note, 2007

Jean-Yves Audibert. Proof of the optimality of the empirical star algorithm.Technical note, 2007

2007

[64] [64]

Cambridge University Press, USA, 2nd edition, 2009

Judea Pearl.Causality: Models, Reasoning and Inference. Cambridge University Press, USA, 2nd edition, 2009

2009

[65] [65]

MIT press, 2001

Peter Spirtes, Clark Glymour, and Richard Scheines.Causation, prediction, and search. MIT press, 2001

2001

[66] [66]

MIT press, 2000

Peter Spirtes, Clark N Glymour, and Richard Scheines.Causation, prediction, and search. MIT press, 2000

2000

[67] [67]

Confidence sets for causal orderings.Journal of the American Statistical Association, 121(553):690–703, 2026

Y Samuel Wang, Mladen Kolar, and Mathias Drton. Confidence sets for causal orderings.Journal of the American Statistical Association, 121(553):690–703, 2026

2026

[68] [68]

Causality pursuit from heterogeneous environments via neural adversarial invariance learning.arXiv preprint arXiv:2405.04715, 2024

Yihong Gu, Cong Fang, Peter Bühlmann, and Jianqing Fan. Causality pursuit from heterogeneous environments via neural adversarial invariance learning.arXiv preprint arXiv:2405.04715, 2024

work page arXiv 2024

[69] [69]

On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias.Artificial Intelligence, 172(16-17):1873–1896, 2008

Jiji Zhang. On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias.Artificial Intelligence, 172(16-17):1873–1896, 2008

2008

[70] [70]

Complete graphical charac- terization and construction of adjustment sets in markov equivalence classes of ancestral graphs.Journal of Machine Learning Research, 18(220):1–62, 2018

Emilija Perković, Johannes Textor, Markus Kalisch, and Marloes H Maathuis. Complete graphical charac- terization and construction of adjustment sets in markov equivalence classes of ancestral graphs.Journal of Machine Learning Research, 18(220):1–62, 2018

2018

[71] [71]

Causal discovery from observational and interventional data across multiple environments.Advances in Neural Information Processing Systems, 36:16942–16956, 2023

Adam Li, Amin Jaber, and Elias Bareinboim. Causal discovery from observational and interventional data across multiple environments.Advances in Neural Information Processing Systems, 36:16942–16956, 2023

2023

[72] [72]

Characterizationandgreedylearningofinterventionalmarkovequivalence classes of directed acyclic graphs.The Journal of Machine Learning Research, 13(1):2409–2464, 2012

AlainHauserandPeterBühlmann. Characterizationandgreedylearningofinterventionalmarkovequivalence classes of directed acyclic graphs.The Journal of Machine Learning Research, 13(1):2409–2464, 2012

2012

[73] [73]

Characterizing and learning equivalence classes of causal dags under interventions

Karren Yang, Abigail Katcoff, and Caroline Uhler. Characterizing and learning equivalence classes of causal dags under interventions. InInternational Conference on Machine Learning, pages 5541–5550. PMLR, 2018

2018

[74] [74]

Random design analysis of ridge regression

Daniel Hsu, Sham M Kakade, and Tong Zhang. Random design analysis of ridge regression. InConference on learning theory, pages 9–1. JMLR Workshop and Conference Proceedings, 2012

2012

[75] [75]

Deviation optimal learning using greedy q-aggregation

Dong Dai, Philippe Rigollet, and Tong Zhang. Deviation optimal learning using greedy q-aggregation. 2012

2012

[76] [76]

Out-of-distribution generalization via risk extrapolation (REx)

David Krueger, Ethan Caballero, Joern-Henrik Jacobsen, Amy Zhang, Jonathan Binas, Dinghuai Zhang, Remi Le Priol, and Aaron Courville. Out-of-distribution generalization via risk extrapolation (REx). In International Conference on Machine Learning, pages 5815–5826. PMLR, 2021

2021

[77] [77]

Hashimoto, and Percy Liang

Shiori Sagawa*, Pang Wei Koh*, Tatsunori B. Hashimoto, and Percy Liang. Distributionally robust neural networks for group shifts: On the importance of regularization for worst-case generalization. InInternational Conference on Learning Representations, 2020

2020

[78] [78]

Gamella, Jonas Peters, and Peter Bühlmann

Juan L. Gamella, Jonas Peters, and Peter Bühlmann. Causal chambers as a real-world physical testbed for AI methodology.Nature Machine Intelligence, 2025

2025

[79] [79]

Mapping information-rich genotype-phenotype landscapes with genome-scale Perturb-seq.Cell, 185(14):2559–2575, 2022

Joseph M Replogle, Reuben A Saunders, Angela N Pogson, Jeffrey A Hussmann, Alexander Lenail, Alina Guna, Lauren Mascibroda, Eric J Wagner, Karen Adelman, Gila Lithwick-Yanai, et al. Mapping information-rich genotype-phenotype landscapes with genome-scale Perturb-seq.Cell, 185(14):2559–2575, 2022

2022

[80] [80]

Hyper-sparse optimal aggregation.The Journal of Machine Learning Research, 12:1813–1833, 2011

Stéphane Gaîffas and Guillaume Lecué. Hyper-sparse optimal aggregation.The Journal of Machine Learning Research, 12:1813–1833, 2011

2011