How Useful is Causal Invariance for Domain Adaptation in Finite-Sample Settings?
Pith reviewed 2026-06-27 10:04 UTC · model grok-4.3
The pith
Causal invariances improve supervised domain adaptation in finite samples only when the target-risk margins separating the candidate predictors are large enough for the number of labeled target samples n_Q.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In linear regression with a collection of candidate predictors from invariant or possibly invariant feature subsets specified by causal knowledge, matching upper and lower bounds show that finite-sample performance gains are determined by the target-risk margins separating the candidates and the finite-source estimation error. An adaptive aggregation procedure matches the best candidate and avoids negative transfer when margins are large enough relative to the number of target samples n_Q; when margins are too small, no algorithm can reliably exploit the candidates for faster rates.
What carries the argument
Target-risk margins separating the candidate predictors from invariant feature subsets, which govern whether adaptive aggregation can outperform target-only learning.
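To make these quantities concrete, the following Python sketch fits one source-trained least-squares predictor per feature subset and reports each candidate's empirical target-risk gap to the best candidate. It is an illustration under stated assumptions, not the paper's procedure: the synthetic data, the helper names (`fit_ols`, `candidate_predictors`, `target_risks`, `risk_gaps`), and the choice to measure a margin as the gap to the lowest empirical risk are all hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients (no intercept) via numpy.linalg.lstsq."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def candidate_predictors(X_src, y_src, subsets):
    """Fit one source-trained linear predictor per candidate feature subset."""
    return {S: fit_ols(X_src[:, list(S)], y_src) for S in subsets}

def target_risks(cands, X_tgt, y_tgt):
    """Empirical squared-error risk of each candidate on labeled target samples."""
    return {S: float(np.mean((X_tgt[:, list(S)] @ b - y_tgt) ** 2))
            for S, b in cands.items()}

def risk_gaps(risks):
    """Gap between each candidate's risk and the smallest risk (assumed margin proxy)."""
    best = min(risks.values())
    return {S: r - best for S, r in risks.items()}

# Hypothetical toy data: feature 0 has a stable effect on y,
# while the effect of feature 1 flips sign in the target domain.
rng = np.random.default_rng(0)
X_src = rng.normal(size=(2000, 2))
y_src = 2 * X_src[:, 0] + X_src[:, 1] + 0.5 * rng.normal(size=2000)
X_tgt = rng.normal(size=(200, 2))
y_tgt = 2 * X_tgt[:, 0] - X_tgt[:, 1] + 0.5 * rng.normal(size=200)

subsets = [(0,), (0, 1)]  # candidate feature subsets, as tuples of column indices
cands = candidate_predictors(X_src, y_src, subsets)
gaps = risk_gaps(target_risks(cands, X_tgt, y_tgt))
for S, g in gaps.items():
    print(S, round(g, 3))
```

In this toy setup the subset containing only the stable feature has the lowest target risk, and the gap to the other candidate is large, which is the regime in which the summary above says selection among candidates should be easy.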
If this is right
- When target-risk margins exceed a threshold involving source estimation error and n_Q, the adaptive procedure achieves the rate of the best candidate.
- The procedure avoids negative transfer, meaning it does not perform worse than using only target samples.
- The margins can be connected to the magnitude of structural shifts in linear structural causal models.
- When margins are small, invariance provides no finite-sample advantage over target-only regression.
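The negative-transfer point in the list above can be illustrated with a simple stand-in for adaptive aggregation: add a target-only fit to the candidate pool and select by held-out target risk. This is plain hold-out model selection, named as such, and is not claimed to be the paper's procedure; the function `holdout_select`, the candidate names, and the synthetic data are assumptions made for the sketch.

```python
import numpy as np

def holdout_select(candidates, X_tgt, y_tgt, seed=0):
    """Pick the predictor with the lowest held-out target risk.

    candidates: dict mapping a name to a callable X -> predictions.
    A target-only least-squares fit is added to the pool as a fallback.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y_tgt))
    half = len(idx) // 2
    fit_idx, val_idx = idx[:half], idx[half:]

    # Target-only baseline, fit on the first half of the target sample.
    coef, *_ = np.linalg.lstsq(X_tgt[fit_idx], y_tgt[fit_idx], rcond=None)
    pool = dict(candidates)
    pool["target_only"] = lambda X: X @ coef

    # Compare all predictors on the second half.
    risks = {name: float(np.mean((f(X_tgt[val_idx]) - y_tgt[val_idx]) ** 2))
             for name, f in pool.items()}
    best = min(risks, key=risks.get)
    return best, risks

# Hypothetical target domain: 30 features, only 40 labeled target samples.
rng = np.random.default_rng(1)
d, n_q = 30, 40
beta = np.zeros(d)
beta[:5] = 1.0
X_tgt = rng.normal(size=(n_q, d))
y_tgt = X_tgt @ beta + 0.5 * rng.normal(size=n_q)

# Two fixed candidates standing in for source-trained predictors:
# one matches the target mechanism, one carries a spurious coefficient.
shifted = beta.copy()
shifted[5] = 3.0
candidates = {
    "invariant": lambda X: X @ beta,
    "shifted": lambda X: X @ shifted,
}

best, risks = holdout_select(candidates, X_tgt, y_tgt)
print(best, {k: round(v, 2) for k, v in risks.items()})
```

Because the target-only fit is always in the pool, the selected predictor's held-out risk is never above that of the target-only fit on the same validation split, which is the informal sense of avoiding negative transfer used here.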
Where Pith is reading between the lines
- In practice, one might first estimate or bound these margins before committing to causal candidates for adaptation.
- The selection logic may apply to other multi-model settings where candidates differ by risk margins on the target.
- Partial causal knowledge yields benefit only when the induced predictors are sufficiently separated in target risk.
Load-bearing premise
Causal knowledge is available to identify a collection of invariant feature subsets for generating candidate predictors in linear regression.
What would settle it
A simulation or real-data check in which the target-risk margins between candidates fall below the threshold determined by source estimation error and n_Q. If the claim holds, the adaptive procedure should show no improvement over target-only regression in that regime.
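A toy version of such a check can be sketched in Python. The simulation below is an assumption-laden illustration rather than the paper's experiment: it uses two fixed predictors whose target risks differ by c^2, draws n_Q target samples many times, and records how often empirical risk comparison picks the better one. The function name `selection_accuracy` and the data-generating model are hypothetical.

```python
import numpy as np

def selection_accuracy(c, n_q, trials=2000, seed=0):
    """Fraction of trials in which the lower-risk candidate wins on n_q target samples.

    Target model (assumed): y = x + noise, with x and noise standard normal.
    Candidate A predicts x (risk 1); candidate B predicts (1 + c) * x (risk 1 + c**2).
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(trials, n_q))
    y = x + rng.normal(size=(trials, n_q))
    risk_a = np.mean((y - x) ** 2, axis=1)
    risk_b = np.mean((y - (1 + c) * x) ** 2, axis=1)
    return float(np.mean(risk_a < risk_b))

# Sweep the margin c**2 at a fixed, small number of target samples.
for c in (0.05, 0.5, 2.0):
    print(f"margin={c**2:.4f}  accuracy={selection_accuracy(c, n_q=20):.3f}")
```

With a small margin the comparison is close to a coin flip at n_Q = 20, while a large margin makes the better candidate easy to identify, mirroring the qualitative split between the two regimes described above.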
Original abstract
Machine learning models often degrade when they are deployed on a target distribution that differs from the source distributions they were trained on. Recent work in causality-based domain generalization has shown how shared causal structure between domains can induce invariant predictors, e.g., models on a subset of features which have stable risk across structured domain shifts. However, the extent to which such population-level causal invariances can lead to gains in finite-sample settings remains underexplored. In particular, in practice we often have access to a few labeled target samples, a setting called supervised domain adaptation (sDA). In this paper, we explore when (full or partial) causal knowledge can provably improve supervised domain adaptation. As a first step, we study linear regression, where full or partial causal knowledge specifies a collection of invariant or possibly invariant feature subsets, each yielding a source-trained candidate predictor. We derive matching upper and lower bounds showing that finite-sample gains are governed by the target-risk margins separating the candidates, together with the finite-source estimation error. When these margins are sufficiently large relative to $n_Q$, an adaptive aggregation procedure can match the best candidate predictor while avoiding negative transfer relative to target-only learning. On the other hand, when the margins are too small, no algorithm can reliably exploit the candidate collection to obtain faster finite-sample rates. We further connect these margins to structural shift magnitude in linear SCMs and validate the theory on real-world causal benchmarks.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies the finite-sample utility of causal invariance for supervised domain adaptation (sDA) in linear regression. Full or partial causal knowledge is assumed to define a collection of invariant or possibly-invariant feature subsets; each subset yields a source-trained candidate predictor. The central claim is that matching upper and lower bounds show that any finite-sample gain is governed by the target-risk margins separating the candidates together with source estimation error. When these margins are sufficiently large relative to the number of target samples n_Q, an adaptive aggregation procedure matches the best candidate while avoiding negative transfer relative to target-only learning; when the margins are too small, no algorithm can reliably obtain faster rates by exploiting the collection. The margins are further linked to structural shift magnitude in linear SCMs, and the theory is validated on real-world causal benchmarks.
Significance. If the matching bounds hold, the work supplies a precise, symmetric characterization of when (and why) population-level causal invariances translate into finite-sample gains or fail to do so in sDA. The explicit modeling choice of available causal knowledge, the impossibility result that matches the positive result, and the empirical validation on benchmarks are all strengths. The analysis clarifies the role of target-risk margins in preventing negative transfer and connects theoretical quantities to SCM parameters, which is useful for understanding the practical limits of invariance-based domain-adaptation methods.
minor comments (4)
- [§3.2] Definition 2: the precise definition of the target-risk margin Δ_jk should be restated in the main text (it is currently deferred to the appendix) so that the statements of Theorems 1 and 2 are self-contained.
- [Figure 2] The error bars are described as 'standard deviation over 10 runs', but the caption does not indicate whether the plotted points are means or medians; this affects interpretation of the 'avoiding negative transfer' claim.
- [§5.1] The mapping from the SCM parameters (eta, u) to the target-risk margins is stated as 'direct', but the explicit algebraic relation is only sketched; adding one displayed equation would make the structural-shift claim immediately verifiable.
- Notation: the symbol n_Q is used for the number of target samples throughout, yet the source sample size is denoted n_S in some places and n in others; a single consistent notation would improve readability.
Simulated Author's Rebuttal
We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No specific major comments were provided in the report.
Circularity Check
No significant circularity; the derivation is a self-contained theoretical analysis.
full rationale
The paper derives matching upper and lower bounds on finite-sample gains in supervised domain adaptation for linear regression. The gains are controlled by the target-risk margins among a fixed collection of source-trained candidate predictors (specified via assumed causal knowledge), together with source estimation error. The adaptive aggregation succeeds only when the margins exceed a threshold relative to n_Q; the lower bound shows impossibility otherwise. These bounds are derived from standard concentration and margin arguments on the given candidates: no step reduces a prediction or bound to a quantity fitted from the same data, no self-citation is invoked as a load-bearing uniqueness theorem, and the causal-knowledge assumption is stated explicitly as an input modeling choice rather than derived. The argument is internally consistent, and the bounds carry content that is independent of their inputs.