
arXiv: 2605.16828 · v1 · pith:D5UPXKKZnew · submitted 2026-05-16 · stat.ML · cs.AI · cs.LG · stat.ME

Prediction-Intervention Games and Invariant Sets

Pith reviewed 2026-05-19 20:01 UTC · model grok-4.3

classification: stat.ML · cs.AI · cs.LG · stat.ME
keywords: prediction-intervention games, stable blanket, invariant sets, causal parents, Stackelberg games, distribution generalization, structural causal models, worst-case risk

The pith

In prediction-intervention games, stable-blanket predictors are always at least as good as causal-parent predictors for two common follower objective classes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines a two-player Stackelberg game in which a leader selects a prediction function for a response Y from covariates observed in a structural causal model, after which a follower intervenes on known target covariates to maximize their own objective. The leader may have only partial knowledge of that objective. For two standard classes of follower objectives, the authors prove that any predictor built on the stable blanket (an invariant subset of covariates) achieves post-intervention performance that is never worse than that of a predictor built only on the causal parents of Y. They bound the leader’s risk after intervention by the worst-case risk over all permitted interventions and supply sufficient conditions under which stable-blanket predictors are worst-case optimal, showing by example that these conditions cannot in general be dropped.

Core claim

In a prediction-intervention game, the leader chooses a predictor from observational data while knowing the follower’s intervention targets but possibly not the exact objective. For two common classes of follower objectives, predictors that use the stable blanket (an invariant subset of covariates) are always at least as good as predictors that use only the causal parents of Y. The post-intervention risk is upper-bounded by the worst-case risk over allowed interventions. Under additional sufficient conditions the stable-blanket predictor is worst-case optimal, and examples show that these conditions cannot in general be dropped.
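
A small simulation can make the claimed ordering concrete. The sketch below is not the paper's experiment: it assumes a linear-Gaussian SCM whose edges (X1 → Y, X2 → Y, Y → X3, Y → X4, X4 → X5) are chosen to be consistent with the paper's Figure 1 caption, unit coefficients and unit noise, and a fixed, hand-picked follower intervention (shift X1 by 2, replace X4 by noise centred at -4) rather than an optimized best response. All function names are hypothetical.

```python
import numpy as np


def sample(n, rng, intervene=False):
    """Draw n samples from the assumed linear-Gaussian SCM.

    With intervene=True, a hand-picked intervention is applied on the
    follower's targets X1 and X4 (an assumption, not the paper's follower).
    """
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    if intervene:
        x1 = x1 + 2.0  # shift intervention on X1, propagates to Y
    y = x1 + x2 + rng.normal(size=n)
    x3 = y + rng.normal(size=n)  # non-intervened child of Y
    if intervene:
        x4 = -4.0 + rng.normal(size=n)  # X4 no longer depends on Y
    else:
        x4 = y + rng.normal(size=n)
    x5 = x4 + rng.normal(size=n)  # descendant of X4
    return np.column_stack([x1, x2, x3, x4, x5]), y


def fit_ols(X, y):
    """Least-squares fit with an intercept column."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def mse(coef, X, y):
    """Mean squared error of the fitted linear predictor."""
    A = np.column_stack([np.ones(len(X)), X])
    return float(np.mean((y - A @ coef) ** 2))


rng = np.random.default_rng(0)
X_train, y_train = sample(20_000, rng)                # observational data
X_test, y_test = sample(20_000, rng, intervene=True)  # post-intervention data

# Column indices of the three covariate sets compared in Figure 1.
feature_sets = {"PA": [0, 1], "SB": [0, 1, 2], "ALL": [0, 1, 2, 3, 4]}

risks = {}
for name, cols in feature_sets.items():
    coef = fit_ols(X_train[:, cols], y_train)
    risks[name] = mse(coef, X_test[:, cols], y_test)
    print(f"{name}: post-intervention MSE = {risks[name]:.3f}")
```

Under these assumptions the stable-blanket fit should show a lower post-intervention error than the causal-parent fit, because X3 carries information about Y through an unchanged mechanism, while the all-covariates fit degrades because it relies on the intervened X4.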

What carries the argument

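The stable blanket, a specific invariant subset of covariates that remains stable under the permitted interventions. In the paper's Figure 1 example it consists of the causal parents X1 and X2 plus the non-intervened child X3, and it excludes the intervened child X4 and that child's descendant X5.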
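
As a rough illustration of how such a set could be read off a known graph, the sketch below assumes a graphical characterization that is common in the invariance literature and is not spelled out in this summary: take the Markov blanket of Y, but first discard every child of Y that is directly intervened on, together with all of its descendants. The function names, the dict-of-children graph encoding, and the edge list (chosen to be consistent with the paper's Figure 1 caption) are hypothetical.

```python
def descendants(children, node):
    """All nodes reachable from `node` by following directed edges."""
    seen, stack = set(), list(children.get(node, []))
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(children.get(v, []))
    return seen


def parents(children, node):
    """Direct parents of `node` in a graph given as {node: [children]}."""
    return {u for u, vs in children.items() if node in vs}


def stable_blanket(children, target, intervened):
    """Assumed definition: Markov blanket of `target` after removing its
    directly intervened children and all descendants of those children."""
    bad_children = {c for c in children.get(target, []) if c in intervened}
    excluded = set(bad_children)
    for c in bad_children:
        excluded |= descendants(children, c)

    all_nodes = set(children) | {v for vs in children.values() for v in vs}
    allowed = all_nodes - excluded - {target}

    pa = parents(children, target)
    kept_children = set(children.get(target, [])) & allowed
    spouses = set()
    for c in kept_children:
        spouses |= parents(children, c) & allowed  # other parents of kept children
    return pa | kept_children | spouses


# Edges assumed from the Figure 1 caption: X1, X2 -> Y; Y -> X3, X4; X4 -> X5.
GRAPH = {
    "X1": ["Y"], "X2": ["Y"],
    "Y": ["X3", "X4"],
    "X3": [], "X4": ["X5"], "X5": [],
}

print(sorted(parents(GRAPH, "Y")))
print(sorted(stable_blanket(GRAPH, "Y", intervened={"X1", "X4"})))
```

On this assumed graph the helper returns {X1, X2, X3} for the follower targets X1 and X4, which agrees with the sets reported in the Figure 1 caption.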

If this is right

  • The leader can guarantee at least as good post-intervention performance by selecting predictors based on the stable blanket rather than the causal parents.
  • The leader’s actual post-intervention risk is bounded above by the maximum risk taken over all interventions the follower is allowed to choose.
  • Under the stated sufficient conditions, the stable-blanket predictor minimizes the worst-case risk over the allowed interventions and is optimal in that sense.
  • The sufficient conditions cannot be dropped in general, because counter-examples exist where they fail and the optimality claim collapses.
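
The statements in this list can be written compactly. The notation is an assumption introduced here for illustration, since the paper's own symbols are not reproduced in this summary: $f$ is the leader's predictor, $\mathcal{I}$ the set of allowed interventions, $i^*(f) \in \mathcal{I}$ the follower's response to $f$, $\ell$ a loss function, and $R(f, i)$ the leader's risk under intervention $i$.

$$
R(f, i) := \mathbb{E}_{P^{\mathrm{do}(i)}}\bigl[\ell\bigl(Y, f(X)\bigr)\bigr],
\qquad
R\bigl(f, i^*(f)\bigr) \;\le\; \sup_{i \in \mathcal{I}} R(f, i).
$$

The inequality holds simply because $i^*(f)$ is one of the allowed interventions. In the same notation, one natural reading of the risk ordering and of worst-case optimality is

$$
R\bigl(f^{\mathrm{SB}}, i^*(f^{\mathrm{SB}})\bigr) \;\le\; R\bigl(f^{\mathrm{PA}}, i^*(f^{\mathrm{PA}})\bigr),
\qquad
\sup_{i \in \mathcal{I}} R\bigl(f^{\mathrm{SB}}, i\bigr) \;=\; \inf_{f} \, \sup_{i \in \mathcal{I}} R(f, i),
$$

where $f^{\mathrm{SB}}$ and $f^{\mathrm{PA}}$ denote predictors based on the stable blanket and on the causal parents. The first relation is claimed for the two follower-objective classes, the second only under the additional sufficient conditions.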

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same invariant-set construction may yield robust predictors in other adversarial settings where a model faces strategic responses after deployment.
  • When the underlying graph is unknown, data-driven methods that approximate the stable blanket could be combined with existing causal discovery algorithms.
  • The worst-case bound supplies a concrete way to certify predictor robustness before any real intervention occurs.

Load-bearing premise

The follower’s objective belongs to one of two specific common classes.

What would settle it

A structural causal model together with a follower objective outside the two classes in which the post-intervention risk of a causal-parent predictor is strictly lower than that of the stable-blanket predictor.

Figures

Figures reproduced from arXiv: 2605.16828 by Felix Schur, Jonas Peters, Linus Kühne.

Figure 1: DAG in § 6.1. PA(Y) = {X1, X2} (blue), SB(Y) = {X1, X2, X3} (blue and red). SB(Y) contains the non-intervened child X3, but not the intervened child X4 and its descendant X5. PA(Y) = {X1, X2}, while the stable blanket is SB(Y) = {X1, X2, X3}. The follower may intervene on X1 and on X4. We compare three predictors (either in a population or finite-sample version): based on PA(Y), SB(Y), or all varia…
Figure 2: Synthetic prediction-intervention games. Left two panels: linear-Gaussian SCM with …
Figure 4: Prediction-intervention game on Causal Chambers data (§ 6.2). Follower-induced shift …
Figure 5: Additional nonlinear SCM results under follower perturbation. Left: train-size sweep. …
Figure 6: Overview of the Causal Chamber light tunnel (…

Original abstract

We consider the following two-player game: using observational data, the leader chooses a prediction function for a response variable $Y$ from given covariates. The follower then reacts with an intervention on some covariates in the underlying structural causal model to maximize their own objective. The leader knows the intervention targets, but may have limited knowledge of the follower's objective. We call this setup a prediction-intervention game, a special case of a Stackelberg game. Finding an optimal strategy for the leader is generally difficult. To avoid severe performance loss, the leader may base their prediction on the causal parents of $Y$, or more generally on an invariant subset of covariates. We prove, for two common classes of follower objectives, that predictors based on the stable blanket, a specific invariant subset, are always better or as good as those based on the causal parents. We further upper bound the leader's post-intervention risk by a worst-case risk over allowed interventions and strengthen existing distribution generalization results to analyze this bound: we give sufficient conditions under which stable-blanket predictors are worst-case optimal, and show by examples that these conditions cannot in general be dropped. Finally, we discuss practical strategies for settings with known and unknown graph, and test them on simulated and real-world data.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 3 minor

Summary. The manuscript introduces prediction-intervention games as a Stackelberg setup in which a leader selects a predictor for response Y from observational covariates while anticipating a follower's intervention on known targets in an underlying SCM to optimize the follower's objective. For two explicitly defined classes of follower objectives, the paper proves that predictors based on the stable blanket (a specific invariant subset) achieve post-intervention risk less than or equal to that of causal-parent predictors by exploiting invariance of the blanket under the follower's optimal response. It further derives an upper bound on the leader's risk via worst-case interventions, provides sufficient conditions for worst-case optimality of stable-blanket predictors (with examples showing that these conditions cannot in general be dropped), and discusses practical strategies for known and unknown graphs with empirical validation on simulated and real data.

Significance. If the central claims hold, the work supplies a theoretically grounded method for selecting robust predictors under strategic post-deployment interventions, extending distribution generalization results to a game-theoretic setting. The explicit definitions of the two follower-objective classes, the accompanying invariance lemmas, and the parameter-free risk ordering constitute a concrete advance; the empirical tests and practical strategies for graph-known/unknown cases add applicability to domains such as policy evaluation and adaptive ML systems.

major comments (2)
  1. [§4.2] §4.2, Lemmas 4.3 and 4.5: the invariance property used to establish the risk ordering for the two follower classes is shown only under the assumption that the intervention targets are exactly the variables outside the stable blanket; a brief remark on robustness when the leader's knowledge of targets is noisy would strengthen the result without altering the core proof.
  2. [§5.3] §5.3, Theorem 5.4: the sufficient conditions for worst-case optimality of the stable-blanket predictor are stated in terms of the follower's objective class and the graph; the paper should explicitly note whether these conditions remain sufficient when the SCM contains latent variables not observed in the training data.
minor comments (3)
  1. [§3 and §6] Notation for the stable blanket is introduced in §3 but reused in §6 without a forward reference; adding a brief reminder in the practical-strategies section would improve readability.
  2. [Figure 2] Figure 2 caption states 'risk vs. intervention strength' but the x-axis label is 'intervention magnitude'; harmonizing the terminology would prevent minor confusion.
  3. [§7.2] The real-world data experiment in §7.2 reports results for one dataset; a short sentence on sensitivity to the choice of that dataset would be useful.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive evaluation and the recommendation of minor revision. The comments are helpful and we address each one below, agreeing to incorporate clarifications that strengthen the presentation.

Point-by-point responses
  1. Referee: [§4.2] §4.2, Lemmas 4.3 and 4.5: the invariance property used to establish the risk ordering for the two follower classes is shown only under the assumption that the intervention targets are exactly the variables outside the stable blanket; a brief remark on robustness when the leader's knowledge of targets is noisy would strengthen the result without altering the core proof.

    Authors: We agree that the invariance in Lemmas 4.3 and 4.5 is established under exact knowledge of the intervention targets. We will add a short remark in §4.2 after the lemmas, observing that when the leader's knowledge of targets contains limited noise, the risk ordering continues to hold approximately provided the risk functions are continuous in the intervention parameters; the exact inequality, however, requires precise target identification. This addition clarifies the scope without modifying the proofs.
    Revision: yes

  2. Referee: [§5.3] §5.3, Theorem 5.4: the sufficient conditions for worst-case optimality of the stable-blanket predictor are stated in terms of the follower's objective class and the graph; the paper should explicitly note whether these conditions remain sufficient when the SCM contains latent variables not observed in the training data.

    Authors: The manuscript defines the stable blanket and the risk bounds with respect to the observed covariates in the SCM. The sufficient conditions of Theorem 5.4 rely on the invariance properties holding for the observed variables. We will insert an explicit clarifying sentence in §5.3 stating that the conditions remain sufficient when there are no latent variables that directly affect both the intervention targets and Y in a manner that breaks the relevant conditional independences; we will also note that the presence of such latents would require additional assumptions for the result to carry over.
    Revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained

full rationale

The paper formalizes a Stackelberg prediction-intervention game and proves, via internal lemmas on invariance, that stable-blanket predictors are at least as good as causal-parent predictors for two explicitly defined classes of follower objectives. These classes and the risk-ordering inequalities are introduced and derived directly in the manuscript from the structural causal model and intervention targets, without reducing to fitted parameters, self-definitional loops, or load-bearing self-citations. Existing distribution-generalization results are strengthened with new sufficient conditions that are shown to be non-redundant by counterexamples; all steps remain externally falsifiable against the stated assumptions and do not collapse to the paper's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 2 invented entities

The central claims rest on the existence of a structural causal model, knowledge of intervention targets, and the restriction of follower objectives to two common classes whose exact form is not specified in the abstract. No free parameters are introduced in the abstract, and the only entities counted as invented are the stable blanket and the prediction-intervention game framing.

axioms (2)
  • [domain assumption] The data are generated by a structural causal model with known intervention targets.
    Invoked when defining the leader's knowledge and the follower's reaction.
  • [ad hoc to paper] Follower objectives belong to two common classes for which the stable-blanket comparison holds.
    The proof is stated only for these two classes; their exact form is not given in the abstract.
invented entities (2)
  • stable blanket (no independent evidence)
    Purpose: invariant subset of covariates that yields predictors at least as good as causal parents under the stated conditions.
    Introduced as the key variable set for robust prediction; no independent evidence outside the paper is provided in the abstract.
  • prediction-intervention game (no independent evidence)
    Purpose: Stackelberg game modeling leader prediction and follower intervention.
    New framing; no external falsifiable handle given in the abstract.



