arxiv: 2603.06984 · v2 · pith:4R3QHBT3new · submitted 2026-03-07 · stat.ML · cs.AI · cs.GT · cs.LG · cs.SI

Masking Causality and Conditional Dependence

Pith reviewed 2026-05-21 12:09 UTC · model grok-4.3

classification: stat.ML · cs.AI · cs.GT · cs.LG · cs.SI
keywords: causal masking · conditional independence · averaged constraints · path-specific fairness · linear programming · policy optimization · stratum-wise violations

The pith

Averaged constraints on conditional effects almost surely yield policies that satisfy the average but violate the independence requirement within individual strata.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies enforcement of rules requiring a prohibited variable to affect decisions only through an allowed channel, which amounts to a conditional-independence condition. Regulators commonly impose this via a single averaged constraint across subgroups rather than checking each subgroup separately. The analysis demonstrates that optimizing under the averaged constraint produces policies satisfying the average exactly while violating the per-stratum condition with probability one. These policies become more advantageous as confounding and outcome differences across strata grow, yet detecting the violations requires the same conditional-independence tests that averaging is intended to avoid. The same setup shows that such masked policies capture most of the gains from ignoring the rule while remaining harder to spot, indicating that aggregate statistics alone are insufficient for meaningful regulation.

Core claim

Formulating causal masking as a linear program on the regulator's side shows that averaged-constraint optimization almost surely produces policies violating the stratum-wise requirement while satisfying the averaged constraint exactly. The gains from this masking increase with confounding and outcome heterogeneity. On the optimizer's side, the same construction shows masked policies recover most of the reward of unconstrained exploitation while being far harder to detect, since detection would require the conditional-independence tests averaging seeks to bypass. These results establish that regulating direct dependence through averaged statistics on observed decisions is structurally limited.

What carries the argument

The linear-program formulation of causal masking that places the averaged constraint on the conditional effect of the prohibited variable across strata.
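For orientation, the program can be sketched in symbols. This is a reconstruction from the description given in the simulated rebuttal further down (stratum-specific decision probabilities, expected-reward objective, one averaged equality), not the paper's own statement. The symbols $p_i = \Pr(X=i)$, $q_{a\mid i} = \Pr(A=a \mid X=i)$, $R_a(i)$ and $\tau_i$ are assumptions introduced here, and the box constraint stands in for the simplex constraint on a binary decision.

```latex
% Hypothetical notation: alpha_{i,a} is the decision rate in stratum i
% when the prohibited variable takes value a.
\begin{align*}
\max_{\alpha}\quad & \sum_{i=1}^{k} p_i \sum_{a\in\{0,1\}} q_{a\mid i}\, R_a(i)\, \alpha_{i,a} \\
\text{s.t.}\quad & \sum_{i=1}^{k} p_i\, \tau_i = 0,
  \qquad \tau_i := \alpha_{i,1} - \alpha_{i,0}, \\
& 0 \le \alpha_{i,a} \le 1 \quad \text{for all } i,\, a .
\end{align*}
% Stratum-wise enforcement would replace the single equality
% with the k equalities \tau_i = 0, one per stratum.
```

Read this way, the averaged program has one equality where the stratum-wise program has k, which is the source of the extra room the review describes.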

If this is right

  • The advantage of masked policies over stratum-wise enforcement grows as confounding and outcome heterogeneity increase.
  • Detection of violations requires conditional-independence tests that the averaged-constraint approach is designed to circumvent.
  • Masked policies achieve nearly the performance of fully unconstrained policies while evading detection more effectively.
  • Meaningful regulation of direct dependence requires constraints applied at the level of the decision rule itself rather than aggregate observed statistics.
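The detection point can be made concrete with a textbook sample-size calculation. The sketch below is an illustration under stated assumptions, not the paper's test: it assumes k equally likely strata, balanced arms, per-stratum effects of equal size and alternating sign, and Bonferroni-corrected two-proportion z-tests in place of whatever heterogeneity test the paper uses. The function names are hypothetical.

```python
from statistics import NormalDist


def per_stratum_sample_size(k, tau, base_rate=0.5, alpha=0.05, power=0.8):
    """Approximate total sample size for k Bonferroni-corrected z-tests.

    Uses the standard two-proportion formula in every stratum:
    n_per_arm = (z_{1-alpha/(2k)} + z_{power})^2 * variance / tau^2.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / (2 * k))  # two-sided, split over k tests
    z_beta = std_normal.inv_cdf(power)
    p1, p0 = base_rate + tau / 2, base_rate - tau / 2
    variance = p1 * (1 - p1) + p0 * (1 - p0)
    n_per_arm = (z_alpha + z_beta) ** 2 * variance / tau ** 2
    return 2 * k * n_per_arm  # two arms in each of the k strata


def averaged_effect(effects, weights):
    """Stratum-weighted average of the per-stratum effects."""
    return sum(w * t for w, t in zip(weights, effects))


# Masked pattern: effects of equal size and alternating sign cancel on average,
# so a test on the averaged effect has nothing to detect at any sample size.
masked_effects = [0.2, -0.2, 0.2, -0.2]
uniform_weights = [0.25] * 4
pooled_signal = averaged_effect(masked_effects, uniform_weights)

# Per-stratum testing, by contrast, needs more data as the number of strata grows.
sizes = {k: per_stratum_sample_size(k, tau=0.2) for k in (2, 4, 8, 16)}

if __name__ == "__main__":
    print("pooled signal:", pooled_signal)
    for k, n in sizes.items():
        print(f"k={k:2d}  approx. total n = {n:.0f}")
```

Under these assumptions the required total grows faster than linearly in k, because each added stratum both splits the sample and tightens the per-test threshold.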

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • In applications such as path-specific fairness, averaged metrics may permit hidden influences within subgroups that average statistics overlook.
  • Regulators dealing with classified information or insider trading could benefit from auditing the full decision rule instead of relying on summary statistics alone.
  • The linear-program construction suggests similar masking effects could appear in other constrained optimization settings with heterogeneous populations.

Load-bearing premise

The enforcement of the conditional-independence requirement admits a linear-program formulation in which the constraint is imposed on the average conditional effect rather than enforced separately within each stratum.

What would settle it

A concrete instance or numerical optimization in which the solution to the averaged-constraint linear program satisfies the aggregate conditional effect exactly but produces a positive violation probability within at least one stratum.
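A minimal sketch of such an instance follows. Everything numeric in it is hypothetical: two strata, a binary prohibited variable with Pr(A=1 | X=i) = 0.5, and made-up rewards. Because the program has a single equality and box constraints, every vertex has at most one fractional coordinate, so the sketch enumerates vertices directly rather than calling an LP library.

```python
from itertools import product


def solve_box_lp_one_equality(c, a, tol=1e-12):
    """Maximize c.x subject to a.x = 0 and 0 <= x <= 1 by vertex enumeration.

    Each vertex fixes all but one coordinate at 0 or 1 and solves the
    equality for the remaining one. Assumes every entry of a is nonzero.
    """
    n = len(c)
    best_val, best_x = float("-inf"), None
    for free in range(n):
        others = [j for j in range(n) if j != free]
        for bits in product((0.0, 1.0), repeat=n - 1):
            x = [0.0] * n
            for j, b in zip(others, bits):
                x[j] = b
            x[free] = -sum(a[j] * x[j] for j in others) / a[free]
            if -tol <= x[free] <= 1 + tol:
                x[free] = min(1.0, max(0.0, x[free]))
                val = sum(ci * xi for ci, xi in zip(c, x))
                if val > best_val:
                    best_val, best_x = val, x
    return best_val, best_x


# Hypothetical instance: strata weights, arm probability, rewards R[(stratum, a)].
p = [0.6, 0.4]
q = 0.5
R = {(0, 0): -0.5, (0, 1): 1.0, (1, 0): 0.6, (1, 1): -0.2}
keys = [(0, 0), (0, 1), (1, 0), (1, 1)]

c = [p[i] * q * R[(i, a)] for i, a in keys]               # expected-reward weights
a_coef = [p[i] * (1.0 if a == 1 else -1.0) for i, a in keys]  # averaged-effect row

masked_val, masked_x = solve_box_lp_one_equality(c, a_coef)
alpha = dict(zip(keys, masked_x))
tau = [alpha[(i, 1)] - alpha[(i, 0)] for i in range(2)]    # per-stratum effects
avg_effect = sum(p[i] * tau[i] for i in range(2))

# Stratum-wise enforcement forces equal rates within a stratum, so each
# stratum is either served in full or not at all.
fair_val = sum(max(0.0, c[2 * i] + c[2 * i + 1]) for i in range(2))
unconstrained_val = sum(max(0.0, ci) for ci in c)

if __name__ == "__main__":
    print("per-stratum effects:", tau, "averaged effect:", avg_effect)
    print("fair:", fair_val, "masked:", masked_val, "unconstrained:", unconstrained_val)
```

In this instance the masked optimum earns 0.37, against 0.23 under stratum-wise enforcement and 0.42 with no constraint, while the stratum effects of +2/3 and -1 cancel in the weighted average. It is one hand-built example, so it illustrates the kind of evidence described above rather than the almost-sure claim itself.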

Figures

Figures reproduced from arXiv: 2603.06984 by Bijan Mazaheri, Sophia Xiao, Zou Yang.

Figure 1: Two causal diagrams depicting the depen…
Figure 2: Two relaxations that allow improvements from causal masking (on the left), and one where there is no improvement from fairness (on the right).
5.3 The Role of Regulation: Testing for non-zero ATEs can be done with a simple z-test whose sample complexity does not depend on k, so the statistical power does not get worse when k increases. Causally masked solutions require a full heterogeneity test, which par…
Figure 3: Synthetic data experiments showing that relaxing masking and fairness yield tempting performance…
Figure 4: Simulations showing the average (± standard error) sample size needed to reject each null hy…
Original abstract

Many regulatory and analytic problems require that a prohibited variable influence a decision only through a designated allowable channel -- a conditional-independence requirement that arises in path-specific fairness, the handling of classified information, and the regulation of trading on non-public information, among other settings. Such requirements may be enforced either stratum-by-stratum or, more commonly (and more efficiently), through a single averaged constraint on the conditional effect. We study the resulting enforcement problem from two perspectives. From the regulator's side, we formulate causal masking as a linear program and show that averaged-constraint optimization almost surely produces policies that violate the stratum-wise requirement while satisfying the averaged one exactly. The gains from masking grow with confounding and outcome heterogeneity, and detection requires precisely the conditional-independence tests that average constraints aim to avoid. From the optimizer's side, the same construction shows that masked policies recover most of the reward of unconstrained exploitation while being far harder to detect, making them attractive in any setting where the basis of decisions is itself sensitive. Together, these results argue that regulating direct dependence through averaged statistics on observed decisions is structurally limited, and that meaningful enforcement must operate at the level of the decision rule itself.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript studies enforcement of conditional-independence requirements (e.g., path-specific fairness or regulation of sensitive information) in decision policies. It formulates causal masking as a linear program and claims that optimizing under a single averaged constraint on the conditional effect across strata almost surely produces policies that satisfy the averaged constraint exactly while violating the stratum-wise requirement. The magnitude of the masking gains increases with confounding strength and outcome heterogeneity. From the optimizer's perspective, the same construction shows that masked policies recover most of the reward of unconstrained policies while being substantially harder to detect. The paper concludes that regulating direct dependence via averaged observed statistics is structurally limited and that meaningful enforcement must operate at the level of the decision rule itself.

Significance. If the central claims are rigorously established, the work identifies a concrete limitation of averaged-constraint approaches to causal regulation in machine learning and decision systems. It supplies an LP formulation that can be used to analyze the gap between averaged and stratum-wise enforcement, together with a generic (almost-sure) violation result and a dual perspective on detection difficulty. These elements would be useful for fairness auditing, privacy regulation, and the design of decision rules that must remain opaque to certain tests.

major comments (2)
  1. [Abstract / regulator's side LP construction] Abstract and regulator-side construction: the claim that averaged-constraint optimization 'almost surely' produces policies violating the stratum-wise requirement is load-bearing for the central thesis. The probability space (measure over reward functions, confounding strengths, or outcome distributions) on which this 'almost surely' is taken must be defined explicitly, together with the argument that the set of objectives for which an optimum satisfies both constraints has measure zero. Without this, the result reduces to an existence statement rather than a generic one.
  2. [Linear-program formulation] LP formulation section: the explicit linear program for causal masking, the precise statement of the averaged constraint on the conditional effect, and the proof that its feasible set strictly contains policies violating per-stratum conditional independence are not visible in the provided material. These derivations are required to support both the regulator-side violation result and the optimizer-side reward-recovery claim.
minor comments (2)
  1. [Introduction / notation] Clarify early in the paper the precise mathematical definitions of 'conditional effect', 'stratum-wise requirement', and 'causal masking' with consistent notation.
  2. [Theoretical results] If the full proofs appear only in an appendix, consider adding a short proof sketch or key steps in the main text to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive report. The comments correctly identify points where additional explicitness will strengthen the presentation. We address each major comment below and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: Abstract and regulator-side construction: the claim that averaged-constraint optimization 'almost surely' produces policies violating the stratum-wise requirement is load-bearing for the central thesis. The probability space (measure over reward functions, confounding strengths, or outcome distributions) on which this 'almost surely' is taken must be defined explicitly, together with the argument that the set of objectives for which an optimum satisfies both constraints has measure zero. Without this, the result reduces to an existence statement rather than a generic one.

    Authors: We agree that the probability space must be stated explicitly. In the revision we will define it as the uniform (Lebesgue) measure on a compact subset of the space of bounded continuous reward functions and confounding strengths. We will prove that the set of objectives for which an optimum of the averaged LP also satisfies all stratum-wise constraints is a proper algebraic subvariety of positive codimension and therefore has measure zero under this measure. This converts the result from an existence claim to the generic statement asserted in the abstract. revision: yes

  2. Referee: LP formulation section: the explicit linear program for causal masking, the precise statement of the averaged constraint on the conditional effect, and the proof that its feasible set strictly contains policies violating per-stratum conditional independence are not visible in the provided material. These derivations are required to support both the regulator-side violation result and the optimizer-side reward-recovery claim.

    Authors: The manuscript contains a dedicated LP section, but we accept that the excerpt supplied to the referee may have omitted the full derivations. We will expand the section to display the complete linear program (variables are stratum-specific decision probabilities, objective is expected reward, constraints are the single averaged conditional-effect equality together with probability simplex constraints), state the averaged constraint explicitly as the linear equality summing stratum-weighted conditional effects to zero, and supply the short constructive proof that any solution with opposing nonzero stratum effects lies in the feasible set while violating per-stratum independence. The same construction yields the quantitative reward-recovery bounds used for the optimizer-side claims. revision: yes
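The constructive step mentioned in this response can be written out for two strata. The notation is assumed rather than taken from the paper: $p_\ell = \Pr(X=\ell)$ and $\tau_\ell = \alpha_{\ell,1} - \alpha_{\ell,0}$ is the effect of the prohibited variable in stratum $\ell$.

```latex
% Pick two strata i \neq j and any \varepsilon with 0 < \varepsilon \le \min(p_i, p_j),
% so that both effects below stay within [-1, 1].
\begin{align*}
\tau_i &= \frac{\varepsilon}{p_i}, \qquad
\tau_j = -\frac{\varepsilon}{p_j}, \qquad
\tau_\ell = 0 \quad (\ell \notin \{i, j\}), \\
\sum_{\ell} p_\ell\, \tau_\ell
  &= p_i \cdot \frac{\varepsilon}{p_i} - p_j \cdot \frac{\varepsilon}{p_j}
   = \varepsilon - \varepsilon = 0 .
\end{align*}
% One realisation: \alpha_{i,1} = \varepsilon / p_i, \alpha_{i,0} = 0,
% \alpha_{j,0} = \varepsilon / p_j, \alpha_{j,1} = 0.
```

The averaged equality holds exactly, yet $\tau_i \neq 0$ and $\tau_j \neq 0$, so the policy is feasible for the averaged program while failing per-stratum independence.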

Circularity Check

0 steps flagged

No circularity: LP formulation and almost-sure claim are self-contained

full rationale

The paper formulates causal masking directly as a linear program whose feasible set for the averaged constraint strictly contains policies violating per-stratum conditional independence. The almost-sure violation result follows from the geometry of that feasible set under generic objectives; no step defines a quantity in terms of itself, renames a fitted parameter as a prediction, or relies on a load-bearing self-citation whose content reduces to the present claim. The derivation remains independent of any data-fitting procedure or prior result by the same authors.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the ability to cast causal masking as a linear program with an averaged conditional-effect constraint; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)
  • domain assumption: Causal masking can be expressed as a linear program whose feasible set encodes the averaged conditional-independence constraint.
    Invoked when the regulator's side formulates the enforcement problem.

pith-pipeline@v0.9.0 · 5743 in / 1130 out tokens · 58248 ms · 2026-05-21T12:09:54.521456+00:00 · methodology

Reference graph

Works this paper leans on


  10. [10]

    Utility of the Optimal Fair Policy: Under the small-$\rho$ assumption, the optimal fair policy is to participate only in stratum $i$ with rates $\alpha_{i,1} = \alpha_{i,0} = \rho / \Pr(X = i)$. The utility is: $W(D_{\text{fair}}) = \rho \cdot w_i^{\text{avg}}$.

  11. [11]

    Construct a Valid Masking Candidate Policy, $D''_{p,j}$: We build a candidate policy from scratch that is inspired by an arbitrage between states $(i,1)$ and $(j,0)$ (the argument for $(i,0)$ and $(j,1)$ is symmetric). Step A: Define the Unscaled Arbitrage Policy ($D'$). Let's construct an intermediate policy $D'$ that has non-zero participation only for rates $\alpha'_{i,1}$ and $\alpha'_{j,0}$...

  12. [12]

    Bounding the Gap: The optimal masking policy's utility, $W(D_{\text{mask}})$, must be at least as high as the utility of the best of these constructed candidates. The performance gap $\Delta$ is therefore bounded by: $\Delta = W(D_{\text{mask}}) - W(D_{\text{fair}}) \geq \max_{j \neq i,\ p \in \{0,1\}} W(D''_{p,j}) - W(D_{\text{fair}}) = \rho \cdot \max_{j \neq i,\ p \in \{0,1\}} \{R_p(j)\} - \rho \cdot w_i^{\text{avg}}$. Factoring out $\rho$ and ensuring the gap is non-negative co...

  13. [13]

    freedom to operate

    $\mathrm{Vol}(V_{\text{mask}}(\varepsilon)) \propto \varepsilon^{1}$. Proof (Sketch). The $\varepsilon$-fair problem constrains the policy vector $\vec{\alpha}$ to lie within $k$ independent slabs, each of width proportional to $\varepsilon$. The volume of their intersection thus scales with $\varepsilon^{k}$. The $\varepsilon$-masking problem imposes only a single slab constraint, so its volume scales linearly with $\varepsilon$. The ratio of the feasible volumes, Vol(V_mask)/Vol(V_fai...
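The scaling in the last fragment can be checked numerically in the smallest case. The sketch below is an illustration under assumptions not taken from the paper: k = 2 equally weighted strata, with a policy summarised by its two stratum effects in the square [-1, 1]^2, and a plain midpoint grid to estimate areas. The function name is hypothetical.

```python
def feasible_volumes(eps, m=400):
    """Midpoint-grid estimate of the fair and masked feasible areas.

    Fair region: |tau_1| <= eps and |tau_2| <= eps (one slab per stratum).
    Masked region: |0.5 * tau_1 + 0.5 * tau_2| <= eps (one averaged slab).
    """
    h = 2.0 / m
    pts = [-1.0 + (j + 0.5) * h for j in range(m)]
    fair = mask = 0
    for t1 in pts:
        for t2 in pts:
            if abs(t1) <= eps and abs(t2) <= eps:
                fair += 1
            if abs(0.5 * t1 + 0.5 * t2) <= eps:
                mask += 1
    cell = h * h
    return fair * cell, mask * cell


def volume_ratio(eps):
    """Masked area divided by fair area for tolerance eps."""
    fair, mask = feasible_volumes(eps)
    return mask / fair


if __name__ == "__main__":
    for eps in (0.2, 0.1, 0.05):
        fair, mask = feasible_volumes(eps)
        print(f"eps={eps}: fair={fair:.4f} mask={mask:.4f} ratio={mask / fair:.1f}")
```

For k = 2 the exact areas are (2ε)^2 for the fair region and 8ε - 4ε^2 for the masked one, so the ratio behaves like 2/ε and grows as the tolerance tightens, in line with the ε^k versus ε scaling stated in the sketch.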