COPF: An Online Framework for Deployment-Stable Counterfactual Fairness in Evolving Graphs
Pith reviewed 2026-06-28 19:16 UTC · model grok-4.3
The pith
COPF bounds exposure-counterfactual group gaps in evolving graphs through residual outcome indistinguishability audits on graph-aware doubly robust estimators.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
COPF defines group-level opportunity gaps over exposure counterfactuals and renders them estimable by logging propensities under explicit exploration. It audits and controls fairness via residual outcome indistinguishability over a configurable auditor family that employs graph-aware doubly robust estimators. A noisy transfer theorem establishes that residual-OI evaluated on the estimated GA-DR residuals implies bounds on the exposure-counterfactual group gaps whenever the underlying process satisfies temporal mixing and bounded local interference. The framework instantiates an online multicalibration auditor together with a primal-dual controller that acts on these residuals to enforce the
What carries the argument
The noisy transfer theorem that converts residual outcome indistinguishability on graph-aware doubly robust residuals into bounds on exposure-counterfactual group gaps under temporal mixing and bounded local interference.
If this is right
- Fairness estimates remain valid after the policy is updated because the controller acts on residuals that already account for future graph changes.
- Worst-case spikes in group exposure disparities can be reduced by online adjustment without separate post-deployment retraining.
- Explicit propensity logging makes counterfactual exposure gaps directly estimable even when recommendations alter the observed data distribution.
- The multicalibration auditor can be reconfigured for different protected attributes while reusing the same graph-aware doubly robust estimator.
Where Pith is reading between the lines
- The same residual-audit structure could be applied to other performative recommendation settings, such as content or product suggestion, if propensity logging is feasible.
- If real-world social graphs exhibit the required mixing rates, the theorem supplies a practical route to deployment-stable fairness without full causal modeling of the entire graph.
- Tightening the local-interference bound or relaxing the mixing requirement would directly widen the class of graphs on which the controller guarantees hold.
Load-bearing premise
The translation from residual outcome indistinguishability to bounds on exposure gaps requires that the graph process mixes temporally and that local interference remains bounded.
What would settle it
A controlled stream in which temporal mixing is deliberately broken while the residual-OI auditor still reports low values, yet measured exposure-counterfactual group gaps exceed the claimed bounds.
Figures
read the original abstract
Online link recommendation on evolving graphs is performative: by choosing which candidate links to show users, the system changes which links form and what feedback it later observes. Consequently, fairness estimates from logged outcomes can be misleading and may drift after deployment when the recommendation policy is updated. We introduce COPF (Counterfactual Online Performative Fairness), a decision-layer framework for deployment-stable fairness monitoring and control in online link recommendation. COPF (i) defines group-level opportunity gaps over exposure (shown vs. not shown) counterfactuals, (ii) makes them estimable by explicit exploration and by logging the probability (propensity) that each candidate is shown, and (iii) audits and controls fairness using residual outcome indistinguishability (OI) over a configurable auditor family with graph-aware doubly robust (GA-DR) estimators. We provide a noisy transfer theorem showing that Residual-OI on estimated GA-DR residuals implies bounds on exposure-counterfactual group gaps under temporal mixing and bounded local interference, and we instantiate an online multicalibration auditor together with a primal-dual controller. Experiments on two TGB streams and a controlled synthetic bipartite stream show that COPF reduces worst-case spikes in exposure-counterfactual group disparities with modest impact on ranking utility. Our code is available at https://github.com/lsnnnnnnnn/COPF.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces COPF, a decision-layer framework for deployment-stable counterfactual fairness in online link recommendation on evolving graphs. It defines group-level opportunity gaps over exposure counterfactuals, makes them estimable via explicit exploration and propensity logging, and audits/controls fairness using residual outcome indistinguishability (OI) over a configurable auditor family with graph-aware doubly robust (GA-DR) estimators. A noisy transfer theorem is stated showing that Residual-OI on estimated GA-DR residuals implies bounds on exposure-counterfactual group gaps under temporal mixing and bounded local interference; an online multicalibration auditor and primal-dual controller are instantiated. Experiments on two TGB streams and a synthetic bipartite stream report reduced worst-case spikes in disparities with modest impact on ranking utility; code is released.
Significance. If the noisy transfer theorem is valid, the framework supplies a theoretically grounded approach to fairness monitoring and control that remains stable under policy updates in performative, evolving-graph settings. This is relevant for real-world recommendation systems. Strengths include the explicit statement of assumptions in the theorem, the release of code, and the use of real TGB streams alongside controlled synthetic data.
major comments (2)
- [Abstract / Theorem] Abstract / noisy transfer theorem: The theorem asserts that Residual-OI on estimated GA-DR residuals implies bounds on exposure-counterfactual group gaps under temporal mixing and bounded local interference. No verification, sensitivity analysis, or discussion of whether these assumptions hold on the TGB streams (e.g., estimated mixing times or interference ranges) is provided, which is load-bearing for converting observable Residual-OI into the claimed counterfactual guarantees.
- [Experiments] Experiments: The reported reductions in worst-case spikes are presented as supporting deployment stability, yet without checks on the theorem's assumptions the empirical plots alone cannot substantiate the transfer from Residual-OI to group-gap bounds; a direct comparison of observed vs. theorem-predicted gaps under varying mixing/interference regimes would be needed.
minor comments (2)
- [Abstract] The abstract introduces GA-DR without expansion; ensure the term is defined on first use in the main text.
- [Method] Notation for the auditor family and primal-dual controller could be introduced with a short table of symbols for clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, particularly the focus on strengthening the link between the noisy transfer theorem and the empirical results. We address each major comment below.
read point-by-point responses
-
Referee: [Abstract / Theorem] Abstract / noisy transfer theorem: The theorem asserts that Residual-OI on estimated GA-DR residuals implies bounds on exposure-counterfactual group gaps under temporal mixing and bounded local interference. No verification, sensitivity analysis, or discussion of whether these assumptions hold on the TGB streams (e.g., estimated mixing times or interference ranges) is provided, which is load-bearing for converting observable Residual-OI into the claimed counterfactual guarantees.
Authors: We agree that the assumptions of temporal mixing and bounded local interference are load-bearing for the theorem, and that the original manuscript provides no empirical discussion or verification of these on the TGB streams. In the revision we will add a dedicated subsection (in the theory/experiments bridge) that (i) states how mixing times can be estimated from the observed temporal streams using standard autocorrelation-based methods, (ii) reports rough estimates for the two TGB datasets, and (iii) includes a sensitivity analysis that varies the assumed mixing horizon and interference radius while recomputing the implied group-gap bounds. This will make the transfer from Residual-OI to counterfactual guarantees more transparent. revision: yes
-
Referee: [Experiments] Experiments: The reported reductions in worst-case spikes are presented as supporting deployment stability, yet without checks on the theorem's assumptions the empirical plots alone cannot substantiate the transfer from Residual-OI to group-gap bounds; a direct comparison of observed vs. theorem-predicted gaps under varying mixing/interference regimes would be needed.
Authors: We concur that the current experiments demonstrate practical disparity reduction but do not directly test the theorem's predictive transfer. We will therefore extend the experimental section with an additional controlled study on the synthetic bipartite stream. In this study we will explicitly modulate the temporal mixing rate and local interference radius, compute both the observed exposure-counterfactual group gaps and the bounds predicted by the noisy transfer theorem from the measured Residual-OI, and plot the two quantities against each other across regimes. This will provide the direct comparison requested and will be reported alongside the existing TGB results. revision: yes
Circularity Check
No significant circularity; theorem is a standard implication under stated assumptions
full rationale
The paper's central claim is a noisy transfer theorem establishing that Residual-OI on GA-DR residuals implies bounds on exposure-counterfactual group gaps, conditional on temporal mixing and bounded local interference. This is presented as a derived mathematical result rather than a self-definition, fitted prediction, or reduction to prior self-citations. The auditor and primal-dual controller are instantiated separately, with empirical evaluation on TGB streams providing independent content. No load-bearing step reduces by construction to its own inputs, and the derivation chain remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- configurable auditor family
axioms (2)
- domain assumption Temporal mixing
- domain assumption Bounded local interference
Reference graph
Works this paper leans on
-
[1]
Coston, A., Mishler, A., Kennedy, E
URL https://openreview.net/ forum?id=ayPPc0SyLv1. Coston, A., Mishler, A., Kennedy, E. H., and Chouldechova, A. Counterfactual risk assessments, evaluation, and fair- ness. InProceedings of the 2020 conference on fairness, accountability, and transparency, pp. 582–593,
2020
-
[2]
N., Sculley, D., and Halpern, Y
D’Amour, A., Srinivasan, H., Atwood, J., Baljekar, P. N., Sculley, D., and Halpern, Y . Fairness is not static: deeper understanding of long term fairness via simulation stud- ies.Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency,
2020
-
[3]
Dud´ık, M., Langford, J., and Li, L
doi: 10.1007/s11257-023-09364-z. Dud´ık, M., Langford, J., and Li, L. Doubly robust policy evaluation and learning. InProceedings of the 28th In- ternational Conference on International Conference on Machine Learning, ICML’11, pp. 1097–1104, Madison, WI, USA,
-
[4]
ISBN 9781450306195
Omnipress. ISBN 9781450306195. Dwork, C., Kim, M. P., Reingold, O., Rothblum, G. N., and Yona, G. Outcome indistinguishability. InProceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing, STOC 2021, pp. 1095–1108,
2021
-
[5]
Dwork, C., Hays, C., Immorlica, N., Perdomo, J
doi: 10.1145/3406325.3451064. Dwork, C., Hays, C., Immorlica, N., Perdomo, J. C., and Tankala, P. From fairness to infinity: Outcome- indistinguishable (omni)prediction in evolving graphs. InProceedings of Thirty Eighth Conference on Learning Theory, pp. 1564–1637,
-
[6]
E., Karimi, F., and Wagner, C
Ferrara, A., Noboa, L. E., Karimi, F., and Wagner, C. Link recommendations: Their impact on network structure and minorities.Proceedings of the 14th ACM Web Science Conference 2022,
2022
-
[7]
Temporal graph benchmark for ma- chine learning on temporal graphs.Advances in Neural Information Processing Systems, 36:2056–2073,
Huang, S., Poursafaei, F., Danovitch, J., Fey, M., Hu, W., Rossi, E., Leskovec, J., Bronstein, M., Rabusseau, G., and Rabbany, R. Temporal graph benchmark for ma- chine learning on temporal graphs.Advances in Neural Information Processing Systems, 36:2056–2073,
2056
-
[8]
N., Zeng, Z., Pan, S., and Foulds, J
Islam, R., Keya, K. N., Zeng, Z., Pan, S., and Foulds, J. Debiasing career recommendations with neural fair col- laborative filtering. InProceedings of the Web Conference 2021, pp. 3779–3790, New York, NY , USA,
2021
-
[9]
ACM. doi: 10.1145/3442381.3449904. Khajehnejad, A., Khajehnejad, M., Babaei, M., Gummadi, K. P., Weller, A., and Mirzasoleiman, B. Crosswalk: Fairness-enhanced node representation learning. InPro- ceedings of the AAAI Conference on Artificial Intelli- gence, volume 36, pp. 11963–11970,
-
[10]
Fairgan: Gans-based fairness- aware learning for recommendations with implicit feed- back
Li, J., Ren, Y ., and Deng, K. Fairgan: Gans-based fairness- aware learning for recommendations with implicit feed- back. InProceedings of the ACM Web Conference 2022, pp. 297–307,
2022
-
[11]
doi: 10.1145/3485447.3511958. Liu, L. T., Dean, S., Rolf, E., Simchowitz, M., and Hardt, M. Delayed impact of fair machine learning. InInternational Conference on Machine Learning,
-
[12]
Mishler, A. and Dalmasso, N. Fair when trained, un- fair when deployed: Observable fairness measures are unstable in performative prediction settings.CoRR, abs/2202.05049,
-
[13]
Temporal Graph Networks for Deep Learning on Dynamic Graphs
Rossi, E., Chamberlain, B. P., Frasca, F., Eynard, D., Monti, F., and Bronstein, M. M. Temporal graph networks for deep learning on dynamic graphs.ArXiv, abs/2006.10637,
work page internal anchor Pith review Pith/arXiv arXiv 2006
-
[14]
doi: 10.1609/aaai.v36i7.20789. Santos, F. P., Lelkes, Y ., and Levin, S. A. Link recommen- dation algorithms and dynamics of polarization in online social networks.Proceedings of the National Academy of Sciences, 118,
-
[15]
Off-policy learning over heterogeneous information for recommendation
Wang, X., Li, Q., Yu, D., and Xu, G. Off-policy learning over heterogeneous information for recommendation. In Proceedings of the ACM Web Conference 2022, pp. 2348– 2359,
2022
-
[16]
group on = src; outcome = bandit audit off audit=1000; pre-cal; PD+cov in Deploy/Post GraphMixer: tokens=neighbors
Dataset Groups / outcome Base Base+COPF Backbone notes Synth-bip. group on = src; outcome = bandit audit off audit=1000; pre-cal; PD+cov in Deploy/Post GraphMixer: tokens=neighbors. TGBL-Wiki group on = dst; attr = nodemod(2) audit off audit=200; cf upd=200; pre-cal; PD+cov in Deploy/Post GraphMixer: neighbors=10, tokens=20 (stable). TGBL-Review group on ...
1935
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.