
arxiv: 2605.22291 · v1 · pith:6VLU4NVQnew · submitted 2026-05-21 · 💻 cs.LG

Long-term Fairness with Selective Labels

Pith reviewed 2026-05-22 07:45 UTC · model grok-4.3

classification 💻 cs.LG
keywords long-term fairness · selective labels · fair machine learning · reinforcement learning · bias decomposition · dynamic fairness · label prediction

The pith

A decomposition of true fairness into observed fairness plus bounded prediction bias yields sufficient conditions for long-term fairness under selective labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Long-term fairness requires accounting for how decisions shape future population behavior, yet in domains such as lending or hiring the relevant labels are revealed only after positive decisions. This selective-labels problem renders standard fairness calculations incomplete and biased. The paper demonstrates that approaches relying solely on observed labels fail to guarantee fairness over time. It then introduces a framework that pairs observed data with a label predictor model, decomposes the true fairness measure into an observed component and a bias term attributable to prediction errors, and replaces the unobserved term with a function of the predictor's reported confidence. The resulting sufficient conditions support a reinforcement-learning policy that, in semisynthetic environments, attains fairness and performance levels comparable to an oracle agent possessing the true labels.

Core claim

The true fairness measure under long-term dynamics and selective labels decomposes exactly into the fairness computed from observed labels plus a bias term induced by the label predictor. When the predictor supplies well-calibrated estimates, the bias can be bounded from observable quantities alone, producing sufficient conditions that replace the missing fairness term and thereby allow enforcement of true fairness without direct access to all labels.
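The decomposition can be illustrated with a short worked derivation. This is only a sketch for a qualification-rate style measure, not the paper's own statement: the symbols $q_i$, $p_i$, $b_i$, $\varepsilon_i$ and the triangle-inequality form of the sufficient condition are assumptions introduced here, while $\phi$ (the label predictor), $\omega$ (the tolerance) and $\tilde{\Delta}$ follow the notation visible in the figure excerpts.

```latex
% Assumed notation: Z = group, A = decision, Y = label, \phi = label predictor.
% q_i is the true qualification rate of group i, p_i its acceptance rate.
\begin{align*}
q_i &:= \mathbb{E}[Y \mid Z = i], \qquad p_i := P(A = 1 \mid Z = i), \\
q_i &= p_i\,\mathbb{E}[Y \mid A = 1, Z = i]
      + (1 - p_i)\,\mathbb{E}[Y \mid A = 0, Z = i]
      && \text{(law of total expectation)} \\
\tilde{q}_i &:= p_i\,\mathbb{E}[Y \mid A = 1, Z = i]
      + (1 - p_i)\,\mathbb{E}[\phi(X) \mid A = 0, Z = i]
      && \text{(observable)} \\
b_i &:= (1 - p_i)\bigl(\mathbb{E}[Y \mid A = 0, Z = i]
      - \mathbb{E}[\phi(X) \mid A = 0, Z = i]\bigr)
      && \text{(prediction bias)} \\
\Delta &= q_1 - q_0 = \underbrace{(\tilde{q}_1 - \tilde{q}_0)}_{\tilde{\Delta}}
      + (b_1 - b_0), \\
|b_i| \le \varepsilon_i
  &\;\Longrightarrow\; |\Delta| \le |\tilde{\Delta}| + \varepsilon_0 + \varepsilon_1,
  \quad\text{so}\quad |\tilde{\Delta}| \le \omega - \varepsilon_0 - \varepsilon_1
  \;\Longrightarrow\; |\Delta| \le \omega .
\end{align*}
```

The identity in the fifth line holds by construction; everything after it depends on the bounds $\varepsilon_i$ being valid, which is where the calibration assumption enters.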

What carries the argument

Decomposition of the fairness measure into observed fairness and label-prediction bias, with the bias term controlled by the predictor model's reported confidence.

If this is right

  • Algorithms that treat observed labels as complete fail to guarantee long-term fairness once selective revelation is taken into account.
  • Sufficient conditions for true fairness become expressible using only observable data and the predictor's confidence.
  • A reinforcement-learning policy built on these conditions reaches fairness and utility levels comparable to those of an oracle with full label access in controlled semisynthetic trials.
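The first point above can be made concrete with a toy population. The sketch below is not taken from the paper: the `Person` record, the helper names, and the counts are all hypothetical, chosen only to show how a disparity computed from revealed labels can differ from the disparity over the whole population.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    group: int      # sensitive attribute (0 or 1)
    accepted: bool  # decision; the label is revealed only if True
    label: int      # true outcome (1 = qualified); hidden if rejected


def make_group(group, acc_pos, acc_neg, rej_pos, rej_neg):
    """Build a group from counts of accepted/rejected x qualified/unqualified."""
    cells = [(True, 1, acc_pos), (True, 0, acc_neg),
             (False, 1, rej_pos), (False, 0, rej_neg)]
    return [Person(group, a, y) for a, y, n in cells for _ in range(n)]


def qualification_rate(people, group, observed_only):
    """Mean label in a group, optionally restricted to revealed labels."""
    rows = [p for p in people
            if p.group == group and (p.accepted or not observed_only)]
    return sum(p.label for p in rows) / len(rows)


def disparity(people, observed_only):
    """Group 1 rate minus group 0 rate."""
    return (qualification_rate(people, 1, observed_only)
            - qualification_rate(people, 0, observed_only))


# Hypothetical counts: group 1 is rarely accepted, so few of its labels are seen.
population = make_group(0, 6, 2, 1, 1) + make_group(1, 2, 0, 3, 5)
true_gap = disparity(population, observed_only=False)
naive_gap = disparity(population, observed_only=True)
print(f"true disparity:  {true_gap:+.2f}")   # -0.20
print(f"naive disparity: {naive_gap:+.2f}")  # +0.25
```

In this toy case the naive estimate even has the opposite sign of the true gap, because the accepted members of group 1 are not representative of the group.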

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same decomposition could be applied to other partially observed dynamic systems such as medical treatment allocation or content recommendation where outcomes are revealed only after positive decisions.
  • Joint training of the label predictor and the decision policy might tighten the bias bounds and improve sample efficiency.
  • Empirical stress tests on real credit or hiring datasets would reveal whether the independence of the bias bound from policy dynamics holds outside semisynthetic regimes.

Load-bearing premise

The label predictor supplies well-calibrated confidence values whose bias term can be bounded independently of the evolving policy and population dynamics.

What would settle it

A setting in which the predictor's confidence values are miscalibrated yet the derived sufficient conditions are reported as satisfied, producing a measurable gap between the algorithm's claimed fairness and the true long-term fairness computed from fully observed labels.
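The test described above can be written as a small check. This is a sketch under stated assumptions: the sufficient condition is taken in the simple form |∆̃| + Σ ε ≤ ω, which is an illustrative simplification rather than the paper's exact condition, and the function names and numbers are hypothetical.

```python
def claimed_fair(delta_tilde, eps_reported, omega):
    """Assumed sufficient condition: |delta_tilde| + sum(eps) <= omega."""
    return abs(delta_tilde) + sum(eps_reported) <= omega


def is_counterexample(delta_true, delta_tilde, eps_reported, omega):
    """True when fairness is certified but the fully observed gap violates it."""
    return (claimed_fair(delta_tilde, eps_reported, omega)
            and abs(delta_true) > omega)


# Overconfident predictor: the reported error bounds are too small.
overconfident = is_counterexample(delta_true=0.2, delta_tilde=0.01,
                                  eps_reported=(0.02, 0.02), omega=0.1)
# Honest error bounds: the condition is not certified, so nothing is refuted.
honest = is_counterexample(delta_true=0.2, delta_tilde=0.01,
                           eps_reported=(0.1, 0.1), omega=0.1)
print(overconfident, honest)
```

Only the first case counts as evidence against the framework, since there the method certifies fairness that the fully observed labels contradict.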

Figures

Figures reproduced from arXiv: 2605.22291 by Giovani Valdrighi, Isabel Valera, Marcos Medeiros Raimundo.

Figure 1: Graphical model for F-MDP. Y_t is partially observed depending on A_t = 1.
…groups: $\Delta := \mu_1 - \mu_0$, where $\mu_i := \mathbb{E}[U(Y, A) \mid C_i]$ is the expected utility of a group $i$ with conditioning event $C_i$. A fair policy $\pi$ must satisfy $|\Delta| \le \omega$, for some small tolerance $\omega \in \mathbb{R}_+$. Different fairness notions can be expressed by the choice of $U$ and $C$. In this work, we consider three common formulations: 1) Qualification P…
Figure 2: SELLF algorithm executed in the lending environment with β1 = 5 and varying values of β2. We display measures during learning and the disparity of the final policy. Results are averaged with 25 repetitions.
[Adjacent plot: Cumulative Reward vs. Timestep; Disparity (|∆_t|) vs. Timestep; a further axis labeled P(A=1|…. Legend: SELLF, SELLF (Semi-stoc.), ELBERT, FOCOPS, POCAR, POCAR (Oracle), PPO.]
Figure 3: Reward and true disparity (equality of opportunity) over time obtained by agents in the lending environment. Results are obtained with 10 repetitions. SELLF can ensure the same fairness as the baseline with oracle access and a higher reward.
…upon repayment (y_t = 1) or decreases by one upon default (y_t = 0). We set the cost of acceptance as c = 0.8, motivated by the high cost of false positives (defaults) …
Figure 4: Reward and true disparity (accuracy) obtained in the recidivism environment. Results are obtained with 10 repetitions.
…reward above 1,000. For the same reason, POCAR (Oracle) outperformed PPO only when the weight of fairness penalization was increased. Our solution was the only algorithm to produce a disparity below 0.05. Interestingly, SELLF (Semi-sto.) obtained the lowest disparity. However, it result…
Figure 6: Probability distributions calculated from the FICO dataset to define the environment.
…unit if Y = −1. Criminal Recidivism: COMPAS (Angwin et al., 2016) is software used in courts in the US to assess the likelihood of recidivism. A study of great importance by ProPublica showed that this tool consistently predicted a higher likelihood for African-Americans, indicating a discriminatory practice. In this env…
Figure 7: Probability distributions calculated from the COMPAS dataset to define the environment.
[Adjacent plot: three panels over Age group: "Distribution of age groups" (P(Age | Z)), "Distribution of Y based on the dataset" (P(Y = 1 | Age, Z)), and "Distribution of Y based on the model" (P(Y = 1 | Age, Z)). Legend: Black/Pardo, White.]
Figure 8: Estimated distributions from the ENEM dataset.
…features. We will refer to this variation as “School Admission (Continuous)”. The introduced features are strongly correlated with the true label (indicating whether a student scored above 575 on the languages exam), since students’ performance will be similar across the four exams. F. Experimental Setting. Implementation details: All algorithms and experiments …
Figure 9: Average (and std.) reward and disparity of multiple configurations of hyperparameter β1 in the lending environment with equality of opportunity fairness principle.
…given access to the true disparity measure; that is, PPO and POCAR had their hyperparameters tuned based on $\Delta_{A=1}$, POCAR (Oracle) with $\Delta$, and SELLF with $\tilde{\Delta}$. In more detail, we set $|\Delta| = \frac{1}{T} \sum_{t=1}^{T} |\Delta_t|$ (with the respective variation of the disp…
Figure 10: Error term and divergence term of the bound from Theo. 3.5. The Renyi divergence can present high values and make the bound loose.
…G.2. Analysis of Terms from Theo. 3.5. As discussed in Theo. 3.5 and detailed in Appendix B.4, the error $\epsilon_t^i$ of predictor $\phi$ on the rejected population can be bounded by $\epsilon_t^i$, which is composed of two terms: $\epsilon_t^i \le \hat{\epsilon}_{A,w}^i + 2^{5/4} \sqrt{d_2(D_R^i \,\|\, D_A^i)} \, \sqrt[\frac{3}{8}]{p \log \frac{2 N^i e}{p} + \dots}$ …
Figure 11: Reward and true disparity over time obtained by optimized agents. Results are obtained with 10 repetitions in the lending environment (a) and the school admission environment (b).
Figure 12: Behavior of importance weights w(x, z) during learning and probability of acceptance at previous iterations with the lending environment with accuracy parity fairness principle.
…this effect is reduced. While β2 = 0.1, the probability for the unprivileged group reaches 0.2, and with β2 = 0.2, it stays fixed at 1 after a few initial iterations. G.5. Environments with Other Fairness Notions. Lending with Accu…
Figure 13: Reward and true disparity (accuracy parity) over time obtained by optimized agents in the lending environment. Results are obtained with 10 repetitions.
[Adjacent plot: Cumulative Reward vs. Timestep; Disparity (|∆_t|) vs. Timestep. Legend: SELLF, SELLF (Semi-stoc.), ELBERT, FOCOPS, POCAR, POCAR (Oracle), PPO.]
Figure 14: Reward and true disparity (qualification parity) over time obtained by optimized agents in the lending environment. Results are obtained with 10 repetitions.
[Adjacent plot: Cumulative Reward vs. Timestep; Disparity (|∆_t|) vs. Timestep. Legend: SELLF, SELLF (Semi-stoc.), ELBERT, FOCOPS, POCAR, POCAR (Oracle), PPO.]
Figure 15: Reward and true disparity (equality of opportunity) over time obtained by optimized agents in the recidivism environment. Results are obtained with 10 repetitions.
Figure 16: Reward and true disparity (qualification parity) over time obtained by optimized agents in the recidivism environment. Results are obtained with 10 repetitions.
[Adjacent plot: Cumulative Reward vs. Timestep; Disparity (|∆_t|) vs. Timestep. Legend: SELLF, SELLF (Semi-stoc.), ELBERT, FOCOPS, POCAR, POCAR (Oracle), PPO.]
Figure 17: Reward and true disparity (equality of opportunity) over time obtained by optimized agents in the school admission environment. Results are obtained with 10 repetitions.
[Adjacent plot: Cumulative Reward vs. Timestep; Disparity (|∆_t|) vs. Timestep. Legend: SELLF, SELLF (Semi-stoc.), ELBERT, FOCOPS, POCAR, POCAR (Oracle), PPO.]
Figure 18: Reward and true disparity (accuracy parity) over time obtained by optimized agents in the school admission environment. Results are obtained with 10 repetitions.
Original abstract

Long-term fairness algorithms aim to satisfy fairness beyond static and short-term notions by accounting for the dynamics between decision-making policies and population behavior. Most previous approaches evaluate performance and fairness measures from observable features and a label, which is assumed to be fully observed. However, in scenarios such as hiring or lending, the labels (e.g., ability to repay the loan) are selective labels as they are only revealed based on positive decisions (e.g., when a loan is granted). In this paper, we study long-term fairness in the selective labels setting and analytically show that naive solutions do not guarantee fairness. To address this gap, we then introduce a novel framework that leverages both the observed data and a label predictor model to estimate the true fairness measure value by decomposing it into the observed fairness and bias from label predictions. This allows us to derive sufficient conditions to satisfy true fairness from observable quantities by using the confidence in the predictor model. Finally, we rely on our theoretical results to propose a novel reinforcement learning algorithm for effective long-term fair decision-making with selective labels. In semisynthetic environments, the proposed algorithm reached comparable fairness and performance to an agent with oracle access to the true labels.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper examines long-term fairness under selective labels, where outcomes are observed only for positive decisions. It analytically proves that naive approaches fail to ensure fairness, proposes a decomposition of the true fairness measure into observed fairness plus bias from a label predictor, derives sufficient conditions for true fairness in terms of observable quantities and predictor confidence, and introduces an RL algorithm whose semisynthetic performance matches an oracle with access to true labels.

Significance. If the derivation holds, the work provides a concrete way to achieve long-term fairness without requiring full label observability, by leveraging a predictor to bound the missing bias term. The analytical failure proof for naive methods and the RL formulation that reaches oracle-level fairness and utility in semisynthetic settings are clear strengths. The result would be of interest to the fairness-in-ML community provided the calibration assumption survives policy-induced shifts.

major comments (1)
  1. [Section 4] Section 4 (derivation after Eq. (3)): the move from the decomposition identity to the sufficient conditions replaces the unobserved fairness term with a function of the predictor's confidence. This replacement is valid only if the predictor's calibration error (or bias bound) remains controlled independently of the distribution shifts induced by the evolving policy. In the selective-labels regime the observed data are already conditioned on past positive decisions, so any change in policy alters the feature-label distribution seen by the predictor; the manuscript supplies no argument or bound showing that calibration error stays controlled under such shifts.
minor comments (1)
  1. [Abstract and Experiments] The abstract and experimental section should explicitly state the base datasets, how the selective-label mechanism is simulated, and whether the label predictor is retrained on data generated by the learned policy or only on the initial fixed environment.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and for highlighting the potential impact of this work on long-term fairness under selective labels. We address the single major comment below.

Point-by-point responses
  1. Referee: [Section 4] Section 4 (derivation after Eq. (3)): the move from the decomposition identity to the sufficient conditions replaces the unobserved fairness term with a function of the predictor's confidence. This replacement is valid only if the predictor's calibration error (or bias bound) remains controlled independently of the distribution shifts induced by the evolving policy. In the selective-labels regime the observed data are already conditioned on past positive decisions, so any change in policy alters the feature-label distribution seen by the predictor; the manuscript supplies no argument or bound showing that calibration error stays controlled under such shifts.

    Authors: We thank the referee for this precise observation. The identity decomposition itself holds by definition and does not depend on calibration. The sufficient conditions that follow replace the unobserved bias term with a bound derived from the predictor's reported confidence; this step implicitly treats the predictor as calibrated on the data distribution induced by the current policy. The manuscript does not supply an explicit argument or bound establishing that calibration error remains controlled when the policy evolves and thereby changes the feature-label distribution seen by the predictor. We agree this is a substantive modeling assumption that warrants explicit discussion. In the revised manuscript we will (i) state the assumption clearly after Eq. (3), (ii) add a remark on its implications for long-term deployment, and (iii) outline a practical safeguard—periodic recalibration of the predictor on newly observed selective labels—to keep the bound valid in practice. These additions will appear in Section 4 and in the discussion of the RL algorithm. revision: yes
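The recalibration safeguard mentioned in the response could look roughly like the following. This is a minimal sketch using histogram binning, a standard calibration technique swapped in here for illustration; the rebuttal does not say which method the authors intend, and the function name and bin count are hypothetical.

```python
def fit_histogram_calibrator(scores, labels, n_bins=5):
    """Map each score bin to the empirical positive rate of its labels.

    `scores` are predictor confidences in [0, 1]; `labels` are the newly
    revealed outcomes (0 or 1) for accepted individuals only.
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for s, y in zip(scores, labels):
        b = min(int(s * n_bins), n_bins - 1)
        sums[b] += y
        counts[b] += 1
    rates = [sums[b] / counts[b] if counts[b] else None for b in range(n_bins)]

    def calibrate(score):
        b = min(int(score * n_bins), n_bins - 1)
        # Bins with no revealed labels fall back to the raw score.
        return rates[b] if rates[b] is not None else score

    return calibrate


# Hypothetical batch: the predictor says 0.9, but only half were qualified.
calibrate = fit_histogram_calibrator([0.9, 0.9, 0.9, 0.9], [1, 1, 0, 0])
print(calibrate(0.9))  # 0.5
print(calibrate(0.1))  # 0.1 (no labels seen in this bin)
```

Note that a calibrator fitted this way only sees labels from accepted individuals, so by itself it does not answer the referee's concern about calibration on the rejected population.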

Circularity Check

0 steps flagged

No significant circularity detected; derivation relies on external predictor model.

full rationale

The paper decomposes the true fairness measure into an observed component plus a bias term attributable to the label predictor (Section 4, post-Eq. (3)), then substitutes a function of the predictor's reported confidence to obtain sufficient conditions on observable quantities. This step treats the label predictor as an independent, externally provided model whose calibration properties are assumed rather than fitted inside the fairness objective or derived from the target metric. No equation reduces the claimed sufficient conditions to the input fairness value by algebraic identity or by renaming a fitted parameter. No self-citations, uniqueness theorems, or ansatzes imported from prior author work are invoked as load-bearing steps. The derivation therefore remains self-contained once the external predictor and its bounded-bias assumption are granted.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework rests on a label predictor whose calibration properties are taken as given and on the existence of a decomposition identity that isolates the unobserved bias term. No new physical entities are postulated.

free parameters (1)
  • predictor confidence threshold
    Used to bound the bias term in the sufficient conditions; value chosen to satisfy the derived inequality.
axioms (1)
  • domain assumption: The label predictor produces confidence scores that can be used to bound the difference between predicted and true labels in the unobserved region.
    Invoked when converting the decomposition into observable sufficient conditions for true fairness.

pith-pipeline@v0.9.0 · 5736 in / 1423 out tokens · 40379 ms · 2026-05-22T07:45:41.686682+00:00 · methodology



Reference graph

Works this paper leans on

70 extracted references · 70 canonical work pages · 1 internal anchor

  1. Long-term fairness with unknown dynamics. Advances in Neural Information Processing Systems, vol. 36.

  2. Causality: Models, Reasoning and Inference.

  3. Algorithmic decision making and the cost of fairness. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

  4. Fairness in criminal justice risk assessments: The state of the art. Sociological Methods & Research.

  5. A Survey on Bias and Fairness in Machine Learning. ACM Comput. Surv. doi:10.1145/3457607.

  6. Raab, Reilly; Boczar, Ross; Fazel, Maryam; Liu, Yang. Fair… (Mar. 2024). doi:10.1609/aaai.v38i13.29394.

  7. Ge, Yingqiang; Liu, Shuchang; Gao, Ruoyuan; Xian, Yikun; Li, Yunqi; Zhao, Xiangyu; Pei, Changhua; Sun, Fei; Ge, Junfeng; Ou, Wenwu; Zhang, Yongfeng. Towards… (Mar. 2021). doi:10.1145/3437963.3441824.

  8. Yin, Tongxin; Raab, Reilly; Liu, Mingyan; Liu, Yang. Long-… (2023).

  9. Perdomo, Juan; Zrnic, Tijana; Mendler-Dünner, Celestine; Hardt, Moritz. Performative… (Jul. 2020).

  10. Equal opportunity in online classification with partial feedback.

  11. Creager, Elliot; Madras, David; Pitassi, Toniann; Zemel, Richard. Causal… (Jul. 2020).

  12. Equality of opportunity in supervised learning.

  13. The Fragility of Fairness: Causal Sensitivity Analysis for Fair Machine Learning. The Thirty-eight Conference on Neural Information Processing Systems Datasets and Benchmarks Track.

  14. Fairness through awareness. Proceedings of the 3rd Innovations in Theoretical Computer Science Conference.

  15. Fairness is not static: deeper understanding of long term fairness via simulation studies. Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency. doi:10.1145/3351095.3372878, ISBN 9781450369367.

  16. Runaway Feedback Loops in Predictive Policing. Proceedings of the 1st Conference on Fairness, Accountability and Transparency.

  17. Adapting Static Fairness to Sequential Decision-Making: Bias Mitigation Strategies towards Equal Long-term Benefit Rate.

  18. Delayed impact of fair machine learning. International Conference on Machine Learning.

  19. Report to the congress on credit scoring and its effects on the availability and affordability of credit.

  20. Weapons of Math Destruction.

  21. Risk elements in consumer instalment financing. NBER Books.

  22. Machine Bias.

  23. A Comprehensive Survey on Safe Reinforcement Learning. Journal of Machine Learning Research, vol. 16, no. 42.

  24. A Survey on Offline Reinforcement Learning: Taxonomy, Review, and Open Problems. IEEE Transactions on Neural Networks and Learning Systems.

  25. Remembering to Be Fair: Non-Markovian Fairness in Sequential Decision Making.

  26. Long-Term Fairness Inquiries and Pursuits in Machine Learning: A Survey of Notions, Methods, and Challenges.

  27. Preserving the Fairness Guarantees of Classifiers in Changing Environments: A Survey. ACM Comput. Surv. doi:10.1145/3637438.

  28. How do fair decisions fare in long-term qualification? Advances in Neural Information Processing Systems, vol. 33.

  29. Unintended selection: Persistent qualification rate disparities and interventions. Advances in Neural Information Processing Systems, vol. 34.

  30. Towards Return Parity in Markov Decision Processes. Proceedings of The 25th International Conference on Artificial Intelligence and Statistics.

  31. Striking a balance in fairness for dynamic systems through reinforcement learning. 2023 IEEE International Conference on Big Data (BigData).

  32. A Causal Lens for Learning Long-term Fair Policies. The Thirteenth International Conference on Learning Representations.

  33. Policy Optimization with Advantage Regularization for Long-Term Fairness in Decision Systems. Advances in Neural Information Processing Systems.

  34. A Dynamic Decision-Making Framework Promoting Long-Term Fairness. Proceedings of the 2022 AAAI/ACM Conference on AI, Ethics, and Society. doi:10.1145/3514094.3534127, ISBN 9781450392471.

  35. Achieving long-term fairness in sequential decision making. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36.

  36. Addressing Polarization and Unfairness in Performative Prediction. CoRR.

  37. Proximal policy optimization algorithms.

  38. Algorithms for fairness in sequential decision making. International Conference on Artificial Intelligence and Statistics.

  39. Rateike, Miriam; Valera, Isabel; Forré, Patrick. Designing… (Jun. 2024). doi:10.1145/3630106.3658538, ISBN 9798400704505.

  40. Fair contextual multi-armed bandits: Theory and experiments. Conference on Uncertainty in Artificial Intelligence.

  41. Efficient resource allocation with fairness constraints in restless multi-armed bandits. Uncertainty in Artificial Intelligence.

  42. Online restless multi-armed bandits with long-term fairness constraints. Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38.

  43. Addressing Polarization and Unfairness in Performative Prediction.

  44. Fairness and Machine Learning: Limitations and Opportunities.

  45. Inherent trade-offs in the fair determination of risk scores. arXiv preprint arXiv:1609.05807.

  46. Don’t throw it away! The utility of unlabeled data in fair decision making. Proceedings of the 2022 ACM Conference on Fairness, Accountability, and Transparency.

  47. M^2 FGB: A Min-Max Gradient Boosting Framework for Subgroup Fairness. Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency.

  48. Alghamdi, Wael; Hsu, Hsiang; Jeong, Haewon; Wang, Hao; Michalak, Peter; Asoodeh, Shahab; Calmon, Flavio. Beyond Adult and COMPAS: Fair Multi-Class Prediction via Information Projection.

  49. Does reject inference really improve the performance of application scoring models? Journal of Banking & Finance, 2004.

  50. An introduction to deep reinforcement learning. Foundations and Trends, 2018.

  51. Fair decisions despite imperfect predictions. International Conference on Artificial Intelligence and Statistics, 2020.

  52. The importance of modeling data missingness in algorithmic fairness: A causal perspective. Proceedings of the AAAI Conference on Artificial Intelligence.

  53. The self-normalized estimator for counterfactual learning. Advances in Neural Information Processing Systems.

  54. Constrained policy optimization. International Conference on Machine Learning, 2017.

  55. Cortes, Corinna; Mansour, Yishay; Mohri, Mehryar. Proceedings of the 24th International Conference on Neural Information Processing Systems, Volume 1, 2010.

  56. Keswani, Vijay; Mehrotra, Anay; Celis, L. Elisa. Proceedings of the 41st International Conference on Machine Learning, 2024.

  57. Baker, Ryan S.; Hawn, Aaron. International Journal of Artificial Intelligence in Education. doi:10.1007/s40593-021-00285-9.

  58. Fuster, Andreas; Goldsmith-Pinkham, Paul; Ramadorai, Tarun; Walther, Ansgar. The Journal of Finance. doi:10.1111/jofi.13090. https://onlinelibrary.wiley.com/doi/pdf/10.1111/jofi.13090

  59. Nearly-tight VC-dimension and pseudodimension bounds for piecewise linear neural networks. Journal of Machine Learning Research.

  60. Fairness in reinforcement learning. Proceedings of the 34th International Conference on Machine Learning, Volume 70.

  61. Group fairness in reinforcement learning. Transactions on Machine Learning Research.

  62. Fairness in Reinforcement Learning with Bisimulation Metrics. arXiv preprint arXiv:2412.17123.

  63. What hides behind unfairness? Exploring dynamics fairness in reinforcement learning. Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence.

  64. Fair off-policy learning from observational data. Proceedings of the 41st International Conference on Machine Learning.

  65. The selective labels problem: Evaluating algorithmic predictions in the presence of unobservables. Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.

  66. First order constrained optimization in policy space. Advances in Neural Information Processing Systems.

  67. From biased selective labels to pseudo-labels: an expectation-maximization framework for learning from biased decisions. Proceedings of Machine Learning Research.

  68. Automating data annotation under strategic human agents: Risks and potential solutions. Advances in Neural Information Processing Systems.

  69. Algorithmic fairness in performative policy learning: Escaping the impossibility of group fairness. Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency.

  70. Causal inference in statistics, social, and biomedical sciences. 2015.