Finite-Sample and Distribution-Free Fair Classification: Optimal Trade-off Between Excess Risk and Fairness, and the Cost of Group-Blindness

Linjun Zhang; Xiaotian Hou

arxiv: 2410.16477 · v3 · pith:XA4DK2MDnew · submitted 2024-10-21 · 📊 stat.ME · stat.ML

Finite-Sample and Distribution-Free Fair Classification: Optimal Trade-off Between Excess Risk and Fairness, and the Cost of Group-Blindness

Xiaotian Hou , Linjun Zhang This is my paper

Pith reviewed 2026-05-23 19:36 UTC · model grok-4.3

classification 📊 stat.ME stat.ML

keywords fair classificationdistribution-freefinite-sample guaranteespost-processinggroup fairnessgroup-blindexcess riskminimax optimality

0 comments

The pith

A post-processing procedure on any black-box classifier enforces distribution-free finite-sample group fairness while bounding excess risk in both aware and blind settings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a unified post-processing method that takes the output of an arbitrary pre-trained classifier and produces a new predictor satisfying multiple group fairness notions. The guarantees on fairness hold with explicit probability bounds for any finite sample size and require no assumptions on the data distribution. The same procedure applies whether group membership is observed at decision time or withheld, and the increase in classification risk stays controlled. The authors also derive a matching minimax lower bound establishing that the excess-risk rate achieved is optimal up to logarithmic factors. A reader cares because many deployed systems must satisfy fairness rules with limited data and sometimes without access to sensitive attributes.

Core claim

The central claim is that there exists a post-processing procedure applicable to arbitrary black-box models which enforces distribution-free finite-sample fairness guarantees for various group fairness notions in both group-aware and group-blind scenarios, while controlling the excess risk, and that this procedure is minimax rate-optimal up to logarithmic factors as shown by a matching lower bound.

What carries the argument

A post-processing procedure that adjusts the output of an arbitrary black-box classifier to satisfy fairness constraints with finite-sample guarantees.

If this is right

Fairness constraints can be met with high probability using only finite samples and no knowledge of the data distribution.
The same procedure achieves the guarantees whether or not group membership is available at prediction time.
The excess risk incurred by the fairness adjustment is bounded and cannot be improved substantially in the worst case beyond logarithmic factors.
The framework applies uniformly to multiple common group fairness notions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The result suggests that fairness post-processing can be inserted as a lightweight final step in existing ML pipelines that already use black-box models.
The quantified cost of group-blindness could inform policy decisions about when access to group attributes should be permitted.
Because the lower bound matches the upper bound up to logs, further reductions in excess risk would require either larger samples or a richer class of base models.

Load-bearing premise

That the post-processing procedure applied to arbitrary black-box models can enforce the stated finite-sample fairness guarantees while controlling excess risk, without requiring internal model access or any distributional assumptions on the data.

What would settle it

An experiment in which the empirical probability that the post-processed classifier violates the target fairness constraint exceeds the finite-sample bound stated in the paper for the given sample size and fairness tolerance.

Figures

Figures reproduced from arXiv: 2410.16477 by Linjun Zhang, Xiaotian Hou.

**Figure 2.** Figure 2: The curve of |λ ∗G α | on α under (M1). The red line is for the group-aware scenario and the cyan line is for the group-blind scenario. the excess risk of the unconstrained classification problem in group-aware and group-blind scenarios, respectively. When the unfairness level decreases, the group-blind excess risk grows significantly, indicating the tradeoff between group-blind excess risk and the fairnes… view at source ↗

**Figure 3.** Figure 3: The trade-off between prediction error and unfairness on the Adult Census [PITH_FULL_IMAGE:figures/full_fig_p031_3.png] view at source ↗

**Figure 4.** Figure 4: (a) The trade-off between prediction error and unfairness under (M2). The X [PITH_FULL_IMAGE:figures/full_fig_p047_4.png] view at source ↗

**Figure 5.** Figure 5: The curve of |λ ∗G α | on α under (M2). The red line is for the group-aware scenario and the cyan line is for the group-blind scenario. α Methods 0.08 0.11 0.14 0.17 0.20 Ours U¯ EOO 0.039(0.033) 0.051(0.038) 0.074(0.044) 0.098(0.048) 0.138(0.048) UEOO,95 0.107 0.128 0.143 0.180 0.216 Error 0.333(0.024) 0.320(0.024) 0.306(0.025) 0.294(0.023) 0.273(0.023) FPIR U¯ EOO 0.101(0.061) 0.124(0.054) 0.157(0.066) 0… view at source ↗

**Figure 6.** Figure 6: (a) The trade-off between prediction error and unfairness under (M3). The X [PITH_FULL_IMAGE:figures/full_fig_p049_6.png] view at source ↗

**Figure 7.** Figure 7: The curve of |λ ∗G α | on α under (M3). The red line is for the group-aware scenario and the cyan line is for the group-blind scenario. 50 [PITH_FULL_IMAGE:figures/full_fig_p050_7.png] view at source ↗

read the original abstract

Algorithmic fairness has become a central concern in modern machine learning and AI applications. However, two pressing challenges remain: (1) The fairness guarantees of existing methods often rely on specific data distributional assumptions and large sample sizes, which can lead to fairness violations in practice. (2) Due to legal and societal considerations, using sensitive group attributes during decision-making (referred to as the group-blind setting) may not always be feasible. In this work, we quantify the impact of enforcing algorithmic fairness and group-blindness/awareness in binary classification under group fairness constraints. Specifically, we propose a unified framework for fair classification that provides distribution-free and finite-sample fairness guarantees with controlled excess risk. This framework is applicable to various group fairness notions in both group-aware and group-blind scenarios. Our approach is based on a post-processing procedure that can be applied to arbitrary black-box models, making it directly compatible with modern machine learning pipelines. Furthermore, we establish a minimax lower bound showing the minimax rate-optimality of our proposed algorithm up to logarithmic factors. Through extensive synthetic and real data studies, we further demonstrate the competitive or superior performance of our algorithm compared to existing methods, and provide empirical support for our theoretical findings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a post-processing method with explicit finite-sample distribution-free fairness bounds for both group-aware and group-blind classification plus a matching minimax lower bound.

read the letter

The central result is a calibration-based post-processing step that enforces group fairness constraints with high-probability finite-sample bounds that require no distributional assumptions and work on the outputs of any black-box model. It covers both the usual group-aware setting and the group-blind setting where the sensitive attribute is unavailable at test time, and it supplies a minimax lower bound showing the rates are optimal up to logarithmic factors. The experiments on synthetic data and standard real datasets (Adult, COMPAS) show the method matches or beats prior post-processing baselines on the risk-fairness trade-off while respecting the stated guarantees. That combination of black-box compatibility, explicit non-asymptotic bounds, and the lower bound is the concrete advance over earlier work that either needed large samples or distributional assumptions. The argument looks internally consistent: the concentration steps are standard and the lower-bound construction matches the upper bound structure. The main limitation is that the excess-risk overhead scales with calibration-set size and the number of groups, so performance can degrade when the calibration set is small or groups are numerous; the paper reports this but does not explore the very-small-calibration regime in detail. The claims do not appear to rest on fitted quantities or circular definitions. This is useful reading for anyone working on practical fair classification who needs non-asymptotic guarantees rather than asymptotic rates. It is solid enough to merit a full referee process.

Referee Report

0 major / 2 minor

Summary. The paper proposes a unified post-processing framework for binary classification that delivers distribution-free, finite-sample fairness guarantees (for multiple group fairness notions) in both group-aware and group-blind settings. The method applies to arbitrary black-box models via output scores and a calibration set, controls excess risk, and is shown to be minimax rate-optimal up to logarithmic factors via a matching lower bound; empirical results on synthetic and real data are provided to support the theory.

Significance. If the finite-sample bounds and minimax result hold, the work is significant: it supplies explicit, assumption-light guarantees that address two practical barriers (distributional assumptions and group-blindness) while remaining compatible with existing ML pipelines, and the matching lower bound strengthens the optimality claim.

minor comments (2)

[Abstract / §1] The abstract and introduction would benefit from a short explicit statement of the precise fairness notions covered and the exact form of the excess-risk bound (e.g., in terms of sample size n and number of groups).
[§3] Notation for the calibration-set size versus training-set size should be introduced once and used consistently when stating the finite-sample bounds.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, recognition of the significance of the finite-sample and minimax results, and the recommendation of minor revision. No specific major comments were listed in the report.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper's central claims rest on an explicit post-processing construction applied to black-box model outputs plus a held-out calibration set, together with concentration inequalities that yield finite-sample distribution-free fairness bounds and a separate minimax lower bound argument. No step equates a derived quantity to its own fitted input by construction, renames a known empirical pattern, or reduces the optimality claim to a self-citation chain whose supporting result is itself unverified. The derivation therefore remains self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract identifies no explicit free parameters, axioms, or invented entities; review is limited to abstract text only.

pith-pipeline@v0.9.0 · 5758 in / 1204 out tokens · 39422 ms · 2026-05-23T19:36:57.200644+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

we propose a unified framework for fair classification that provides distribution-free and finite-sample fairness guarantees with controlled excess risk... post-processing procedure that can be applied to arbitrary black-box models
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

establish a minimax lower bound showing the minimax rate-optimality of our proposed algorithm up to logarithmic factors

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages

[1]

finite-sample and distribution-free fair classification: Optimal excess risk-fairness trade-off and the cost of group-blindness

URL https://openreview.net/forum?id=FM5xfcaR2Y. Jaewoong Cho, Gyeongjo Hwang, and Changho Suh. A fair classifier using kernel density estimation. Advances in neural information processing systems , 33:15088–15099, 2020. 34 Evgenii Chzhen and Nicolas Schreuder. A minimax framework for quantifying risk-fairness trade-off in regression. The Annals of Statist...

work page arXiv 2020
[2]

Tim Van Erven and Peter Harremos

Proceedings, Part II 23 , pages 35–50. Springer, 2012. Michael P Kim, Amirata Ghorbani, and James Zou. Multiaccuracy: Black-box post- processing for fairness in classification. In Proceedings of the 2019 AAAI/ACM Con- ference on AI, Ethics, and Society , pages 247–254, 2019. Samory Kpotufe and Guillaume Martinet. Marginal singularity, and the benefits of ...

work page doi:10.1007/b13794 2012
[3]

demographic parity (DP) is UDP(f) = |P(Yf(X, A) = 1|A = 1) − P(Yf(X, A) = 1|A = 2)|,

work page
[4]

overall accuracy equality (OAE) is UOAE(f) =|P(Yf(X, A) = 1|A = 1, Y = 1) + P(Yf(X, A) = 0|A = 1, Y = 0) − P(Yf(X, A) = 1|A = 2, Y = 1) − P(Yf(X, A) = 0|A = 2, Y = 0)|,

work page
[5]

Another commonly used fairness notion is equalized odds as defined in Definition 8

predictive equality (PE) is UPE(f) = |P(Yf(X, A) = 1|A = 1, Y = 0) − P(Yf(X, A) = 1|A = 2, Y = 0)|. Another commonly used fairness notion is equalized odds as defined in Definition 8. However, since the unfairness measure corresponding to equalized odds can not be reduced in this way to the absolute value of a linear combination of conditional expectation...

work page
[6]

demographic parity is UDP(f) = max a∈[K] |P(Yf(X, A) = 1|A = a) − P(Yf(X, A) = 1)|,

work page
[7]

equalized odds is UEO(f) = max a∈[K] |P(Yf(X, A) = 1|A = a, Y = 1) − P(Yf(X, A) = 1|Y = 1)| ∨ |P(Yf(X, A) = 0|A = a, Y = 0) − P(Yf(X, A) = 0|Y = 0)| ,

work page
[8]

equality of opportunity is UEOO(f) = max a∈[K] |P(Yf(X, A) = 1|A = a, Y = 1) − P(Yf(X, A) = 1|Y = 1)|,

work page
[9]

overall accuracy equality is UOAE(f) = max a∈[K] P(Yf(X, A) = 1|A = a, Y = 1) + P(Yf(X, A) = 0|A = a, Y = 0) − P(Yf(X, A) = 1|Y = 1) − P(Yf(X, A) = 0|Y = 0) ,

work page
[10]

41 A.2.1 Post-processing Algorithm In this section, we propose a general algorithm for various fairness notions with fairness and excess risk guarantee

predictive equality is UPE(f) = max a∈[K] |P(Yf(X, A) = 1|A = a, Y = 0) − P(Yf(X, A) = 1|Y = 0)|. 41 A.2.1 Post-processing Algorithm In this section, we propose a general algorithm for various fairness notions with fairness and excess risk guarantee. For the unfairness measures in Definition 8, the vector norms ∥ · ∥ and ∥ · ∥∗ in Proposition 2 equal to t...

work page
[11]

If D0 ≤ −˜ϵG η − 2ϵα, we set ˜ϵα = 0

work page
[12]

(23) Therefore, when D0 ≤ −˜ϵG η − 2ϵα, we have ˜α = α and λ∗ ˜α = λ∗ α = 0

If D0 > −˜ϵG η − 2ϵα, we choose ˜ϵα such that ˜ϵα ≥ 2ϵα + ˜ϵG g,˜α. (23) Therefore, when D0 ≤ −˜ϵG η − 2ϵα, we have ˜α = α and λ∗ ˜α = λ∗ α = 0. Remark 13. Now we give an example where Equation (23) is satisfied. Suppose the density of 2ηG(X, A) − 1 − λ⊤ΦG(X, A) is upper bounded for all λ ∈ R ˜K. Since ϕG k , k ∈ [ ˜K] are bounded, we have ˜ϵG g,˜α ≤ cϵη ...

work page 2004
[13]

(24) In Theorem 5, we compare the prediction error of ˆf G ˆλα with that of f ∗G ˜α , rather than f ∗G α

Under Assumption 12, we have with probability at least 1 − 2δpost, for any α ≥ ˜ϵα, R( ˆf G ˆλα ) − R(f ∗G ˜α ) ≲ ∥λ∗G ˜α ∥1˜ϵα + ϵη + ∥λ∗G ˜α ∥1ϵϕ 1+˜γ + ˜K log n + log 1 δpost n 1+˜γ 2+˜γ . (24) In Theorem 5, we compare the prediction error of ˆf G ˆλα with that of f ∗G ˜α , rather than f ∗G α . So R( ˆf G ˆλα ) − R(f ∗G ˜α ) may not always be non-negat...

work page 2013
[14]

Then |λ∗ α| = 0 is the smallest non-negative real number λ+ such that sEϕG(X, A)1 2ηG(X, A) − 1 > sλ+ϕG(X, A) ≤ α

If λ∗ α = 0, we have |EϕG(X, A)1 2ηG(X, A) > 1 | ≤ α. Then |λ∗ α| = 0 is the smallest non-negative real number λ+ such that sEϕG(X, A)1 2ηG(X, A) − 1 > sλ+ϕG(X, A) ≤ α

work page
[15]

If λ∗ α ̸= 0, we know sλEϕG(X, A)1 2ηG(X, A) − 1 > s λ|λ∗ α|ϕG(X, A) = α. Due to the non-increasing property of λ+ → sλEϕG(X, A)1 2ηG(X, A) − 1 > s λλ+ϕG(X, A) , we get sλEϕG(X, A)1 2ηG(X, A) > 1 ≥ sλEϕG(X, A)1 2ηG(X, A)−1 > s λ|λ∗ α|ϕG(X, A) ≥ α, which implies s = sλ. In the following, we prove that |λ∗ α| is the smallest non-negative real number λ+ such...

work page 2013
[16]

69 SupposeP j∈I 1 (σj = 1) = Cσm for some constant Cσ > 0

So pX is piecewise uniform. 69 SupposeP j∈I 1 (σj = 1) = Cσm for some constant Cσ > 0. For the specified distribu- tion, if we denote ∆ = CψmωM −βA, then pY = Eη(X) = 1 2 , p σ 1,1 = Eρσ 1|1(X)η(X) = 1 4 + 1 2 − 7 12 Cη Cρ + 2Cη(1 − 2Cσ)∆. So Assumption 7 is satisfied if Cη, Cρ and ∆ are small enough

work page
[17]

At first, we verify the group-blind assumptions. On the support of pX, ϕσ equals pσ 1,1(pY − pσ 1,1)ϕσ(x) =(pY ρσ 1|1(x) − pσ 1,1)η(x) =    − Cη − ˜Cη( 1 7 − x1) d γ (1 − 7 12 Cη)Cρ + 2Cη(1 − 2Cσ)∆ , if x ∈ B1( e1 14 , 1 14), −Cη (1 − 7 12 Cη)Cρ + 2Cη(1 − 2Cσ)∆ + 1 2 σjM −βAψ(M(x − nM(x))) , if x ∈ X j, j ∈ I , − Cη + ˜Cη(x1 − 2 7) d γ (1...

work page
[18]

Then we verify the group-aware assumptions. Note that ηaware(x, a) = ηblind(x)ρa|1(x) ηblind(x)ρa|1(x) + (1 − ηblind(x))ρa|0(x) , 72 then it is straightforward to verify that ηaware(·, 1), ηaware(·, 2) are also βY -H¨ older smooth. Moreover, we have U 1 (2ηaware(X, A) > 1) = 0. Therefore g∗aware α (x, a) = 2 ηaware(x, a) − 1. Similar to the group-blind sc...

work page 2009
[19]

We start from the group-blind assumptions. Now we have on the support of px, ϕσ = pσ Y ρ1|1−pσ 1,1 pσ 1,1(pσ Y −pσ 1,1) ησ equals pσ 1,1(pσ Y − pσ 1,1)ϕσ(x) =    − (1 − 7 12 Cη)Cρ + (1 2 − 2(1 − 2Cσ)∆) ˜Cρ( 1 7 − x1) d γ Cη, if x ∈ B1( e1 14 , 1 14), −(1 − 7 12 Cη)Cρ Cη + σjM −βY ψ(M(x − nM(x))) , if x ∈ X j, j ∈ I , − (1 − 7 12 Cη)Cρ − (...

work page
[20]

Then we verify the group-aware assumptions. Similar to the analysis of ρ1|1, it is straightforward to verify thatηaware(·, a) are βY -H¨ older smooth,U 1 (2ηaware(X, A) > 1) = 0, so g∗aware α (x, a) = 2 ηaware(x, a) − 1, and the group-aware Assumptions 2, 3, 4 are also satisfied. Then we are ready to prove the lower bound. For the same Ω defined in the an...

work page 2007
[21]

At first, we verify the group-blind assumptions. On the support of pX, ϕ equals p1,1(pY − p1,1)ϕ(x) =    −{Cη − ˜Cη( 1 7 − x1) d γ }(1 − 7 12 Cη)Cρ, if ∥x−1∥1 ≤ 1 7 − x1, x1 ∈ [0, 1 7], −{Cη + ˜Cη(x1 − 1 7) d γ }(1 − 7 12 Cη)Cρ, if ∥x−1∥1 ≤ x1 − 1 7 , x1 ∈ [ 1 7 , 1 7 + CX], −{Cη + 2 ˜CηC d γ X − ˜Cη( 2 7 − x1) d γ }(1 − 7 12 Cη...

work page
[22]

Then we verify the group-aware assumptions. Similar to the analysis of the er- rors of ρ1|1 and η, we can show ηaware P (·, a), ηaware¯P (·, a) are both βY -H¨ older smooth, where ηaware P (X, A) = PP (Y = 1|X, A), and UP 1 (2ηaware P (X, A) > 1) = 0, U ¯P 1 (2ηaware¯P (X, A) > 1) =( 1 2 + ¯Cρ)(1 − Cη) ¯p1,1 − 3 4 − 3 2 ¯Cρ − 1 2 Cη + 3 4 ¯CρCη ¯p1,2 + µ ...

work page 2019

[1] [1]

finite-sample and distribution-free fair classification: Optimal excess risk-fairness trade-off and the cost of group-blindness

URL https://openreview.net/forum?id=FM5xfcaR2Y. Jaewoong Cho, Gyeongjo Hwang, and Changho Suh. A fair classifier using kernel density estimation. Advances in neural information processing systems , 33:15088–15099, 2020. 34 Evgenii Chzhen and Nicolas Schreuder. A minimax framework for quantifying risk-fairness trade-off in regression. The Annals of Statist...

work page arXiv 2020

[2] [2]

Tim Van Erven and Peter Harremos

Proceedings, Part II 23 , pages 35–50. Springer, 2012. Michael P Kim, Amirata Ghorbani, and James Zou. Multiaccuracy: Black-box post- processing for fairness in classification. In Proceedings of the 2019 AAAI/ACM Con- ference on AI, Ethics, and Society , pages 247–254, 2019. Samory Kpotufe and Guillaume Martinet. Marginal singularity, and the benefits of ...

work page doi:10.1007/b13794 2012

[3] [3]

demographic parity (DP) is UDP(f) = |P(Yf(X, A) = 1|A = 1) − P(Yf(X, A) = 1|A = 2)|,

work page

[4] [4]

overall accuracy equality (OAE) is UOAE(f) =|P(Yf(X, A) = 1|A = 1, Y = 1) + P(Yf(X, A) = 0|A = 1, Y = 0) − P(Yf(X, A) = 1|A = 2, Y = 1) − P(Yf(X, A) = 0|A = 2, Y = 0)|,

work page

[5] [5]

Another commonly used fairness notion is equalized odds as defined in Definition 8

predictive equality (PE) is UPE(f) = |P(Yf(X, A) = 1|A = 1, Y = 0) − P(Yf(X, A) = 1|A = 2, Y = 0)|. Another commonly used fairness notion is equalized odds as defined in Definition 8. However, since the unfairness measure corresponding to equalized odds can not be reduced in this way to the absolute value of a linear combination of conditional expectation...

work page

[6] [6]

demographic parity is UDP(f) = max a∈[K] |P(Yf(X, A) = 1|A = a) − P(Yf(X, A) = 1)|,

work page

[7] [7]

equalized odds is UEO(f) = max a∈[K] |P(Yf(X, A) = 1|A = a, Y = 1) − P(Yf(X, A) = 1|Y = 1)| ∨ |P(Yf(X, A) = 0|A = a, Y = 0) − P(Yf(X, A) = 0|Y = 0)| ,

work page

[8] [8]

equality of opportunity is UEOO(f) = max a∈[K] |P(Yf(X, A) = 1|A = a, Y = 1) − P(Yf(X, A) = 1|Y = 1)|,

work page

[9] [9]

overall accuracy equality is UOAE(f) = max a∈[K] P(Yf(X, A) = 1|A = a, Y = 1) + P(Yf(X, A) = 0|A = a, Y = 0) − P(Yf(X, A) = 1|Y = 1) − P(Yf(X, A) = 0|Y = 0) ,

work page

[10] [10]

41 A.2.1 Post-processing Algorithm In this section, we propose a general algorithm for various fairness notions with fairness and excess risk guarantee

predictive equality is UPE(f) = max a∈[K] |P(Yf(X, A) = 1|A = a, Y = 0) − P(Yf(X, A) = 1|Y = 0)|. 41 A.2.1 Post-processing Algorithm In this section, we propose a general algorithm for various fairness notions with fairness and excess risk guarantee. For the unfairness measures in Definition 8, the vector norms ∥ · ∥ and ∥ · ∥∗ in Proposition 2 equal to t...

work page

[11] [11]

If D0 ≤ −˜ϵG η − 2ϵα, we set ˜ϵα = 0

work page

[12] [12]

(23) Therefore, when D0 ≤ −˜ϵG η − 2ϵα, we have ˜α = α and λ∗ ˜α = λ∗ α = 0

If D0 > −˜ϵG η − 2ϵα, we choose ˜ϵα such that ˜ϵα ≥ 2ϵα + ˜ϵG g,˜α. (23) Therefore, when D0 ≤ −˜ϵG η − 2ϵα, we have ˜α = α and λ∗ ˜α = λ∗ α = 0. Remark 13. Now we give an example where Equation (23) is satisfied. Suppose the density of 2ηG(X, A) − 1 − λ⊤ΦG(X, A) is upper bounded for all λ ∈ R ˜K. Since ϕG k , k ∈ [ ˜K] are bounded, we have ˜ϵG g,˜α ≤ cϵη ...

work page 2004

[13] [13]

(24) In Theorem 5, we compare the prediction error of ˆf G ˆλα with that of f ∗G ˜α , rather than f ∗G α

Under Assumption 12, we have with probability at least 1 − 2δpost, for any α ≥ ˜ϵα, R( ˆf G ˆλα ) − R(f ∗G ˜α ) ≲ ∥λ∗G ˜α ∥1˜ϵα + ϵη + ∥λ∗G ˜α ∥1ϵϕ 1+˜γ + ˜K log n + log 1 δpost n 1+˜γ 2+˜γ . (24) In Theorem 5, we compare the prediction error of ˆf G ˆλα with that of f ∗G ˜α , rather than f ∗G α . So R( ˆf G ˆλα ) − R(f ∗G ˜α ) may not always be non-negat...

work page 2013

[14] [14]

Then |λ∗ α| = 0 is the smallest non-negative real number λ+ such that sEϕG(X, A)1 2ηG(X, A) − 1 > sλ+ϕG(X, A) ≤ α

If λ∗ α = 0, we have |EϕG(X, A)1 2ηG(X, A) > 1 | ≤ α. Then |λ∗ α| = 0 is the smallest non-negative real number λ+ such that sEϕG(X, A)1 2ηG(X, A) − 1 > sλ+ϕG(X, A) ≤ α

work page

[15] [15]

If λ∗ α ̸= 0, we know sλEϕG(X, A)1 2ηG(X, A) − 1 > s λ|λ∗ α|ϕG(X, A) = α. Due to the non-increasing property of λ+ → sλEϕG(X, A)1 2ηG(X, A) − 1 > s λλ+ϕG(X, A) , we get sλEϕG(X, A)1 2ηG(X, A) > 1 ≥ sλEϕG(X, A)1 2ηG(X, A)−1 > s λ|λ∗ α|ϕG(X, A) ≥ α, which implies s = sλ. In the following, we prove that |λ∗ α| is the smallest non-negative real number λ+ such...

work page 2013

[16] [16]

69 SupposeP j∈I 1 (σj = 1) = Cσm for some constant Cσ > 0

So pX is piecewise uniform. 69 SupposeP j∈I 1 (σj = 1) = Cσm for some constant Cσ > 0. For the specified distribu- tion, if we denote ∆ = CψmωM −βA, then pY = Eη(X) = 1 2 , p σ 1,1 = Eρσ 1|1(X)η(X) = 1 4 + 1 2 − 7 12 Cη Cρ + 2Cη(1 − 2Cσ)∆. So Assumption 7 is satisfied if Cη, Cρ and ∆ are small enough

work page

[17] [17]

At first, we verify the group-blind assumptions. On the support of pX, ϕσ equals pσ 1,1(pY − pσ 1,1)ϕσ(x) =(pY ρσ 1|1(x) − pσ 1,1)η(x) =    − Cη − ˜Cη( 1 7 − x1) d γ (1 − 7 12 Cη)Cρ + 2Cη(1 − 2Cσ)∆ , if x ∈ B1( e1 14 , 1 14), −Cη (1 − 7 12 Cη)Cρ + 2Cη(1 − 2Cσ)∆ + 1 2 σjM −βAψ(M(x − nM(x))) , if x ∈ X j, j ∈ I , − Cη + ˜Cη(x1 − 2 7) d γ (1...

work page

[18] [18]

Then we verify the group-aware assumptions. Note that ηaware(x, a) = ηblind(x)ρa|1(x) ηblind(x)ρa|1(x) + (1 − ηblind(x))ρa|0(x) , 72 then it is straightforward to verify that ηaware(·, 1), ηaware(·, 2) are also βY -H¨ older smooth. Moreover, we have U 1 (2ηaware(X, A) > 1) = 0. Therefore g∗aware α (x, a) = 2 ηaware(x, a) − 1. Similar to the group-blind sc...

work page 2009

[19] [19]

We start from the group-blind assumptions. Now we have on the support of px, ϕσ = pσ Y ρ1|1−pσ 1,1 pσ 1,1(pσ Y −pσ 1,1) ησ equals pσ 1,1(pσ Y − pσ 1,1)ϕσ(x) =    − (1 − 7 12 Cη)Cρ + (1 2 − 2(1 − 2Cσ)∆) ˜Cρ( 1 7 − x1) d γ Cη, if x ∈ B1( e1 14 , 1 14), −(1 − 7 12 Cη)Cρ Cη + σjM −βY ψ(M(x − nM(x))) , if x ∈ X j, j ∈ I , − (1 − 7 12 Cη)Cρ − (...

work page

[20] [20]

Then we verify the group-aware assumptions. Similar to the analysis of ρ1|1, it is straightforward to verify thatηaware(·, a) are βY -H¨ older smooth,U 1 (2ηaware(X, A) > 1) = 0, so g∗aware α (x, a) = 2 ηaware(x, a) − 1, and the group-aware Assumptions 2, 3, 4 are also satisfied. Then we are ready to prove the lower bound. For the same Ω defined in the an...

work page 2007

[21] [21]

At first, we verify the group-blind assumptions. On the support of pX, ϕ equals p1,1(pY − p1,1)ϕ(x) =    −{Cη − ˜Cη( 1 7 − x1) d γ }(1 − 7 12 Cη)Cρ, if ∥x−1∥1 ≤ 1 7 − x1, x1 ∈ [0, 1 7], −{Cη + ˜Cη(x1 − 1 7) d γ }(1 − 7 12 Cη)Cρ, if ∥x−1∥1 ≤ x1 − 1 7 , x1 ∈ [ 1 7 , 1 7 + CX], −{Cη + 2 ˜CηC d γ X − ˜Cη( 2 7 − x1) d γ }(1 − 7 12 Cη...

work page

[22] [22]

Then we verify the group-aware assumptions. Similar to the analysis of the er- rors of ρ1|1 and η, we can show ηaware P (·, a), ηaware¯P (·, a) are both βY -H¨ older smooth, where ηaware P (X, A) = PP (Y = 1|X, A), and UP 1 (2ηaware P (X, A) > 1) = 0, U ¯P 1 (2ηaware¯P (X, A) > 1) =( 1 2 + ¯Cρ)(1 − Cη) ¯p1,1 − 3 4 − 3 2 ¯Cρ − 1 2 Cη + 3 4 ¯CρCη ¯p1,2 + µ ...

work page 2019