pith. sign in

arxiv: 2605.09945 · v1 · submitted 2026-05-11 · 💻 cs.LG

Selection of the Best Policy under Fairness Constraints for Subpopulations

Pith reviewed 2026-05-12 04:04 UTC · model grok-4.3

classification 💻 cs.LG
keywords policy selectionfairness constraintssubpopulationssample complexitybest arm identificationtrack-and-stopmulti-armed bandits
0
0 comments X

The pith

An algorithm selects the single best policy that meets minimum performance in every pre-specified subpopulation while using the fewest samples possible.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines the Selection of the Best with Fairness Constraints problem, which asks for the policy with the highest average performance among those that clear a given threshold in each fixed subpopulation. It derives an instance-specific lower bound on the number of observations required and presents the Track-and-Stop with Constraints on Subpopulation algorithm that matches the bound asymptotically. The same guarantees hold for closed-set and penalty-based fairness rules. A reader would care because high-stakes decisions in health care and policy often require one uniform rule that still protects every group without collecting unnecessary data.

Core claim

We formalize the Selection of the Best with Fairness Constraints problem to identify the policy with the highest average performance among those meeting a minimum per-subpopulation threshold. We establish an instance-specific lower bound on its sample complexity and develop the Track-and-Stop with Constraints on Subpopulation algorithm that achieves this bound asymptotically. The framework extends to general closed-set and penalty-based fairness specifications with matching guarantees.

What carries the argument

The Track-and-Stop with Constraints on Subpopulation (T-a-S-CS) algorithm, which adaptively samples policies and subpopulations to verify fairness thresholds while stopping as soon as the optimal fair policy is identified with high probability.

If this is right

  • The algorithm minimizes the long-run number of samples needed for any fixed instance of the SBFC problem.
  • Matching lower bounds and algorithms apply directly to closed-set and penalty-based fairness specifications.
  • Numerical experiments and the International Stroke Trial case study show efficiency gains over policy-level allocation methods.
  • The approach supports committing to one policy with statistical guarantees that it satisfies every subgroup threshold.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The adaptive sampling logic could be applied to other constrained selection tasks such as identifying safe policies in reinforcement learning.
  • If subpopulations must instead be discovered from data rather than pre-specified, new lower bounds would be required.
  • The method suggests a route to sample-efficient auditing of deployed policies for compliance with group fairness rules.

Load-bearing premise

Subpopulations are fixed and known ahead of time, and performance observations within each subpopulation concentrate sufficiently to allow reliable estimation of means and thresholds.

What would settle it

On a concrete problem instance where the lower bound is computed explicitly, if the algorithm requires strictly more samples than the bound or returns an incorrect policy, the asymptotic optimality claim is false.

Figures

Figures reproduced from arXiv: 2605.09945 by Tingyu Zhu, Yuhang Wu, Zeyu Zheng.

Figure 1
Figure 1. Figure 1: Asymptotic scaling analysis. Left: Normalized stopping time ˆ𝜏 [PITH_FULL_IMAGE:figures/full_fig_p020_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Empirical sampling proportions at 𝛿 = 0.0005, displayed as a 5 × 3 heatmap (rows = policies, columns = subpopulations). Cell values are mean sampling proportions over 3000 replications. 20 [PITH_FULL_IMAGE:figures/full_fig_p020_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Empirical PCS of T-a-S-CS (Gaussian) and T-a-S over sampling rounds at [PITH_FULL_IMAGE:figures/full_fig_p022_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Theoretical sample complexity 𝑇 ∗ (𝝁, 𝛾) vs. penalty 𝛾. Dashed line: SBFC baseline; verti￾cal line at 𝛾 ∗ = 3: switch of the optimal penalized policy from policy 4 (infeasible) to policy 1 (fea￾sible). Formulation 𝑎 ∗ 𝜏ˆ𝛿 Std 𝑃ˆ 𝝁 SBFC 𝑎1 853 ± 5 271 0.999 𝛾-SBFC (𝛾 = 0.5) 𝑎4 969 ± 10 550 1.000 𝛾-SBFC (𝛾 = 1.0) 𝑎4 1,473 ± 17 946 0.998 𝛾-SBFC (𝛾 = 5) 𝑎1 3,179 ± 38 2,104 0.991 𝛾-SBFC (𝛾 = 10) 𝑎1 1,080 ± 8 44… view at source ↗
Figure 5
Figure 5. Figure 5: Empirical ˆ𝜏𝛿/ln(1/𝛿) versus theoretical 𝑇 ∗ (𝝁, 𝜀) for 𝜀 ∈ {0.1, 0.3, 0.6, 0.9, 1.2} at 𝛿 = 0.01. The dashed line is 𝑦 = 𝑥. Error bars show ±1 SE over 3000 replications. 𝜀 𝜇3,1 𝑇 ∗ (𝝁, 𝜀) 𝜏ˆ𝛿/ln(1/𝛿) (mean ± SE) Ratio to 𝑇 ∗ 0.1 −0.1 109.27 202.66 ± 1.34 1.85 0.3 −0.3 61.56 112.23 ± 0.90 1.82 0.6 −0.6 57.32 105.46 ± 0.87 1.84 0.9 −0.9 56.28 98.20 ± 0.86 1.75 1.2 −1.2 56.12 95.31 ± 0.83 1.70 [PITH_FULL_IM… view at source ↗
Figure 6
Figure 6. Figure 6: Empirical sampling proportions by policy and subpopulation for SBFC and [PITH_FULL_IMAGE:figures/full_fig_p051_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Mean sample allocations. T-a-S-CS significantly oversamples the two critical subpopulation cells [PITH_FULL_IMAGE:figures/full_fig_p052_7.png] view at source ↗
read the original abstract

Many high-stakes decisions in health care, public policy, and clinical development require committing to a single policy that will be applied uniformly across a heterogeneous population. Regulatory and fairness standards sometime requires that the chosen policy performs adequately in every pre-specified subpopulation, not only on average. We formalize this as a Selection of the Best with Fairness Constraints (SBFC) problem, in order to identify the policy with the highest average performance among those policies that meet a minimum per-subpopulation threshold. We establish an instance-specific lower bound on sample complexity of the SBFC problem. We then develop a Track-and-Stop with Constraints on Subpopulation (T-a-S-CS) algorithm that achieves the lower bound asymptotically. We extend the framework to general closed-set and penalty-based fairness specifications with matching guarantees. Numerical experiments and a case study using the International Stroke Trial demonstrate substantial efficiency gains over policy-level allocation baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The paper formalizes the Selection of the Best with Fairness Constraints (SBFC) problem: identify the policy with highest average performance among those meeting a minimum per-subpopulation performance threshold. It derives an instance-specific lower bound on sample complexity for this problem and proposes the Track-and-Stop with Constraints on Subpopulation (T-a-S-CS) algorithm that achieves the bound asymptotically. The framework is extended to general closed-set and penalty-based fairness specifications, with numerical experiments and a case study on the International Stroke Trial showing efficiency gains over policy-level baselines.

Significance. If the lower bound and asymptotic optimality hold, the work advances fair multi-armed bandit methods by delivering an instance-specific, asymptotically optimal algorithm for subpopulation-constrained policy selection. This is directly relevant to high-stakes applications in healthcare and policy. The explicit credit for building on the established Track-and-Stop framework and providing matching guarantees for extended fairness specifications strengthens the contribution.

minor comments (1)
  1. The abstract states that subpopulations are 'pre-specified' but does not explicitly list the concentration assumptions used for the lower bound derivation; a one-sentence clarification in §2 or §3 would improve readability.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript, accurate summary of the SBFC formalization and T-a-S-CS algorithm, and recommendation to accept. We are pleased that the instance-specific lower bound, asymptotic optimality, and extensions to closed-set and penalty-based fairness were viewed as advancing fair multi-armed bandit methods with relevance to high-stakes applications.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper derives an instance-specific lower bound on sample complexity for the SBFC problem via standard information-theoretic arguments typical in bandit literature, then constructs the T-a-S-CS algorithm to match this bound asymptotically. No self-definitional reductions, fitted parameters renamed as predictions, or load-bearing self-citation chains appear in the claims. The lower bound and algorithm are presented as independent of each other, with the bound serving as an external benchmark that the algorithm is shown to achieve. Extensions to general fairness specifications follow the same non-circular structure. This is a standard, self-contained derivation in the field.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review is based solely on the abstract; no explicit free parameters or invented entities are named. The framework implicitly relies on standard statistical assumptions for sequential sampling.

axioms (1)
  • standard math Performance observations within each subpopulation satisfy concentration inequalities sufficient for reliable mean estimation and threshold checking.
    Required to establish finite-sample and asymptotic sample-complexity bounds.

pith-pipeline@v0.9.0 · 5451 in / 1243 out tokens · 27604 ms · 2026-05-12T04:04:59.252552+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages

  1. [1]

    R´emy Degenne and Wouter M Koolen

    doi: 10.1287/opre.2021.0662. R´emy Degenne and Wouter M Koolen. Pure exploration with multiple correct answers.Advances in Neural Information Processing Systems, 32,

  2. [2]

    Guideline on the investigation of subgroups in confirmatory clinical trials

    European Medicines Agency. Guideline on the investigation of subgroups in confirmatory clinical trials. EMA/CHMP/539146/2013,

  3. [3]

    Julian Katz-Samuels and Clayton Scott

    doi: 10.1016/S0140-6736(97)04011-7. Julian Katz-Samuels and Clayton Scott. Top feasible arm identification. InProceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS), pages 1593–1601. PMLR,

  4. [4]

    Andreas W ¨achter and Lorenz T Biegler

    doi: 10.1186/1745-6215-12-101. Andreas W ¨achter and Lorenz T Biegler. On the implementation of an interior-point filter line-search algorithm for large-scale nonlinear programming.Mathematical Programming, 106(1):25–57,

  5. [5]

    doi: 10.1287/opre.2020

  6. [6]

    A Proof of the Lower bound Results A.1 Proof of Theorem 5 The proof of Theorem 5 is based on the change of measure inequality presented in Lemma 1 of Kaufmann et al. (2016). We first restate the lemma for clarity, adapting its binary relative entropy notation d(·,·)to kl(·,·). Lemma 17(Lemma 1 in Kaufmann et al. (2016)).Let𝜈and𝜈 ′ be two bandit models wit...

  7. [7]

    Lemma 6 states𝑓 ∗ 𝝁 (w)=min 𝑓 opt 𝝁 (w), 𝑓 feas 𝝁 (w)

    AssumeC (𝝁)≠∅and𝑘 ∗(𝝁)=1. Lemma 6 states𝑓 ∗ 𝝁 (w)=min 𝑓 opt 𝝁 (w), 𝑓 feas 𝝁 (w) . Case 1:𝑓 opt 𝝁 (w)< 𝑓 feas 𝝁 (w)Here,𝑓 ∗ 𝝁 (w)=𝑓 opt(𝑘) 𝝁 (w)for𝑘=arg min 𝑖 {𝑓 opt(𝑖) 𝝁 (w)} 𝑖≠1. Let𝝀 ∗ ∈ A (𝝁) be the minimizer attaining this value in (9). The vectorc(w)defined in the lemma has components 𝑐𝑖,𝑙 =d(𝜇 𝑖,𝑙 , 𝜆∗ 𝑖,𝑙 )for𝑖∈ {1, 𝑘}and𝑐 𝑖,𝑙 =0 otherwise. Thus,c(...

  8. [8]

    We begin with the𝛿-PAC guarantee

    The proof consists of two main parts: establishing the𝛿-PAC guarantee and proving the asymptotic optimality. We begin with the𝛿-PAC guarantee. The𝛿-PAC property ensures that the algorithm returns a correct answer with probability at least 1−𝛿. This guarantee stems from the choice of the stopping rule and the threshold function𝛽(𝑡, 𝛿). The stopping statist...

  9. [9]

    Second, the constraint map for the maximization isΦ(𝝁)= Σ 𝐾×𝐿

    First, the parameter spaceS ⊆R 𝐾 𝐿 and the choice spaceΣ 𝐾×𝐿 ⊂R 𝐾 𝐿 are standard Hausdorff topological spaces. Second, the constraint map for the maximization isΦ(𝝁)= Σ 𝐾×𝐿 . This map is constant. The setΣ 𝐾×𝐿 is a non-empty, closed, and bounded subset ofR 𝐾 𝐿, hence it is compact. Thus, condition (a) of Theorem 18 is satisfied. Next, we require the objec...

  10. [10]

    (2014), which provides conditions for the continuity of an infimum value function when the constraint set depends on the parameter

    D.2 Proof of Lemma 19 We use a theorem from Feinberg et al. (2014), which provides conditions for the continuity of an infimum value function when the constraint set depends on the parameter. Theorem 20(Feinberg et al. (2014)).Let𝑋and𝑌be Hausdorff topological spaces, with𝑋compactly generated. Assume that (i) The mapΦ:𝑋→S(𝑌)is lower hemicontinuous. (ii) Th...

  11. [11]

    Second, we consider the objective function𝑢(𝝁,𝝀)= Í 𝑘,𝑙 𝑤 𝑘,𝑙 𝑑(𝜇 𝑘,𝑙 , 𝜆𝑘,𝑙 )

    First, the parameter and choice spaces are compactly generated as it is a metric space, and therefore are Hausdorff. Second, we consider the objective function𝑢(𝝁,𝝀)= Í 𝑘,𝑙 𝑤 𝑘,𝑙 𝑑(𝜇 𝑘,𝑙 , 𝜆𝑘,𝑙 ). The KL divergence𝑑(𝑥, 𝑦) is continuous in both arguments within its domain. The K-inf-compactness condition requires that for any compact set𝐶⊂ M, the level set...

  12. [12]

    D.3 Proof of Corollary 13 The proof of this corollary follows the same structure as that of Theorem 10 in Appendix D.1

    Since all conditions of Theorem 20 are met for𝝁∈ S, we conclude that𝑓 ∗ 𝝁 (w)is continuous with respect to𝝁for any fixedw. D.3 Proof of Corollary 13 The proof of this corollary follows the same structure as that of Theorem 10 in Appendix D.1. The key distinction lies in the definition of the counter set and the internal optimization problem: under the gen...

  13. [13]

    D.4 Proof of Corollary 16 The proof of this corollary follows the same structure as that of Theorem 10 in Appendix D.1. The key distinction lies in the definition of optimality and the internal optimization problem: under soft fairness constraints with penalty parameters𝜸, the optimal policy is determined by maximizing penalized performance rather than sa...

  14. [14]

    42 E Numerical Experiment Supplementary Materials E.1 Optimization Subproblems under Gaussian KL-Divergence This appendix provides the detailed formulation of the optimization subproblems for the basic SBFC problem whenP 𝑘,𝑙 belongs to the Gaussian family with known variance𝜎 2 =1, as referenced in Section 3.4. Internal Optimization.With the Gaussian KL-d...

  15. [15]

    •Gradient and Hessian: The analytical gradient 𝜕dB 𝜕𝜆 = 𝜆−𝜇 𝜆(1−𝜆) and Hessian 𝜕2dB 𝜕𝜆2 = 𝜇 𝜆2 + 1−𝜇 (1−𝜆) 2 are provided to the solver for improved convergence

    with the following considerations: •Domain constraints: To avoid numerical issues at boundary values where dB(𝜇, 𝜆) → ∞, we constrain 𝜆𝑖,𝑙 ∈ [𝜖,1−𝜖]with𝜖=10 −6. •Gradient and Hessian: The analytical gradient 𝜕dB 𝜕𝜆 = 𝜆−𝜇 𝜆(1−𝜆) and Hessian 𝜕2dB 𝜕𝜆2 = 𝜇 𝜆2 + 1−𝜇 (1−𝜆) 2 are provided to the solver for improved convergence. •Initialization: We initialize𝜆 𝑖,...

  16. [16]

    Policy 1 is the optimal feasible policy with ¯𝜇1 =0.550

    and five are infeasible (policies 4–8). Policy 1 is the optimal feasible policy with ¯𝜇1 =0.550. Policy 4 achieves the highest overall performance ¯𝜇4 =0.700 but is infeasible due to𝜇 4,1 =0.05<0.2. Observations are Gaussian with𝜎=0.5. E.6.2 Weight Allocation Figure 6 presents the empirical sampling proportions for the SBFC and𝛾-SBFC formulations describe...

  17. [17]

    For𝛾∈ {0.5,1.0}< 𝛾 ∗, the optimal penalized policy is𝑎 4 (infeasible under hard constraints); for 𝛾∈ {5,10}> 𝛾 ∗, it is𝑎 1 (same as SBFC)

    0.000 0.025 0.050 0.075 0.100 0.125 0.150 0.175 0.200 Sampling proportion Figure 6: Empirical sampling proportions by policy and subpopulation for SBFC and𝛾-SBFC formulations (𝛿=0.2). For𝛾∈ {0.5,1.0}< 𝛾 ∗, the optimal penalized policy is𝑎 4 (infeasible under hard constraints); for 𝛾∈ {5,10}> 𝛾 ∗, it is𝑎 1 (same as SBFC). 51 Age < 80 Age >= 80 Aspirin only...