Deployment of AI-Assisted Interventions: Capacity Constraints and Noisy Compliance

Benjamin L. Ranard; Carri W. Chan; Hannah Li; Yi Han

arxiv: 2604.14370 · v1 · submitted 2026-04-15 · 📊 stat.ME · cs.LG

Pith reviewed 2026-05-10 12:12 UTC · model grok-4.3

keywords: capacity constraints · AI deployment · algorithmic selection · predictive models · healthcare interventions · operational metrics · threshold optimization · noisy compliance

The pith

When capacity is limited, thresholds and algorithms chosen for predictive accuracy are generally suboptimal for AI-guided interventions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines settings where an AI score triggers outreach that encourages individuals to request scarce service slots, such as medical care or educational support, and only some requesters can be served. Standard practice picks the threshold and the algorithm to maximize predictive accuracy (for example AUC, which weights all cutoffs equally). The authors show that this ignores how capacity forces a trade-off between filling every available slot (utilization) and keeping high-value individuals from being crowded out by competing requests (cannibalization). They characterize the capacity-aware threshold that balances the two effects, prove that accuracy-based rules are generally suboptimal, and propose Operational AUC (OpAUC) to rank algorithms by operational value instead.

Core claim

Under limited capacity and probabilistic compliance, the optimal outreach threshold balances filling all slots (utilization) against high-value individuals losing slots to competing requests (cannibalization); policies that set thresholds solely to maximize predictive accuracy are generally suboptimal. Because the optimal cutoff shifts with available capacity, metrics like AUC that weight all thresholds equally are misaligned with operational performance. Operational AUC (OpAUC) corrects this misalignment, and a sepsis early-warning case study illustrates the size of the improvement available from better threshold and algorithm selection.
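The trade-off in the core claim can be illustrated with a small sketch. It is not the paper's model: it assumes a hypothetical population with scores spread evenly on [0, 1), takes each individual's value to equal their score, uses a made-up compliance curve `request_prob`, allocates slots at random among requesters, and replaces random counts with their expectations. The names `served_value` and `best_threshold` are illustrative only.

```python
def request_prob(score):
    """Hypothetical compliance curve: chance of requesting service rises with score."""
    return 0.2 + 0.6 * score


def served_value(scores, threshold, capacity):
    """Expected value served when everyone scoring at or above `threshold` is contacted.

    Assumptions of this sketch: value equals score, requests are independent,
    and expected counts stand in for random ones.
    """
    reached = [s for s in scores if s >= threshold]
    expected_requests = sum(request_prob(s) for s in reached)
    expected_value = sum(request_prob(s) * s for s in reached)
    if expected_requests <= capacity:
        # Utilization side: slots are left over, so every requester is served.
        return expected_value
    # Cannibalization side: slots go to a random subset of requesters,
    # so lower-value requesters displace some higher-value ones.
    return capacity * expected_value / expected_requests


def best_threshold(scores, capacity, grid):
    """Grid search for the cutoff with the highest expected served value."""
    return max(grid, key=lambda t: served_value(scores, t, capacity))


scores = [i / 1000 for i in range(1000)]   # hypothetical population
grid = [i / 100 for i in range(100)]       # candidate cutoffs 0.00 .. 0.99
best_by_capacity = {m: best_threshold(scores, m, grid) for m in (25, 100, 300)}
for capacity, threshold in best_by_capacity.items():
    print(f"capacity={capacity:3d}  best threshold={threshold:.2f}")
```

In this toy setup the best cutoff falls as capacity grows, so no single capacity-blind cutoff can be right at every capacity level.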

What carries the argument

Operational AUC (OpAUC), a capacity-dependent performance metric that ranks algorithms by the expected value of the service allocation they induce rather than by uniform accuracy across thresholds.
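This page does not reproduce the formal definition of OpAUC, so the sketch below uses a stand-in: the best expected number of positives served at a given capacity. It only illustrates why a capacity-blind metric and a capacity-aware one can rank two algorithms differently. The two rankings, the constant compliance rate, and the function names are all hypothetical.

```python
def auc(labels_by_rank):
    """AUC for a ranking given as 0/1 labels, best-scored first, with no ties."""
    positives = sum(labels_by_rank)
    negatives = len(labels_by_rank) - positives
    concordant = 0
    negatives_below = negatives
    for label in labels_by_rank:
        if label == 1:
            concordant += negatives_below   # negatives ranked below this positive
        else:
            negatives_below -= 1
    return concordant / (positives * negatives)


def best_operational_value(labels_by_rank, capacity, compliance):
    """Best expected number of positives served over all 'contact the top k' policies.

    Assumptions of this sketch: constant compliance, random allocation when
    requests exceed capacity, expected counts in place of random ones.
    """
    best = 0.0
    for k in range(1, len(labels_by_rank) + 1):
        requests = compliance * k
        positive_requests = compliance * sum(labels_by_rank[:k])
        served_share = min(1.0, capacity / requests)
        best = max(best, served_share * positive_requests)
    return best


# Hypothetical rankings of the same ten people by two algorithms.
ranking_a = [1, 1, 0, 0, 0, 0, 1, 1, 0, 0]   # clean at the very top, weak below
ranking_b = [0, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # one miss at the top, strong overall

print("AUC:", auc(ranking_a), auc(ranking_b))
for capacity in (1, 3):
    print("capacity", capacity,
          best_operational_value(ranking_a, capacity, 0.5),
          best_operational_value(ranking_b, capacity, 0.5))
```

Here ranking B has the higher AUC, yet ranking A serves more positives when only one slot is available; the ordering flips back at three slots.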

If this is right

Optimal score thresholds must be adjusted whenever service capacity changes.
Algorithm selection for deployment should replace AUC with OpAUC to maximize realized service value.
Ignoring capacity leads to either unused slots or high-value individuals being crowded out by competing requests.
The same framework applies to any domain with scored outreach and constrained follow-up resources.
Case-study results quantify the operational gains achievable by switching to the new metric and threshold rule.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach could be extended to settings where capacity itself varies over time or can be expanded based on demand signals.
If compliance probabilities also depend on individual covariates beyond the score, the optimal threshold rule would require additional conditioning.
Deployments could test the model by running parallel arms with accuracy-based versus OpAUC-based policies and measuring served-value differences directly.
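A parallel-arm comparison of the kind suggested in the last point could be prototyped in simulation before any deployment. The sketch below is only that: the population, the compliance curve `0.2 + 0.6 * score`, the two cutoffs (0.96 as a capacity-aware choice for this toy population, 0.50 as a stand-in for an accuracy-style choice), and the value-equals-score convention are all hypothetical.

```python
import random


def simulate_day(scores, threshold, capacity, rng):
    """One simulated day: outreach, random requests, random allocation of slots."""
    requesters = [s for s in scores
                  if s >= threshold and rng.random() < 0.2 + 0.6 * s]
    if len(requesters) > capacity:
        # More requests than slots: serve a uniformly random subset.
        requesters = rng.sample(requesters, capacity)
    return sum(requesters)   # value served, taking value = score (an assumption)


def mean_served_value(scores, threshold, capacity, days, seed):
    """Average served value per day for one arm, with its own seeded generator."""
    rng = random.Random(seed)
    total = sum(simulate_day(scores, threshold, capacity, rng) for _ in range(days))
    return total / days


scores = [i / 1000 for i in range(1000)]   # hypothetical population
capacity = 25
arm_capacity_aware = mean_served_value(scores, 0.96, capacity, days=200, seed=1)
arm_accuracy_style = mean_served_value(scores, 0.50, capacity, days=200, seed=2)
print(f"capacity-aware arm: {arm_capacity_aware:.2f}")
print(f"accuracy-style arm: {arm_accuracy_style:.2f}")
```

In a real deployment the per-arm means would come from observed served outcomes, not from a simulator.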

Load-bearing premise

The model assumes that each individual's probability of requesting service depends only on their score and is independent of others' scores or the total capacity level.

What would settle it

Collect deployment data on actual request rates and served outcomes; if the observed allocation value under an accuracy-maximizing threshold equals or exceeds the value under the capacity-adjusted threshold derived in the paper, the suboptimality result does not hold.

Original abstract

AI tools increasingly guide targeted interventions in healthcare, education, and recruiting. Algorithms score individuals, trigger outreach to those above a threshold (e.g., high-risk or high-value), and encourage them to request service; then providers deliver service to those who request. Standard practice sets the threshold and selects the algorithm to maximize predictive accuracy, assuming that better predictions yield better outcomes. We show that this approach is suboptimal when limited service capacity and probabilistic behavioral responses influence who receives service. In such settings, the optimal score threshold must balance two effects: ensuring all capacity is filled (utilization) and ensuring high-value individuals are served despite competition between requests (cannibalization). We characterize the optimal threshold and prove that policies based solely on predictive accuracy are generally suboptimal. Further, because optimal thresholds vary with service capacity, algorithm selection metrics like AUC, which weight all thresholds equally, are misaligned with operational performance. We introduce a new metric--Operational AUC (OpAUC)--and show it leads to optimal algorithm selection. Finally, we conduct a case study on sepsis early warning data and illustrate the magnitude of improvement that can be achieved from improved threshold and algorithm selection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Accuracy-based thresholds are suboptimal under capacity limits and probabilistic requests, but the optimality proof and OpAUC edge rest on specific parametric assumptions about request behavior.

The core point here is that in settings where capacity is fixed and individuals request service with probability increasing in their score, the usual practice of setting thresholds or picking models by predictive accuracy alone leaves money on the table. The paper derives the threshold that balances filling the slots (utilization) against high-value people getting crowded out (cannibalization), and shows that accuracy-maximizers generally miss it. They then build OpAUC as a capacity-aware selection criterion that integrates over the relevant operating points rather than averaging all thresholds equally. That framing is clean and directly useful for anyone deploying scores in healthcare or similar constrained domains.

The sepsis early-warning case study gives a concrete sense of the gap, which is helpful even if illustrative. The modeling is explicit and the math follows from the stated assumptions without obvious internal contradictions.

The main limitation is that everything turns on the request probability being strictly monotone in score, independent across people, and on capacity being allocated randomly or proportionally among requesters. If requests are deterministic, correlated, or if providers ration differently, the claimed suboptimality of accuracy policies and the superiority of OpAUC do not necessarily carry over. The abstract and stress-test note both flag this, and without seeing the full derivations or robustness checks it is hard to judge how sensitive the results are.

This is worth sending to referees who work on operational AI and constrained resource allocation; they can pressure-test the behavioral model and ask for sensitivity plots. Readers who care about moving from pure prediction to deployment value will find it worth their time, though it is more of a modeling contribution than a broad empirical one.

Referee Report

3 major / 2 minor

Summary. The manuscript claims that in AI-assisted intervention settings with fixed service capacity and noisy (probabilistic) compliance, thresholds and algorithms optimized for predictive accuracy are suboptimal. It provides a characterization of the optimal score threshold that accounts for utilization and cannibalization effects, proves the general suboptimality of accuracy-based policies, proposes Operational AUC (OpAUC) as an alternative metric that aligns with operational performance, and illustrates the approach with a sepsis early warning case study.

Significance. If the results hold under the stated model, this work identifies a misalignment between standard ML evaluation practices (AUC maximization) and operational outcomes when capacity is constrained and responses are probabilistic. The explicit characterization of the optimal threshold, the proof of suboptimality, and the introduction of OpAUC provide a principled framework for deployment decisions in healthcare and similar domains. Credit is due for the operational modeling of request behavior and capacity competition, the focus on capacity-varying metrics, and the concrete case-study illustration of potential gains.

major comments (3)

[§3.2] (optimal threshold derivation): the characterization of τ* as the solution balancing utilization and cannibalization rests on the assumption that each individual requests independently with p(s_i) strictly increasing in score s_i and that capacity is allocated randomly among requesters. The manuscript should state whether the optimality condition (and the subsequent suboptimality proof) continues to hold under alternative request mechanisms such as deterministic thresholds or correlated requests across individuals.
[§5] (OpAUC definition and algorithm selection): OpAUC is constructed by integrating performance over a range of capacity levels using the same request-probability model. The claim that it leads to optimal algorithm selection therefore inherits the same parametric assumptions; a direct comparison showing that OpAUC recovers the true operational optimum (while AUC does not) should be provided for at least one non-logistic p(s) form to demonstrate robustness.
[Case-study section] (sepsis data): the reported magnitude of improvement from OpAUC-based selection versus AUC-based selection must include the precise capacity levels examined, the rule for excluding patients or time periods, and error bars or sensitivity checks. Without these details it is impossible to assess whether post-hoc choices affect the claimed gains.
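The robustness check requested for a non-logistic p(s) can be prototyped along the following lines. This is a sketch under stated assumptions, not the authors' experiment: both request curves, the uniform score population, the value-equals-score convention, random allocation, and the use of expected counts are hypothetical choices made for illustration.

```python
import math


def logistic(score):
    """Hypothetical logistic request probability."""
    return 1.0 / (1.0 + math.exp(-6.0 * (score - 0.5)))


def piecewise_linear(score):
    """Hypothetical non-logistic alternative: flat floor, linear ramp, cap at 1."""
    return min(1.0, max(0.1, 1.5 * score - 0.2))


def best_threshold(scores, capacity, request_prob, grid):
    """Cutoff on `grid` with the highest expected served value (value = score)."""
    def value(threshold):
        reached = [s for s in scores if s >= threshold]
        requests = sum(request_prob(s) for s in reached)
        total = sum(request_prob(s) * s for s in reached)
        # Random allocation scales served value down when oversubscribed.
        return total if requests <= capacity else capacity * total / requests
    return max(grid, key=value)


scores = [i / 1000 for i in range(1000)]   # hypothetical population
grid = [i / 100 for i in range(100)]
results = {
    curve.__name__: [best_threshold(scores, m, curve, grid) for m in (50, 200)]
    for curve in (logistic, piecewise_linear)
}
for name, thresholds in results.items():
    print(name, thresholds)
```

Under both hypothetical curves the best cutoff moves with capacity, which is the qualitative pattern a sensitivity analysis would need to confirm on the paper's own model.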

minor comments (2)

[§2] The notation for the request probability function p(s) and the capacity-allocation rule should be introduced with an explicit equation early in §2 rather than only in the optimality derivation.
[Figures] Figure captions for the sepsis results should state the exact capacity values plotted and whether the curves are averaged over multiple random seeds or data splits.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which have helped clarify the scope and robustness of our results. We address each major point below and have revised the manuscript to incorporate the requested clarifications and additional analyses.

Referee: [§3.2] (optimal threshold derivation): the characterization of τ* as the solution balancing utilization and cannibalization rests on the assumption that each individual requests independently with p(s_i) strictly increasing in score s_i and that capacity is allocated randomly among requesters. The manuscript should state whether the optimality condition (and the subsequent suboptimality proof) continues to hold under alternative request mechanisms such as deterministic thresholds or correlated requests across individuals.

Authors: We agree that the derivation in §3.2 relies on independent requests with strictly increasing p(s_i) and random capacity allocation. The optimality condition and suboptimality proof are established under this model. For deterministic thresholds or correlated requests, the precise form of τ* may change because the utilization and cannibalization effects are altered. In the revised manuscript we have added a dedicated paragraph in §3.2 (and a short extension in the appendix) that explicitly states the maintained assumptions, shows that the general suboptimality of accuracy-based policies continues to hold under mild correlation structures, and notes that deterministic mechanisms would require a separate fixed-point characterization. These additions delineate the scope without altering the core claims.
Revision: yes

Referee: [§5] (OpAUC definition and algorithm selection): OpAUC is constructed by integrating performance over a range of capacity levels using the same request-probability model. The claim that it leads to optimal algorithm selection therefore inherits the same parametric assumptions; a direct comparison showing that OpAUC recovers the true operational optimum (while AUC does not) should be provided for at least one non-logistic p(s) form to demonstrate robustness.

Authors: We acknowledge that the numerical illustration in §5 uses a logistic request model. While the theoretical definition of OpAUC is nonparametric in p(s) (requiring only monotonicity), we have added a new simulation subsection in the revised §5 that repeats the algorithm-selection experiment with a non-logistic form (piecewise-linear p(s)). In this experiment OpAUC continues to select the algorithm that maximizes realized operational value across capacity levels, whereas AUC does not. The added figure and accompanying text provide the direct robustness check requested.
Revision: yes

Referee: [Case-study section] (sepsis data): the reported magnitude of improvement from OpAUC-based selection versus AUC-based selection must include the precise capacity levels examined, the rule for excluding patients or time periods, and error bars or sensitivity checks. Without these details it is impossible to assess whether post-hoc choices affect the claimed gains.

Authors: We have expanded the case-study section and its appendix to report: (i) results for capacity levels 5%, 10%, 20%, 30%, and 50% of the daily patient volume; (ii) the explicit exclusion rule (patients with missing vital-sign covariates or time periods with incomplete EHR coverage are dropped, affecting <8% of records); and (iii) bootstrap-derived 95% intervals together with a sensitivity table that varies the exclusion threshold. The revised text and supplementary table confirm that the reported gains remain statistically distinguishable from zero and are insensitive to these choices.
Revision: yes
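For readers unfamiliar with the bootstrap-derived 95% intervals mentioned in the last response, a percentile bootstrap for a mean gain can be sketched as follows. The daily gains below are placeholder numbers invented for illustration, not results from the paper, and `bootstrap_mean_interval` is a hypothetical helper name; the rebuttal does not say which bootstrap variant was used.

```python
import random


def bootstrap_mean_interval(values, resamples=2000, level=0.95, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement, record each resample's mean, then sort.
    means = sorted(
        sum(rng.choice(values) for _ in range(n)) / n
        for _ in range(resamples)
    )
    tail = (1.0 - level) / 2.0
    lower = means[round(tail * resamples)]
    upper = means[round((1.0 - tail) * resamples) - 1]
    return lower, upper


# Placeholder per-day gains (served value under one policy minus another).
daily_gains = [0.8, 1.4, 0.3, 1.1, 0.9, 1.6, 0.5, 1.2, 0.7, 1.0]
lower, upper = bootstrap_mean_interval(daily_gains)
print(f"95% interval for the mean gain: [{lower:.2f}, {upper:.2f}]")
```

A gain is "statistically distinguishable from zero" in this sense when the interval excludes zero; repeating the calculation at each capacity level gives the per-capacity error bars the referee asked for.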

Circularity Check

0 steps flagged

No circularity; optimality and OpAUC derived from explicit model without self-referential reduction

The paper constructs the optimal threshold by solving an explicit optimization balancing utilization (filling fixed capacity) against cannibalization (high-score individuals displaced by lower-score requesters) under the stated probabilistic request model p(s_i) and capacity allocation rule. OpAUC is then defined directly from the same operational objective rather than fitted to data or renamed from an existing metric. No load-bearing step equates a claimed result to its own inputs by definition, and the suboptimality proof for accuracy-based thresholds follows from the model assumptions without requiring self-citation chains or uniqueness theorems imported from prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on domain assumptions about request generation and capacity competition; no free parameters or invented entities are introduced in the abstract.

axioms (2)

domain assumption: Individuals respond probabilistically to outreach and compete for fixed service slots.
Invoked to create the utilization-cannibalization tradeoff that makes accuracy suboptimal.
domain assumption: Service capacity is strictly limited and known.
Required for the threshold to depend on capacity level.

pith-pipeline@v0.9.0 · 5509 in / 1273 out tokens · 39589 ms · 2026-05-10T12:12:50.275578+00:00 · methodology

Reference graph

Works this paper leans on

2 extracted references · 2 canonical work pages

[1] If $\tau_c \le \tau^{*}_{A,\text{score}}$, then $\tilde{R}'(\tau) > 0$ for all $\tau < \tau_c$. For $\tau < \tau_c$, we have $\tilde{W}'(\tau) = M\tilde{R}'(\tau) > 0$, so $\tilde{W}$ is strictly increasing on $[0, \tau_c)$. For $\tau > \tau_c$, we have $\tilde{W}'(\tau) < 0$, so $\tilde{W}$ is strictly decreasing on $(\tau_c, 1]$. Therefore, $\tilde{W}$ attains its maximum at $\tau^{*}_{A} = \tau_c = \min\{\tau^{*}_{A,\text{score}}, \tau_c\}$.

[2] If $\tau_c > \tau^{*}_{A,\text{score}}$, then $\tilde{R}'(\tau) = 0$ at $\tau = \tau^{*}_{A,\text{score}} < \tau_c$. For $\tau < \tau^{*}_{A,\text{score}}$, we have $\tilde{W}'(\tau) = M\tilde{R}'(\tau) > 0$, and for $\tau^{*}_{A,\text{score}} < \tau < \tau_c$, we have $\tilde{W}'(\tau) = M\tilde{R}'(\tau) < 0$. For $\tau > \tau_c$, we also have $\tilde{W}'(\tau) = -N \Delta P \, E(r \mid \hat{r}_{A} = q_{A}(\tau)) < 0$. Therefore, $\tilde{W}$ attains its maximum at $\tau^{*}_{A} = \tau^{*}_{A,\text{score}} = \min\{\tau^{*}_{A,\text{score}}, \tau_c\}$. Combining all three cases, we conclude that for capacity ra...
