Sequential Sensitivity Analysis for Multiple Assumptions: A Framework for Understanding Racial Disparity in Police Use of Force

Jake Bowers; Luke Miratrix; Thomas Leavitt

arxiv: 2605.21893 · v2 · pith:K6FKWPFWnew · submitted 2026-05-21 · 📊 stat.ME · stat.AP

Pith reviewed 2026-05-25 05:52 UTC · model grok-4.3

keywords: sensitivity analysis, causal inference, racial disparity, police use of force, confounding, NYPD data, sequential analysis

The pith

A sequential sensitivity analysis shows racial disparity in police force holds under stop discrimination but is fragile to small encounter biases.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper creates a sequential sensitivity analysis method to examine two assumptions needed to claim racial discrimination in police use of force from observed disparities. The first assumption is that officers do not discriminate in deciding whom to stop. The second is that, given patrol context, the chance an encounter involves a minority civilian stays constant. Applying the method to New York Police Department data, the authors show substantial disparity persists under realistic stop discrimination but disappears with small, demographically plausible violations of the encounter assumption. Readers should care because separate checks on each assumption miss how they combine to shape conclusions about discrimination.

Core claim

The authors introduce a sequential sensitivity analysis framework to jointly assess two assumptions required to infer racial discrimination in police use of force: no discrimination in stops and no bias in encounters conditional on patrol context. Applying it to NYPD Stop, Question, and Frisk data from 2003 to 2013, they find substantial racial disparity under plausible levels of discrimination in stops. Yet attributing this disparity to discrimination in force proves fragile once modest departures from no bias in encounters are allowed, and census-based calibration indicates that such departures are demographically feasible. The framework demonstrates how the two confounding channels interact in ways that separate analyses cannot reveal.

What carries the argument

Sequential sensitivity analysis framework that varies the assumptions of no discrimination in stops and no bias in encounters one after the other.

If this is right

Substantial racial disparity in force remains even after allowing for plausible discrimination in stops.
The conclusion that disparity reflects discrimination in force is fragile to modest departures from no bias in encounters.
Census-based calibration identifies demographically feasible levels of bias in encounters.
Joint sequential variation reveals interactions between the two assumptions that separate analyses miss.
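
The word "fragile" in the second point can be made concrete with a textbook device. The Python sketch below is not the paper's test. It is the classic Rosenbaum-style worst-case p-value for a sign test on matched pairs, in which a single parameter (named `gamma` here as an assumption) bounds how far the odds of assignment may depart from 1 within a pair. The function names and the pair counts in the usage lines are hypothetical.

```python
from math import comb


def binom_upper_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def worst_case_sign_test_p(n_pairs, n_positive, gamma):
    """Largest one-sided sign-test p-value compatible with bias at most gamma.

    With gamma = 1 this is the ordinary sign test. For gamma > 1 the chance
    that a pair shows a positive difference under the null is at most
    gamma / (1 + gamma), which gives the upper bound on the p-value.
    """
    if gamma < 1:
        raise ValueError("gamma must be at least 1")
    p_plus = gamma / (1.0 + gamma)
    return binom_upper_tail(n_pairs, n_positive, p_plus)


if __name__ == "__main__":
    # Hypothetical data: 100 matched pairs, 65 with a positive difference.
    for gamma in (1.0, 1.2, 1.5, 2.0):
        print(gamma, worst_case_sign_test_p(100, 65, gamma))
```

A finding is called fragile when the bound crosses the significance level at a value of `gamma` only slightly above 1, which is the pattern the summary attributes to the encounter assumption.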

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework could apply to other causal settings where multiple pre-treatment selection processes must be assessed together.
Collecting finer-grained data on patrol contexts might narrow the range of feasible encounter biases and strengthen or weaken the fragility result.
Interventions aimed at reducing force disparities may need to target both stop decisions and encounter dynamics rather than force alone.

Load-bearing premise

Census-based calibration can reliably indicate which departures from no bias in encounters are demographically feasible.

What would settle it

Direct measurements of the racial composition of encounters across patrol contexts that show the bias levels needed to erase the disparity exceed the range census calibration deems feasible.

Figures

Figures reproduced from arXiv: 2605.21893 by Jake Bowers, Luke Miratrix, Thomas Leavitt.

**Figure 2.** The top panel shows the observed data, which omit all nonforce white-civilian …

**Figure 3.** The top panel displays the augmented data, observed assignment, and assignment …

**Figure 4.** Upper-tail p-values for a one-sided test of no racial discrimination in force. The horizontal axis shows bias in encounters, governed by Γ; the vertical axis shows the common lower bound ρ on discrimination in stops (ρ_g = ρ for every g ∈ G∗). The white contour marks the 5% critical boundary. The interaction structure described in Section 5.4 is borne out in the SQF application. When ρ = 0, the test trans…

**Figure 5.** 95% confidence sets (shaded ribbon) and median nonrejected null hypotheses …

**Figure 6.** 95% confidence sets (shaded ribbon) and median nonrejected null hypotheses …
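
The Figure 4 caption describes a surface of p-values over a grid of ρ and Γ with a 5% boundary. As a minimal sketch of how such a grid can be read, the Python below finds, for a fixed ρ, the smallest Γ at which the p-value exceeds 0.05, and the smallest Γ at which that holds for every ρ on the grid. The grid values are invented toy numbers, not results from the paper, and the function names are hypothetical.

```python
# Toy p-value grid keyed by (rho, gamma). These numbers are made up
# for illustration and do not come from the paper.
TOY_GRID = {
    (0.0, 1.0): 0.010, (0.0, 1.1): 0.060, (0.0, 1.2): 0.200,
    (0.2, 1.0): 0.001, (0.2, 1.1): 0.020, (0.2, 1.2): 0.080,
}


def critical_gamma(p_grid, rho, alpha=0.05):
    """Smallest gamma on the grid with p > alpha at the given rho, else None."""
    candidates = [g for (r, g), p in p_grid.items() if r == rho and p > alpha]
    return min(candidates) if candidates else None


def gamma_insignificant_for_all_rho(p_grid, alpha=0.05):
    """Smallest gamma with p > alpha at every rho, else None.

    Assumes the grid is complete: every (rho, gamma) pair has an entry.
    """
    rhos = sorted({r for r, _ in p_grid})
    gammas = sorted({g for _, g in p_grid})
    for g in gammas:
        if all(p_grid[(r, g)] > alpha for r in rhos):
            return g
    return None


if __name__ == "__main__":
    print(critical_gamma(TOY_GRID, rho=0.0))
    print(gamma_insignificant_for_all_rho(TOY_GRID))
```

Scanning the grid rather than solving for the boundary keeps the sketch independent of how the p-values were produced; the resolution of the answer is then limited by the grid spacing.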

Original abstract

Inferring racial discrimination in police use of force -- the average causal effect of civilian race on use of force -- requires two assumptions about policing prior to potential use of force: that officers do not discriminate in whom they would stop (no discrimination in stops) and that, conditional on patrol context, the probability that an encounter is with a minority rather than a white civilian does not vary across encounters (no bias in encounters). As Knox et al. (2020) show, violations of the first can mask racial disparity in force. Whether it reflects discrimination in force also depends on the second. Existing sensitivity analyses address one assumption at a time. We develop a framework that varies both sequentially and apply it to NYPD Stop, Question, and Frisk data (2003--2013). Under plausible levels of discrimination in stops, we find substantial racial disparity in force. However, the conclusion that this disparity reflects discrimination is fragile to modest departures from no bias in encounters that census-based calibration suggests are demographically feasible. By jointly addressing both confounding channels, the framework reveals how they interact in ways that separate analyses cannot, contributing to understanding what generates racial disparities and how they might be addressed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Sequential sensitivity for two assumptions in disparity studies, but the census calibration for encounter bias is the part that needs checking.

The main thing here is a sequential sensitivity framework that varies discrimination in stops and bias in encounters together instead of one at a time. On the NYPD Stop, Question, and Frisk data it reports substantial racial disparity in force once you allow for plausible stop discrimination, yet that conclusion is fragile once you introduce modest departures from no bias in encounters, with the range set by a census-based calibration. That joint treatment is the actual addition over Knox et al. (2020). The paper shows how the two channels interact in ways that separate one-at-a-time checks miss, which is useful for anyone trying to bound disparity estimates under multiple confounding routes. The framework itself looks like a straightforward extension that could be applied elsewhere.

The soft spot is the calibration step that turns census counts into bounds on feasible encounter bias. The stress-test concern is on point: if the unconditional census margins do not line up with the conditional patrol context (time, location, officer assignment), then the size of the “modest” departures that get labeled demographically feasible could be off, which would change both the substantial-disparity result and the fragility claim. The abstract does not spell out how they map census data to the conditional probabilities or run robustness checks on that mapping, so the central empirical takeaway rests on whether that step is reliable. Without the full derivations it is also hard to judge whether any post-hoc choices in the sequential procedure affect the reported fragility.

This is aimed at causal-inference people who work on policing or disparity studies and want a practical way to handle two assumptions at once. A reader already familiar with single-assumption sensitivity methods will see the incremental value quickly.
It shows clear engagement with the literature and a concrete application, so it deserves a serious referee even if the calibration needs more defense or alternative bounds in revision.

Referee Report

2 major / 2 minor

Summary. The paper develops a sequential sensitivity analysis framework for jointly varying two assumptions required to infer racial discrimination in police use of force (no discrimination in stops; no bias in encounters conditional on patrol context). Applied to NYPD Stop, Question, and Frisk data (2003–2013), it reports substantial racial disparity in force under plausible stop discrimination but concludes that interpreting this disparity as evidence of discrimination in force is fragile to modest departures from no bias in encounters, where the feasible range is set by a census-based calibration.

Significance. If the calibration and sequential procedure hold, the framework is significant for enabling joint sensitivity analysis of two confounding channels whose interaction is not captured by separate analyses (e.g., Knox et al. 2020). This advances causal inference methods for observational policing data and clarifies how stop and encounter biases can jointly shape disparity estimates.

major comments (2)

[Abstract and application to NYPD data] Abstract (application paragraph) and calibration description: The central fragility claim—that the disparity conclusion is fragile to 'modest departures from no bias in encounters that census-based calibration suggests are demographically feasible'—treats the calibration step as defining the relevant sensitivity range. The manuscript must supply the explicit mapping from unconditional census margins to bounds on the encounter-probability parameter, including any conditioning on patrol context (time-of-day, location, officer assignment). Absent this, the declared size of 'modest' departures cannot be evaluated and the fragility result is not load-bearing.
[Framework description] Sequential procedure (framework section): The claim that joint variation reveals interactions 'in ways that separate analyses cannot' is load-bearing for the methodological contribution. The paper should include the explicit equations defining the two sensitivity parameters, the sequential ordering, and a demonstration (analytic or small simulation) that the joint bounds differ from the product of marginal bounds.
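
The first major comment asks how census margins become a bound on the encounter-bias parameter. The abstract does not say, so the Python sketch below is only one hypothetical reading, not the authors' calibration. It assumes that, within a stratum, the probability that an encounter is with a minority civilian ranges between two shares, and it reports the odds ratio between those endpoints, which is how a Rosenbaum-type Γ is commonly interpreted. The function names and the shares in the usage lines are invented.

```python
def odds(p):
    """Convert a probability strictly between 0 and 1 to odds."""
    if not 0.0 < p < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return p / (1.0 - p)


def implied_gamma(share_low, share_high):
    """Odds ratio between the highest and lowest assumed minority shares.

    Hypothetical mapping: if encounter probabilities within a stratum lie
    in [share_low, share_high], no two encounters differ in their odds by
    more than this factor.
    """
    low, high = sorted((share_low, share_high))
    return odds(high) / odds(low)


if __name__ == "__main__":
    # Invented shares for illustration only.
    print(implied_gamma(0.50, 0.55))
    print(implied_gamma(0.50, 0.60))
```

Whether unconditional census shares are a defensible stand-in for the conditional encounter probabilities is exactly what the referee asks the authors to justify; the sketch only shows the arithmetic that such a mapping would need to make explicit.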

minor comments (2)

[Notation and equations] Clarify notation for the two sensitivity parameters throughout to prevent reader confusion between stop-discrimination and encounter-bias parameters.
[Data and application] Ensure data exclusion rules and any post-hoc choices in the NYPD analysis are fully documented so that the empirical results can be reproduced.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and will revise the manuscript to incorporate the requested clarifications.

Referee: [Abstract and application to NYPD data] Abstract (application paragraph) and calibration description: The central fragility claim—that the disparity conclusion is fragile to 'modest departures from no bias in encounters that census-based calibration suggests are demographically feasible'—treats the calibration step as defining the relevant sensitivity range. The manuscript must supply the explicit mapping from unconditional census margins to bounds on the encounter-probability parameter, including any conditioning on patrol context (time-of-day, location, officer assignment). Absent this, the declared size of 'modest' departures cannot be evaluated and the fragility result is not load-bearing.

Authors: We agree that an explicit mapping from census margins to the encounter-probability bounds is required to evaluate the 'modest' claim. In the revised manuscript we will add a dedicated subsection detailing the calibration formulas, including how unconditional census proportions are mapped to conditional bounds on the bias parameter and how patrol-context variables (time-of-day, location) are incorporated via the available NYPD covariates. revision: yes
Referee: [Framework description] Sequential procedure (framework section): The claim that joint variation reveals interactions 'in ways that separate analyses cannot' is load-bearing for the methodological contribution. The paper should include the explicit equations defining the two sensitivity parameters, the sequential ordering, and a demonstration (analytic or small simulation) that the joint bounds differ from the product of marginal bounds.

Authors: We accept that the interaction claim requires explicit support. We will revise the framework section to state the two sensitivity parameters mathematically, specify the sequential ordering, and insert either an analytic derivation or a small simulation showing that the joint bounds are not the product of the marginal bounds. revision: yes

Circularity Check

0 steps flagged

No significant circularity in the derivation chain.

The paper introduces a sequential sensitivity analysis framework that jointly varies two pre-force assumptions (no discrimination in stops; no bias in encounters) and applies it to NYPD data. The census-based calibration is presented as an external input to bound plausible sensitivity ranges rather than being fitted to or derived from the target disparity estimates. No equations reduce the final conclusions to the inputs by construction, no load-bearing self-citations appear, and the central claims follow from applying the new framework under stated assumptions. The analysis is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are detailed beyond the two core domain assumptions named in the abstract.

axioms (1)

Domain assumption: officers do not discriminate in whom they would stop and, conditional on patrol context, the probability an encounter is with a minority civilian does not vary across encounters.
Stated as the two assumptions required to infer the average causal effect of civilian race on use of force.

pith-pipeline@v0.9.0 · 5745 in / 1126 out tokens · 23116 ms · 2026-05-25T05:52:18.059719+00:00 · methodology

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages

[1]

Bickel, P. J. & van Zwet, W. R. (1978), ‘Asymptotic expansions for the power of distributionfree tests in the two-sample problem’, Annals of Statistics 6(5), 937–1004.
Bookstein, F. L. (1989), ‘Principal warps: Thin-plate splines and the decomposition of deformations’, IEEE Transactions on Pattern Analysis and Machine Intelligence 11(6), 567–

work page 1978
[2]

Stop-and-Frisk

Cohen, P. L., Olson, M. A. & Fogarty, C. B. (2020), ‘Multivariate one-sided testing in matched observational studies as an adversarial game’, Biometrika 107(4), 809–825.
Fogarty, C. B. (2018), ‘On mitigating the analytical limitations of finely stratified experiments’, Journal of the Royal Statistical Society: Series B (Statistical Methodology) 80(5), 1035...

work page 2020
[3]

leaves implicit: For each fixed race z, y_i(z, v) is constant across all profiles v within a given principal stratum. The manuscript’s contrast between races at a fixed principal stratum therefore takes the same value for every pair of profiles drawn from that principal stratum, and no marginalization over nonracial profiles is required. This contrast is a co...

work page 2025
[4]

individual sharing the same nonracial profile v.
• The marginal component effect, ∑_v [y_i(1, v) − y_i(0, v)] q(v), integrates the component effect over a researcher-specified probability mass function q on v, yielding a scalar contrast for each decision-maker i.
• The average marginal component effect, (1/N) ∑_{i=1}^{N} ∑_v [y_i(1, v) − y_i(0, v)] q(v), aggregates the marginal c...

work page 2000
[5]

furtive movements

Under Assumption 4, ρ_g = n_{g,OMS}/n_g, so ρ_g = 0 implies that every potentially stoppable encounter in stratum g is Always-Stop and n_{g,AS} = n_g. In this case, n_{g,AS} − C_g(Z_g) = n_{g,1} deterministically for any z_g ∈ Ω_g. The ratio (n_{g,AS} − c)/n_{g,1} therefore equals 1 = 1 − ρ_g for every c, so each summand in (S.4.16) is zero. Hence E[τ̂_g | E_g] = τ_g, and τ̂_g is unbiased for τ_g in stratum g ∈ G...

work page 2020
[6]

By construction, the image of the map, i.e., {w/(ℓ+w) : w ∈ Z_{≥0}}, is identical to F_{ρ_g}: every element of F_{ρ_g} is w/(ℓ+w) for some w ∈ Z_{≥0}, and every w/(ℓ+w) belongs to F_{ρ_g}

Image equals F_{ρ_g}. The feasible domain is F_{ρ_g} := {w/(ℓ+w) : w ∈ Z_{≥0}} ⊂ [0, 1). By construction, the image of the map, i.e., {w/(ℓ+w) : w ∈ Z_{≥0}}, is identical to F_{ρ_g}: every element of F_{ρ_g} is w/(ℓ+w) for some w ∈ Z_{≥0}, and every w/(ℓ+w) belongs to F_{ρ_g}. Since the map is injective and its image equals F_{ρ_g}, it is a bijection from Z_{≥0} to F_{ρ_g}. S.5.2 Worked Example: Interaction of ρ...

work page 2023
[7]

Substituting into the lower bound from Lemma S.6.3 yields p(z_g^{ρ_g}; Γ) = 1 / ∑_{a_g ∈ Ω_g^{ρ_g}} Γ^{a_g^⊤ (1 − z_g^{ρ_g})} = 1 / (Γ(ñ_g^{ρ_g} − 1) + 1), (S.6.10) which matches the left-hand side of the bound in (5) of Fogarty (2023, p. 2201). A parallel simplification occurs for the upper bound. With one minority-civilian encounter, a_g^⊤ z_g^{ρ_g} = 0 for all a_g ≠ z_g^{ρ_g} and equals...

work page 2023
[8]

Recombining via (S.6.20) gives E[ˆτtilt g (ρg; Γ,τ0,−1)]≥τg−τ0, and aggregation yields(S.6.15) under the null τ≥τ0

The analogous argument —p/p≥1on nonpositive quantities leaves them unchanged or pushes them closer to zero;p/p≤1on positive quantities makes them less positive — shows each summand is boundedbelowby( ˆτ ρg g −τ0). Recombining via (S.6.20) gives E[ˆτtilt g (ρg; Γ,τ0,−1)]≥τg−τ0, and aggregation yields(S.6.15) under the null τ≥τ0. S.6.3 Corollary for Stratum...

work page 2018
[9]

inside” an impact zone only if it falls within a zone that had been activated on or before the date of the encounter. Second, we classify encounters without coordinates as “outside

captures thinner features that trace street boundaries at intensities between 200 and 220, while excluding pixels near the coastline (intensity above 245), pixels near NYC borough boundary lines (identified by projecting the borough shapefile into pixel space via the inverse TPS and dilating the resulting mask), and a small noise region in the northeast c...

work page 2006
[10]

Dark regions in the right panel correspond to the georeferenced zone boundaries used in the analysis

Right: extracted polygons rendered on the NYC borough outline using the thin-plate spline transform. Dark regions in the right panel correspond to the georeferenced zone boundaries used in the analysis. 93 Figure S.3: Diagnostic overlay for the thin-plate spline transform. Red lines show the NYC borough boundary shapefile projected back onto the raster im...

work page 2010
[11]

remotes") remotes::install_github(

SQF encounters in 2003 therefore receive no zone classification from this pipeline and enter the post-stratification through a missingness indicator.

Panel  Zone              Active from
00     Impact Zone III   January 2004
01     Impact Zone IV    January 2005
02     Impact Zone V     July 2005
03     Impact Zone VI    January 2006
04     Impact Zone VII   June 2006
05     Impact Zone VIII  January 2007
0...

work page 2003
[12]

drop" ) TheΓat which the confidence interval first includes zero—the “changepoint

With no discrimination in stops (ρ = 0), the finding first becomes insignificant at Γ = 1.06:

    # Smallest Gamma where p > 0.05 at rho = 0
    pval_grid |>
      filter(rho_lb == 0, p_upper > 0.05) |>
      summarise(Gamma_star = min(Gamma))
    # Gamma_star = 1.06

At the other extreme, the smallest Γ at which the test is insignificant for all values of ρ is Γ = 1.33:

    # Smallest Gamma ...

work page 2020
