Triage Score: A Counterfactual Risk Assessment Instrument
Pith reviewed 2026-06-26 13:06 UTC · model grok-4.3
The pith
Triage scores based on additive counterfactual utilities include risk scores as a special case and account for outcomes under intervention.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Triage scores are based on additive counterfactual utilities and include risk scores as a special case. Unlike risk scores, triage scores can incorporate counterfactual outcomes under alternative decisions, enabling decision makers to incorporate a wide range of ethical and practical factors. We illustrate the use of triage scores with an application to our own randomized controlled trial evaluating a pretrial risk score. Our analysis demonstrates that triage scores are able to capture rich utility structures and yield substantively distinct results regarding policy evaluation and learning.
What carries the argument
Additive counterfactual utilities, which assign values to outcomes that would occur under each possible decision and sum those values into a single score.
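One way to make this concrete, offered as a sketch and not as the paper's own definition: assume a binary decision $d \in \{0,1\}$, potential outcomes $Y(0)$ (no intervention) and $Y(1)$ (intervention), covariates $X$, and hypothetical weights $a_d, b_d, c_d$. All of this notation is an assumption introduced here for illustration.

```latex
\begin{align}
u_d\bigl(Y(0), Y(1)\bigr) &= a_d\, Y(0) + b_d\, Y(1) + c_d, \qquad d \in \{0, 1\}, \\
\tau(x) &= \mathbb{E}\bigl[\, u_1 - u_0 \mid X = x \,\bigr] \\
        &= (a_1 - a_0)\, \mathbb{E}[Y(0) \mid X = x]
         + (b_1 - b_0)\, \mathbb{E}[Y(1) \mid X = x]
         + (c_1 - c_0).
\end{align}
```

Under this sketch, choosing $a_1 - a_0 = 1$ and $b_1 - b_0 = c_1 - c_0 = 0$ leaves $\tau(x) = \mathbb{E}[Y(0) \mid X = x]$, which for a binary outcome is the probability of the undesired outcome without intervention, i.e. a conventional risk score.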
If this is right
- Risk scores emerge when the utility for the intervention outcome is set to zero or ignored.
- Decision makers can encode a range of ethical and practical considerations by changing the utilities attached to each counterfactual outcome.
- Policy evaluation and learning produce substantively different results once counterfactual utilities replace simple risk prediction.
- The framework applies directly to any setting where randomized data on interventions are available, such as pretrial detention or medical treatment choices.
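The first and third points above can be illustrated with a minimal Python sketch. The function `triage_score`, the weights `w0` and `w1`, and the probabilities for individuals "A" and "B" are all hypothetical and made up for illustration; they are not taken from the paper or its data.

```python
def triage_score(p_y0, p_y1, w0, w1, c=0.0):
    """Hypothetical additive score: w0*P(Y(0)=1|x) + w1*P(Y(1)=1|x) + c."""
    return w0 * p_y0 + w1 * p_y1 + c


def rank_by(scores):
    """Return names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


# Made-up (P(Y(0)=1|x), P(Y(1)=1|x)) pairs, not estimates from any study.
people = {"A": (0.60, 0.55), "B": (0.40, 0.10)}

# Special case: zero weight on the intervention outcome recovers the risk score.
risk = {k: triage_score(p0, p1, w0=1.0, w1=0.0) for k, (p0, p1) in people.items()}

# Alternative: weight the intervention outcome negatively, so the score
# measures how much the intervention is expected to reduce the bad outcome.
benefit = {k: triage_score(p0, p1, w0=1.0, w1=-1.0) for k, (p0, p1) in people.items()}

print(rank_by(risk))     # ordering by risk alone
print(rank_by(benefit))  # ordering once the outcome under intervention counts
```

In this toy example, A has the higher risk without intervention, but B is expected to benefit far more from intervention, so the two scores prioritize different people.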
Where Pith is reading between the lines
- Existing risk-score systems could be converted to triage scores if utilities for the intervention arm can be specified or elicited.
- The additive structure may extend to multi-treatment settings where more than two decisions are possible.
- If utilities prove difficult to agree upon, sensitivity checks over plausible utility ranges would become a necessary practical step.
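The sensitivity check mentioned in the last point could look roughly like the sketch below. It assumes a hypothetical score of the form `p_y0 + w1 * p_y1`, a hypothetical intervention cost of 0.2, and made-up probabilities; none of these come from the paper.

```python
def recommend(p_y0, p_y1, w1, cost=0.2):
    """Hypothetical rule: intervene when p_y0 + w1 * p_y1 exceeds the cost."""
    return p_y0 + w1 * p_y1 > cost


def sensitivity_report(people, w1_grid, cost=0.2):
    """Label each person by whether the recommendation is stable over the grid."""
    report = {}
    for name, (p_y0, p_y1) in people.items():
        decisions = {recommend(p_y0, p_y1, w1, cost) for w1 in w1_grid}
        if decisions == {True}:
            report[name] = "always intervene"
        elif decisions == {False}:
            report[name] = "never intervene"
        else:
            report[name] = "depends on utilities"
    return report


# Made-up (P(Y(0)=1|x), P(Y(1)=1|x)) pairs for illustration only.
people = {"A": (0.60, 0.55), "B": (0.40, 0.10), "C": (0.15, 0.10)}

# Plausible range for the weight on the intervention outcome (an assumption).
w1_grid = [-1.0, -0.75, -0.5, -0.25, 0.0]

report = sensitivity_report(people, w1_grid)
for name, label in report.items():
    print(name, label)
```

Individuals labeled "depends on utilities" are the ones for whom disagreement about the utility weights would actually change the recommendation, which is where elicitation effort would matter most.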
Load-bearing premise
Counterfactual outcomes under intervention can be defined, assigned utilities, and estimated from the RCT data in a way that supports the claimed policy distinctions.
What would settle it
Re-estimating the triage scores on the pretrial RCT data and finding that the resulting policy recommendations or learned policies are identical to those produced by the original risk score.
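A check of this kind can be sketched as an agreement rate between two threshold policies. The sketch below uses made-up probabilities, a hypothetical threshold of 0.25, and hypothetical helper names (`policy`, `agreement`); it is not the paper's evaluation procedure and does not use the RCT data.

```python
def policy(scores, threshold):
    """Threshold policy: intervene (True) when the score exceeds the threshold."""
    return [s > threshold for s in scores]


def agreement(decisions_a, decisions_b):
    """Share of individuals for whom two policies make the same decision."""
    same = sum(a == b for a, b in zip(decisions_a, decisions_b))
    return same / len(decisions_a)


# Made-up (P(Y(0)=1|x), P(Y(1)=1|x)) pairs for illustration only.
data = [(0.60, 0.55), (0.40, 0.10), (0.15, 0.10), (0.80, 0.30), (0.35, 0.30)]

risk_scores = [p0 for p0, _ in data]
# Hypothetical triage score: weight +1 on Y(0) and -1 on Y(1).
triage_scores = [p0 - p1 for p0, p1 in data]

threshold = 0.25  # assumed common threshold for both policies
risk_policy = policy(risk_scores, threshold)
triage_policy = policy(triage_scores, threshold)

rate = agreement(risk_policy, triage_policy)
print(f"agreement rate: {rate:.2f}")
```

An agreement rate of 1.0 on real data would correspond to the settling outcome described above; anything lower indicates that the triage score changes at least some decisions.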
Original abstract
Risk assessment instruments, also known as "risk scores," are widely used in high-stakes decision-making settings such as medicine and the criminal justice system. A risk score predicts the likelihood of an undesired outcome if no intervention is made. Thus, a sufficiently high score is often interpreted as a recommendation to intervene. However, risk scores fail to account for what would happen if a decision-maker does intervene. This failure is problematic because effective decision making requires consideration of both or multiple potential outcomes. We propose "triage scores," which are based on additive counterfactual utilities and include risk scores as a special case. Unlike risk scores, triage scores can incorporate counterfactual outcomes under alternative decisions, enabling decision makers to incorporate a wide range of ethical and practical factors. We illustrate the use of triage scores with an application to our own randomized controlled trial evaluating a pretrial risk score. Our analysis demonstrates that triage scores are able to capture rich utility structures and yield substantively distinct results regarding policy evaluation and learning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes 'triage scores' defined via additive counterfactual utilities as a generalization of conventional risk scores (which predict outcomes under no intervention). Triage scores are claimed to incorporate outcomes under alternative decisions, thereby allowing incorporation of ethical and practical factors. The proposal is illustrated via an application to data from the authors' own RCT that randomized provision of an existing pretrial risk score to judges; the analysis is said to show that triage scores capture rich utility structures and produce substantively distinct results for policy evaluation and learning.
Significance. If the counterfactual utilities can be identified and estimated from the RCT without strong untestable assumptions, the framework would provide a principled way to move beyond purely predictive risk instruments toward decision-theoretic scores that explicitly trade off multiple potential outcomes. The RCT-based illustration is a positive feature that grounds the conceptual proposal in real data.
major comments (1)
- [Abstract and RCT application section] The central empirical claim (abstract) that triage scores 'yield substantively distinct results regarding policy evaluation and learning' from the pretrial RCT rests on recovering additive counterfactual utilities for decisions under alternative policies. The RCT randomizes only the provision of the existing risk score; recovering utilities for counterfactual judge decisions (e.g., under a different threshold or utility function) therefore requires an explicit model of judge behavior, information sets, or compliance under unobserved policies. No such model or identifying assumptions are stated or tested, rendering the counterfactual quantities unidentified from the observed data alone.
minor comments (1)
- [Introduction/Definition] The precise functional form of the 'additive counterfactual utilities' and how they reduce to a standard risk score as a special case should be stated formally (e.g., via an equation) rather than only conceptually.
Simulated Author's Rebuttal
We thank the referee for their careful reading and for identifying a key issue in the empirical section. We address the major comment below.
Point-by-point responses
-
Referee: [Abstract and RCT application section] The central empirical claim (abstract) that triage scores 'yield substantively distinct results regarding policy evaluation and learning' from the pretrial RCT rests on recovering additive counterfactual utilities for decisions under alternative policies. The RCT randomizes only the provision of the existing risk score; recovering utilities for counterfactual judge decisions (e.g., under a different threshold or utility function) therefore requires an explicit model of judge behavior, information sets, or compliance under unobserved policies. No such model or identifying assumptions are stated or tested, rendering the counterfactual quantities unidentified from the observed data alone.
Authors: We agree that the referee's observation is correct: the RCT randomizes only the provision of the existing risk score, and extending the analysis to counterfactual policies (different thresholds or utility functions) requires an explicit model of judge behavior together with identifying assumptions that are not currently stated. The manuscript estimates additive counterfactual utilities from the observed RCT data under the randomized provision but does not supply the additional behavioral model needed for fully counterfactual policy evaluation. We will revise the paper to (i) state the required identifying assumptions explicitly, (ii) describe a simple model of judge compliance with the provided score, and (iii) include sensitivity checks. These changes will be made in the RCT application section and the abstract will be updated to reflect the clarified scope.
Revision: yes
Circularity Check
No circularity: triage score introduced by definition as extension of risk scores
Full rationale
The paper defines triage scores directly as additive counterfactual utilities, with risk scores stated as a special case. This is an explicit definitional proposal rather than a derivation, prediction, or fitted quantity that reduces to its own inputs. No self-citation load-bearing step, uniqueness theorem, ansatz smuggling, or renaming of known results is present in the abstract or described claims. The RCT application concerns estimation under additional modeling assumptions, but the core concept does not reduce by construction to fitted parameters or prior self-citations. The derivation chain is self-contained as a conceptual extension.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Counterfactual outcomes under intervention can be defined and combined additively via a utility function.