arxiv: 2309.07176 · v5 · submitted 2023-09-12 · 💻 cs.LG · stat.ML

Mind the Gap: Optimal and Equitable Encouragement Policies

Pith reviewed 2026-05-24 06:56 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords encouragement policies · treatment take-up · fairness in recommendations · no-direct-effect model · optimal policy design · budget constraints · responsiveness to encouragement · algorithmic allocation

The pith

Encouragement policies should optimize and equalize induced treatment take-up rather than recommendation rates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that when decision-makers can only recommend treatments instead of forcing them, the value of a policy splits into two separate pieces: how strongly people respond to the recommendation and how effective the treatment is once adopted. This split matters because fairness then requires balancing actual treatment uptake across groups, not just how often each group receives the suggestion. The same separation produces explicit rules for designing policies when total recommendations or access are limited by budgets. It also confines the need for robustness against poor data overlap to the response-to-recommendation part alone. Illustrations with benefit-reminder and pretrial-monitoring data show how the distinction changes what an equitable policy looks like in practice.

Core claim

Under the covariate-conditional no-direct-effect model, the value of an encouragement policy is the product of responsiveness to the recommendation and the efficacy of the treatment conditional on take-up. This decomposition identifies induced treatment take-up as the correct fairness target and supplies closed-form characterizations of optimal policies subject to budget or access constraints. In deterministic recommendation settings the same model isolates overlap robustness to the recommendation-response function rather than the outcome model.
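
The decomposition can be written out as a short sketch. The notation here is an assumption introduced for illustration, not taken from the paper: R is a binary recommendation, T the binary treatment actually taken, X the covariates, \pi(X) \in \{0,1\} a deterministic recommendation policy, \pi_0 the never-recommend policy, p_{1|r}(X) = P(T = 1 | R = r, X), \mu_t(X) = E[Y(t) | X], and \tau(X) = \mu_1(X) - \mu_0(X). The sketch also assumes that, beyond the no-direct-effect condition Y(r, t) = Y(t), take-up is unconfounded given (R, X).

```latex
\begin{align*}
V(\pi) &= \mathbb{E}\Big[\sum_{t \in \{0,1\}} \mu_t(X)\, P\big(T = t \mid R = \pi(X), X\big)\Big] \\
       &= \mathbb{E}\big[\mu_0(X) + \tau(X)\, p_{1\mid \pi(X)}(X)\big], \\
V(\pi) - V(\pi_0) &= \mathbb{E}\Big[\pi(X)\,
    \underbrace{\big(p_{1\mid 1}(X) - p_{1\mid 0}(X)\big)}_{\text{responsiveness}}\,
    \underbrace{\tau(X)}_{\text{efficacy}}\Big].
\end{align*}
```

Read this way, a recommendation adds value only where both factors are nonzero: a highly effective treatment contributes nothing if the recommendation does not move take-up, and vice versa.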

What carries the argument

The covariate-conditional no-direct-effect model of encouragement, which states that recommendations affect outcomes only by changing treatment adoption and carry no direct effect once covariates are held fixed.

If this is right

  • Induced treatment take-up, not recommendation frequency, becomes the fairness criterion that must be equalized across groups.
  • Optimal policies admit tractable characterizations when total encouragement volume or access is constrained by budgets.
  • Overlap robustness in deterministic recommendation regimes localizes entirely to the recommendation-response model.
  • Policy design must separately estimate responsiveness and treatment efficacy rather than treating them as a single combined quantity.
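
As a rough illustration of the budget-constrained case, the sketch below assumes the simplest setting: each recommendation costs one unit, at most `budget` recommendations can be issued, and per-individual estimates of responsiveness (`lift`) and efficacy (`tau`) are already available. The function name and the toy numbers are hypothetical, and this greedy top-k rule is a generic solution to that simplified problem rather than the paper's own characterization.

```python
def budgeted_policy(lift, tau, budget):
    """Recommend to at most `budget` individuals with the largest positive lift * tau.

    lift[i]: assumed estimate of P(T=1 | R=1, x_i) - P(T=1 | R=0, x_i)
    tau[i]:  assumed estimate of the treatment effect for individual i
    """
    scores = [l * t for l, t in zip(lift, tau)]
    # Rank individuals by the product score, best first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Never spend budget on a recommendation with non-positive expected gain.
    chosen = {i for i in ranked[:budget] if scores[i] > 0}
    return [1 if i in chosen else 0 for i in range(len(scores))]


# Hypothetical toy inputs: individual 1 has the largest tau but barely responds.
lift = [0.5, 0.1, 0.4, 0.3]
tau = [1.0, 4.0, -1.0, 2.0]
policy = budgeted_policy(lift, tau, budget=2)
print(policy)  # [1, 0, 0, 1]
```

Ranking on `tau` alone would pick individual 1 first; ranking on the product passes over that individual because the recommendation hardly changes their take-up.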

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Data collection efforts would need to track actual take-up rates in addition to recommendation logs to apply the fairness criterion.
  • The separation of responsiveness and efficacy could be tested in other recommendation settings such as health or education interventions where adherence is voluntary.
  • If the model holds, existing fairness audits focused only on recommendation parity would systematically miss disparities in realized treatment.
  • Budget-constrained characterizations might extend to sequential decision settings where recommendations can be adjusted over time.
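
The gap between recommendation parity and take-up parity can be shown with a small audit sketch. It assumes per-individual take-up probabilities without (`p0`) and with (`p1`) a recommendation are available; the function name, group labels, and numbers are hypothetical and chosen only to make the contrast visible.

```python
def group_rates(policy, p0, p1, group):
    """Per-group recommendation rate and induced take-up rate under `policy`.

    p0[i], p1[i]: assumed take-up probabilities without / with a recommendation.
    """
    totals = {}
    for rec, a, b, g in zip(policy, p0, p1, group):
        takeup = a + rec * (b - a)  # induced take-up probability for this person
        n, rec_sum, take_sum = totals.get(g, (0, 0.0, 0.0))
        totals[g] = (n + 1, rec_sum + rec, take_sum + takeup)
    return {
        g: {"recommendation_rate": r / n, "takeup_rate": t / n}
        for g, (n, r, t) in totals.items()
    }


# Both groups get recommendations at the same rate, but group "b" responds less.
policy = [1, 0, 1, 0]
p0 = [0.2, 0.2, 0.1, 0.1]
p1 = [0.8, 0.8, 0.2, 0.2]
group = ["a", "a", "b", "b"]
rates = group_rates(policy, p0, p1, group)
for g, r in rates.items():
    print(g, r)
```

In this toy example the recommendation rates are equal across groups, while induced take-up is roughly 0.5 in group "a" and 0.15 in group "b", which is the kind of disparity a recommendation-only audit would not surface.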

Load-bearing premise

Encouragement influences outcomes solely through its effect on whether individuals actually take the treatment, with no remaining direct effect once covariates are controlled.

What would settle it

Observing that a recommendation still changes outcomes for individuals who do not change their treatment status, after conditioning on the same covariates, would falsify the central modeling assumption.

Figures

Figures reproduced from arXiv: 2309.07176 by Angela Zhou.

Figure 1: Comparison of average group outcomes under budget allocations for targeted treatment of …
Figure 2: SNAP recertification case study: Heterogeneous treatment effects
Figure 3: SNAP recertification case study. Each figure indicates the performance metric (conditional …
Figure 4: Groupwise separate budgets: Which budget allocations would achieve equal improvements?
Figure 5: Distribution of treatment effect by gender, lift in treatment probabilities
Figure 6: Policy value V(π_λ), treatment value E[T(π_λ) | A = a], for A = race, gender. we plot the penalty λ that we use to assess the solutions of Proposition 9. The vertical dashed line indicates the solution achieving ϵ = 0, i.e., parity in treatment takeup. Near-optimal policies that reduce treatment disparity can be of interest given advocacy concerns about how the expansion of supervised release could incr…
Figure 7: Distribution of lift in treatment probabilities
Figure 8: Policy value V(π_λ), treatment value E[T(π_λ) | A = a], for A = race, gender. identification. One potential concern is the continued use of the healthcare utilization variable as an outcome measure. From a methodological angle, it displays heterogeneity in treatment effects. From the substantive angle, healthcare utilization remains a proxy outcome measure for other health measures, and interpreting inc…
Original abstract

In consequential domains, it is often impossible to compel individuals to take treatment, so that optimal policy rules are merely suggestions in the presence of human non-adherence to treatment recommendations. We study personalized decision problems in which the planner controls recommendations into treatment rather than treatment itself. Under a covariate-conditional no-direct-effect model of encouragement, policy value depends on two distinct objects: responsiveness to encouragement and treatment efficacy. This modeling distinction makes induced treatment take-up, rather than recommendation rates alone, the natural fairness target and yields tractable policy characterizations under budget and access constraints. In settings with deterministic algorithmic recommendations, the same model localizes overlap-robustness to the recommendation-response model rather than the downstream outcome model. We illustrate the methods in case studies based on data from reminders of SNAP benefits recertification, and from pretrial supervised release with electronic monitoring. While the specific remedy to inequities in algorithmic allocation is context-specific, it requires studying both take-up of decisions and downstream outcomes of them.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript develops a framework for optimal and equitable encouragement policies when treatment cannot be compelled. Under a covariate-conditional no-direct-effect model of encouragement, policy value is shown to depend separately on responsiveness to encouragement and treatment efficacy; this distinction implies that fairness should target induced treatment take-up rather than recommendation rates alone. The paper derives tractable policy characterizations under budget and access constraints, localizes overlap-robustness to the recommendation-response model in deterministic settings, and illustrates the approach with case studies on SNAP benefit recertification reminders and pretrial supervised release with electronic monitoring.

Significance. Conditional on the maintained no-direct-effect assumption, the separation of responsiveness and efficacy parameters supplies a principled basis for fairness considerations in encouragement settings and yields explicit characterizations that could inform policy design. The localization of robustness and the emphasis on take-up as the fairness target are potentially useful for causal policy learning applications.

minor comments (2)
  1. [Abstract and main theoretical sections] The abstract states that the model 'yields tractable policy characterizations' and 'localizes overlap-robustness'; the main text should include explicit theorem or proposition numbers that state these characterizations so readers can verify the claimed tractability directly.
  2. [Case studies section] The case studies are described as illustrations rather than tests; the text should clarify whether any sensitivity analysis to the no-direct-effect assumption (e.g., via alternative outcome models) was conducted or is left for future work.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our work and the recommendation of minor revision. No major comments were listed in the report, so there are no specific points requiring point-by-point response. We remain available to address any minor issues that may be identified.

Circularity Check

0 steps flagged

No significant circularity; derivation conditional on stated model

The paper explicitly conditions all results on the covariate-conditional no-direct-effect model of encouragement. Under this modeling assumption the separation of policy value into responsiveness and efficacy parameters follows directly by definition, and subsequent policy characterizations under budget/access constraints are standard optimization steps. No equations or claims reduce by construction to fitted parameters, self-citations, or renamed known results. The provided abstract and reader summary contain no load-bearing self-referential steps; the work is self-contained given its explicit assumption.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central modeling distinction and policy characterizations rest on the no-direct-effect assumption for encouragement; no free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • domain assumption covariate-conditional no-direct-effect model of encouragement
    Explicitly invoked in abstract as the modeling foundation that separates responsiveness from efficacy and localizes fairness to take-up.
