arxiv: 2605.17697 · v1 · pith:R2D6RV3Lnew · submitted 2026-05-17 · 💻 cs.CY

Scrutinizing Index-Based Risk Assessments: A Case Study in NYC Decision-making for Heat Emergency Management

Pith reviewed 2026-05-19 22:03 UTC · model grok-4.3

classification 💻 cs.CY
keywords risk indices · sensitivity analysis · emergency management · heat emergencies · New York City · government decision-making · validity and reliability

The pith

Different choices of variables or spatial scale in risk indices can produce substantively different scores for NYC heat emergency targeting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines hand-crafted indices used for geographic targeting in emergency management, focusing on a case study of extreme heat preparedness and response in New York City. It shows through sensitivity analyses that reasonable variations in selected input variables or in the spatial scale of aggregation lead to different risk scores across neighborhoods. These differences can alter which areas are prioritized for government actions. The findings are mapped to concerns about validity and reliability drawn from measurement literature, and the work contrasts the challenges of such indices with those of predictive algorithms tied more directly to measurable outcomes.

Core claim

In NYC heat emergency management, different reasonable choices of input variables or spatial scale can produce substantive differences in index risk scores, which in turn can affect downstream government decision-making.

What carries the argument

Hand-crafted indices that statistically aggregate chosen variables into geographic risk scores for targeting emergency preparedness and response actions.
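A minimal sketch of how such an index can be built and how one design choice moves the ranks. It assumes an equal-weight average of z-scored variables, which is a common construction but not necessarily the formula used by the NYC HVI or by the paper. The variable names and values are made up for illustration and are not NYC data.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardise a list of numbers to mean 0 and population SD 1."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def composite_index(data, variables):
    """Average the z-scores of the chosen variables for each area (equal weights)."""
    columns = [zscores(data[name]) for name in variables]
    return [mean(row) for row in zip(*columns)]

def ranks(scores):
    """Rank areas from 1 (highest score) downward; ties broken by position."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    out = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        out[i] = rank
    return out

# Hypothetical values for five areas (illustrative only, not NYC data).
data = {
    "surface_temp": [95.0, 92.0, 90.0, 97.0, 91.0],
    "pct_poverty":  [10.0, 30.0, 25.0, 12.0, 20.0],
    "pct_seniors":  [12.0, 20.0, 15.0, 8.0, 22.0],
}

baseline = ranks(composite_index(data, ["surface_temp", "pct_poverty"]))
alternative = ranks(
    composite_index(data, ["surface_temp", "pct_poverty", "pct_seniors"])
)

# Indices of areas whose rank differs between the two specifications.
moved = [i for i, (a, b) in enumerate(zip(baseline, alternative)) if a != b]
print(baseline, alternative, moved)
```

In this toy data, adding a third variable changes the rank of three of the five areas, which is the kind of shift the sensitivity analyses look for at city scale.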

If this is right

  • Emergency managers using such indices may allocate resources or plan interventions for different neighborhoods depending on minor design choices.
  • Index-based targeting lacks the consistency required for reliable, reproducible government decisions without explicit sensitivity checks.
  • Concerns about validity arise because the index may not stably capture the underlying priority it aims to measure.
  • Predictive algorithms relating directly to concrete outcomes offer an alternative when measurable targets exist.
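The spatial-scale part of these implications can be illustrated with a small sketch. It assumes that a neighborhood score is the unweighted mean of its tract scores and that ties share a percentile rank; both are assumptions made for illustration, as are the tract names, the tract-to-neighborhood lookup, and the scores. The quintile rule (percentile rank of 20 or less falls in the first quintile) follows the definition quoted under Figure 16.

```python
from collections import defaultdict
from statistics import mean

def quintiles(scores):
    """Quintile 1..5 per area from its percentile rank (rank <= 20 -> quintile 1)."""
    n = len(scores)
    out = []
    for s in scores:
        at_or_below = sum(1 for other in scores if other <= s)  # ties share a rank
        out.append(-(-5 * at_or_below // n))  # integer ceil(5 * k / n)
    return out

# Hypothetical tract scores and tract -> neighborhood lookup (not real data).
tract_score = {"t1": 0.9, "t2": 0.1, "t3": 0.5, "t4": 0.6, "t5": 0.2,
               "t6": 0.3, "t7": 0.8, "t8": 0.7, "t9": 0.4, "t10": 1.0}
tract_hood = {"t1": "A", "t2": "A", "t3": "B", "t4": "B", "t5": "C",
              "t6": "C", "t7": "D", "t8": "D", "t9": "E", "t10": "E"}

# Quintiles when every tract is ranked on its own.
tracts = list(tract_score)
tract_q = dict(zip(tracts, quintiles([tract_score[t] for t in tracts])))

# Quintiles when tracts are first averaged up to neighborhoods (assumed rule).
members = defaultdict(list)
for t in tracts:
    members[tract_hood[t]].append(tract_score[t])
hoods = list(members)
hood_q = dict(zip(hoods, quintiles([mean(members[h]) for h in hoods])))

# Tracts whose own quintile differs from the quintile of their neighborhood.
mismatched = [t for t in tracts if tract_q[t] != hood_q[tract_hood[t]]]
print(hood_q, mismatched)
```

Neighborhood A pairs the second-highest and the lowest tract, so averaging places both in a low quintile; a tract-level index would treat them very differently.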

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar design-sensitivity problems are likely in other public-sector index applications for resource allocation.
  • Standardized protocols for variable selection and scale could reduce unwanted variation in index outputs.
  • Testing whether predictive models yield more stable targeting decisions than indices would be a direct next step.

Load-bearing premise

That variations in index scores arising from reasonable design choices in variables or spatial scale constitute challenges to validity and reliability for emergency management decisions.

What would settle it

An analysis showing that all plausible sets of input variables and all plausible spatial scales produce identical rankings of the highest-risk neighborhoods for heat emergencies in NYC.
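A check of this kind could be sketched as follows. The specification names, area labels, scores, and the choice of k are hypothetical; a real test would enumerate the plausible variable sets and spatial scales for the index under study.

```python
def top_k(scores, k):
    """Names of the k highest-scoring areas, in rank order (ties: alphabetical)."""
    return sorted(scores, key=lambda name: (-scores[name], name))[:k]

def compare_top_k(specs, k):
    """Return (areas in every top-k list, whether all top-k lists are identical)."""
    tops = [top_k(scores, k) for scores in specs.values()]
    common = set(tops[0]).intersection(*tops[1:])
    identical = all(t == tops[0] for t in tops)
    return common, identical

# Hypothetical scores for four areas under three index specifications.
specs = {
    "baseline":   {"A": 0.9, "B": 0.7, "C": 0.4, "D": 0.2},
    "alt_inputs": {"A": 0.8, "B": 0.3, "C": 0.6, "D": 0.1},
    "alt_scale":  {"A": 0.7, "B": 0.6, "C": 0.5, "D": 0.2},
}

common, identical = compare_top_k(specs, k=2)
print(common, identical)
```

Here only area A is in the top two under every specification, so the rankings are not identical and the settling condition fails for this toy data.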

Figures

Figures reproduced from arXiv: 2605.17697 by Allison Koenecke, Angelina Wang, Jennah Gosciak, Luke Boyce.

Figure 1
Comparison of different index tools. We compare the NYC Heat Vulnerability Index (HVI), which combines and standardizes information on five sociodemographic and environmental characteristics related to extreme heat, to two other indices: the FEMA National Risk Index (NRI) and the CDC Heat and Health Index (HHI). The NYC HVI is at the neighborhood level (n=197), the NRI is at the census tract level (n=2,324…
Figure 2
Sensitivity of the NYC HVI. Each of the three graphs represents a different sensitivity analysis of the NYC HVI, corresponding to (a), (b) construct reliability, and (c) convergent validity. Substantial fluctuation is observed in each graph, as evidenced by the large number of green squares and gold dots, rather than points clustering in 5 gray boxes along the diagonal. Green, orange, and gray illustrate wh…
Figure 3
Comparison of different average land surface measurements at the neighborhood level (n=197). The average land surface temperature in the NYC HVI from ECOSTRESS thermal imaging on August 27, 2020 is compared to average land surface temperature estimates from ECOSTRESS thermal imaging (taken on July 30, 2025) and to Landsat data (taken on July 9, 2020). While the measurements are all positively correlated, t…
Figure 4
The NYC Heat Vulnerability Index (HVI)
Figure 6
The CDC HHI with NYC-specific quintiles. [Plot text: axes "Original HVI prioritizations (percentile ranking, neighborhood-level)" and "New percentile ranking"; legend "Increased score", "Unchanged score", "Decreased score"; quintile key 1 to 5.]
Figure 8
Average land surface temperature in Fahrenheit at
Figure 10
The NYC HVI if only environmental characteristics were used (based on the formula “Alt. 1: Environmental” in
Figure 11
The NYC HVI if including information on seniors and poverty status (based on the formula “Alt. 2: seniors and
Figure 12
The NYC HVI if including individuals with comorbidities (based on the formula “Alt. 3: Comorbidities” in Table 2).
Figure 13
The NYC HVI if including all additional features (based on the formula “Alt. 4: All” in Table 2).
Figure 14
The NYC HVI if constructed at the neighborhood level (on the left) compared to the census tract level (on the right).
Figure 15
Comparing the NYC HVI to two different specifications for the NRI: EAL and EAL · f(SV / CR). EAL = Expected Annual Loss, SV = Social Vulnerability, and CR = Community Resilience. Overall, we do not observe a strong relationship between the NYC HVI and the NRI using either specification. [Panels: EAL; EAL × f(SV / CR). Quintile key 1 to 5.]
Figure 16
Spatial distribution of the NRI according to the EAL alone and the full NRI risk score. We observe only slight spatial variation between the two methods.
… across all tied inputs. Furthermore, we define quintiles based on the percentile rank values that we compute (e.g., the first quintile corresponds to all percentile rank values ≤ 20). This approach ensures that all inputs with the …
Original abstract

Cities are increasingly turning to large-scale data analysis and machine learning to make consequential decisions. While the algorithmic fairness community has focused on analyzing the risks and benefits associated with these complex methods, there has been much less scrutiny of the many simpler, but still widely used, data-driven tools that support government decision-making in a variety of settings. In this work, we study hand-crafted indices for geographic targeting and decision-making in emergency management -- a field responsible for coordinating preparedness and response efforts to hazards ranging from natural disasters to human threats. Indices, which capture abstract principles and overarching priorities (e.g., reducing social vulnerability), are low-complexity models that statistically aggregate chosen variables. They are generally flexible and interpretable, but can also be sensitive to key design choices and require strong assumptions. Through a case study of decision-making for extreme heat emergencies in NYC, we examine the challenges that practitioners may face in selecting an index for preparedness and response actions. We map empirical findings from index-based simulations to concerns related to validity and reliability from the measurement literature and show via sensitivity analyses that different reasonable choices of input variables or spatial scale can result in substantive differences to index risk scores, thereby affecting downstream government decision-making. We contrast these challenges with considerations for developing predictive algorithms that more narrowly relate to concrete, measurable outcomes. Ultimately, we provide generalizable recommendations that practitioners and public-sector technologists can use for navigating the trade-offs between indices and predictive algorithms in other government settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims that hand-crafted indices for geographic targeting in emergency management are sensitive to design choices. Through a NYC case study on extreme heat preparedness, sensitivity analyses varying input variables and spatial scales produce substantive differences in risk scores. These findings are mapped to validity and reliability concerns from the measurement literature, with the argument that such differences can affect downstream government decisions. The work contrasts indices with predictive algorithms focused on concrete outcomes and offers generalizable recommendations for practitioners and public-sector technologists.

Significance. If the empirical sensitivity results hold and the mapping to decision impacts can be strengthened, the paper makes a useful contribution by scrutinizing simpler index-based tools that are widely deployed in government settings but receive less attention than complex ML systems in the algorithmic fairness literature. It provides a concrete case study that links measurement theory to practical challenges in emergency management, potentially informing better tool selection in other public-sector contexts.

major comments (1)
  1. [Abstract and results/discussion sections] The central claim that score differences 'thereby affecting downstream government decision-making' lacks supporting evidence or modeling. The sensitivity analyses establish variation in index scores, but the manuscript does not report the specific decision rules, thresholds, or integration points used by NYC agencies for actions such as cooling center placement or alerts, nor does it simulate how score shifts would propagate through those operational processes (see Abstract and the results/discussion sections describing the case study).
minor comments (1)
  1. [Abstract] The abstract could more explicitly separate the empirical sensitivity findings from the interpretive claims about validity/reliability and downstream effects to improve clarity for readers unfamiliar with measurement literature.
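The propagation the referee asks for could be sketched roughly as below. The decision rule (prioritize every area in the top quintile) is purely hypothetical, since the actual NYC agency rules are not reported; the area labels and quintile values are also made up.

```python
def prioritized(quintile_by_area, cutoff=5):
    """Areas a hypothetical rule would prioritise: quintile >= cutoff."""
    return {area for area, q in quintile_by_area.items() if q >= cutoff}

def decision_changes(before, after, cutoff=5):
    """Compare which areas the rule selects under two index specifications."""
    old, new = prioritized(before, cutoff), prioritized(after, cutoff)
    return {
        "gained": sorted(new - old),  # newly prioritised after the design change
        "lost": sorted(old - new),    # no longer prioritised
        "kept": sorted(old & new),    # prioritised under both
    }

# Hypothetical quintiles for four areas under two index specifications.
before = {"A": 5, "B": 4, "C": 5, "D": 2}
after = {"A": 5, "B": 5, "C": 4, "D": 2}

changes = decision_changes(before, after)
print(changes)
```

A one-quintile shift for areas B and C swaps which of them the rule selects, so even modest score changes can matter once a hard cutoff is applied.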

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for their constructive feedback, which identifies a key area where the manuscript's claims can be clarified and strengthened. We address the major comment in detail below.

Point-by-point responses
  1. Referee: [Abstract and results/discussion sections] The central claim that score differences 'thereby affecting downstream government decision-making' lacks supporting evidence or modeling. The sensitivity analyses establish variation in index scores, but the manuscript does not report the specific decision rules, thresholds, or integration points used by NYC agencies for actions such as cooling center placement or alerts, nor does it simulate how score shifts would propagate through those operational processes (see Abstract and the results/discussion sections describing the case study).

    Authors: We appreciate this point and agree that the manuscript does not provide or simulate the precise internal decision rules, thresholds, or operational integration points used by NYC agencies. Our sensitivity analyses demonstrate substantive changes in risk scores and neighborhood rankings under reasonable variations in inputs and scale. We argue that such changes would affect prioritization because the indices are explicitly designed and deployed for geographic targeting in emergency management. However, we do not claim to model the full downstream pipeline. In revision, we will update the abstract and discussion sections to qualify the language more carefully, noting that index-based scores typically inform but do not solely determine actions, and that re-ranking of areas would likely shift which neighborhoods receive priority under standard use of these tools. We will add supporting references to emergency management literature on index application without introducing new empirical simulation, as that would require internal agency data beyond the scope of this work.

    revision: partial

standing simulated objections not resolved
  • The specific decision rules, thresholds, and integration points used by NYC agencies for heat emergency actions such as cooling center placement or alerts, which are internal operational details not available in public documentation.

Circularity Check

0 steps flagged

No circularity: empirical sensitivity analysis is self-contained

full rationale

The paper performs a case study consisting of index simulations and sensitivity tests on variable selection and spatial scale for NYC heat emergency indices. It reports observed differences in risk scores and maps those differences to external validity/reliability concepts drawn from the measurement literature. No equations, fitted parameters, or derivations are presented; the central claim rests on the simulation outputs themselves rather than reducing to any input by construction, self-definition, or load-bearing self-citation. The analysis is therefore independent of the patterns that would trigger a positive circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that indices are low-complexity models that aggregate variables to capture abstract priorities and that sensitivity to design choices raises validity issues, drawn from measurement literature.

axioms (2)
  • domain assumption: Indices capture abstract principles such as reducing social vulnerability and are used for geographic targeting in emergency management.
    Stated directly in the abstract as a general property of these tools.
  • domain assumption: Sensitivity to input variables or spatial scale affects the reliability of index-based decisions.
    Invoked when mapping empirical findings to validity and reliability concerns.

pith-pipeline@v0.9.0 · 5803 in / 1258 out tokens · 40994 ms · 2026-05-19T22:03:40.912153+00:00 · methodology

