pith.

arXiv: 2604.08802 · v1 · submitted 2026-04-09 · 💻 cs.LG · cs.SY · eess.SY

Alleviating Community Fear in Disasters via Multi-Agent Actor-Critic Reinforcement Learning

Pith reviewed 2026-05-10 17:13 UTC · model grok-4.3

classification 💻 cs.LG · cs.SY · eess.SY
keywords multi-agent reinforcement learning · disaster resilience · community fear · cyber-physical-social systems · hurricane simulation · actor-critic methods · infrastructure recovery · non-zero-sum games

The pith

Coordinating power, communication, and emergency agencies with actor-critic reinforcement learning reduces mean community fear by 70% in simulations driven by Hurricane Harvey data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper extends a cyber-physical-social model of disaster resilience by adding control channels for three key agencies. It models their interactions as a three-player non-zero-sum differential game and solves it with online actor-critic reinforcement learning. Simulations on Hurricane Harvey data show the resulting policies produce a 70% mean fear reduction along with faster infrastructure recovery. The same policies achieve 50% fear reduction on Hurricane Irma data without any refitting, indicating transfer across events.
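The headline percentages depend on how "mean fear reduction" is computed, which this page does not state. A minimal sketch, assuming (hypothetically) that it is the relative drop in time-averaged fear between the uncontrolled and the controlled trajectory sampled on the same time grid; the trajectories below are made-up numbers, not data from the paper.

```python
from statistics import fmean

def mean_fear_reduction(fear_open_loop, fear_controlled):
    """Relative drop in time-averaged fear (assumed definition, hypothetical):
    1 - mean(controlled) / mean(open-loop), both on the same time grid."""
    if len(fear_open_loop) != len(fear_controlled):
        raise ValueError("trajectories must share the same time grid")
    baseline = fmean(fear_open_loop)
    if baseline <= 0:
        raise ValueError("open-loop mean fear must be positive")
    return 1.0 - fmean(fear_controlled) / baseline

# Illustrative trajectories only, not results from the paper.
open_loop = [0.2, 0.5, 0.8, 0.9, 0.6]
controlled = [0.2, 0.25, 0.2, 0.15, 0.1]
print(f"{mean_fear_reduction(open_loop, controlled):.0%}")
```

Averaging over the whole horizon means early steps, before any control acts, dilute the reduction; a different averaging window would give a different percentage.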

Core claim

Extending the CPS resilience model with control channels for communication, power, and emergency management agencies allows the system to be cast as a three-player non-zero-sum differential game. Solving this game online with multi-agent actor-critic reinforcement learning generates coordinated actions that reduce mean community fear by 70% and improve infrastructure recovery in simulations based on Hurricane Harvey data. The same learned policies achieve 50% fear reduction on Hurricane Irma data without refitting.

What carries the argument

An extended cyber-physical-social model with control channels for three agencies, treated as a three-player non-zero-sum differential game solved by online actor-critic reinforcement learning.
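For orientation, the generic N-player non-zero-sum game treated by Vamvoudakis and Lewis [10], specialized to N = 3, is sketched below. This is a sketch under stated assumptions: the input-affine dynamics, the quadratic effort penalties, and the symbols f, g_j, Q_i, R_ij are placeholders, since the paper's exact CPSS equations and cost weights are not reproduced on this page.

```latex
% Assumed input-affine dynamics; u_1 = communication, u_2 = power, u_3 = EMS
\dot{x} = f(x) + \sum_{j=1}^{3} g_j(x)\, u_j

% Cost of player i (each player weighs every player's effort through R_{ij})
J_i = \int_{0}^{\infty} \Big( Q_i(x) + \sum_{j=1}^{3} u_j^{\top} R_{ij}\, u_j \Big)\, dt, \qquad i = 1, 2, 3

% Stationarity of player i's Hamiltonian in u_i gives the Nash feedback policy
u_i^{*}(x) = -\tfrac{1}{2} R_{ii}^{-1} g_i(x)^{\top} \nabla V_i(x)

% Coupled Hamilton-Jacobi equations (one per player, coupled through every u_j^*)
0 = Q_i(x) + \nabla V_i(x)^{\top} \Big( f(x) + \sum_{j=1}^{3} g_j(x)\, u_j^{*}(x) \Big)
    + \sum_{j=1}^{3} u_j^{*}(x)^{\top} R_{ij}\, u_j^{*}(x)
```

In an online actor-critic scheme of this type, each V_i is approximated by a critic W_c,i^⊤ φ(x) and each policy by an actor with weights W_a,i; the residual of the last equation evaluated with the critic appears to be what Figure 4 reports as ε_c,i.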

If this is right

  • Coordinated interventions from the three agencies lower community fear by mitigating amplification from cascading infrastructure failures.
  • The reinforcement learning policies achieve 70% mean fear reduction and better infrastructure recovery in Harvey-based simulations.
  • The policies transfer to Irma data to yield 50% fear reduction without refitting the model.
  • Online solution of the game supports dynamic responses during an unfolding disaster.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Agencies could integrate the RL controller with live sensor and social data streams for real-time adaptation in actual events.
  • The multi-agent differential game approach might apply to other cascading systems where fear or cooperation breaks down, such as supply chain disruptions.
  • Physical drills or digital twins could test whether the simulated fear reductions hold when agencies execute the computed actions.

Load-bearing premise

The extended CPS model with added control channels for the three agencies accurately captures the real coupled dynamics of infrastructure failures and social fear.

What would settle it

Comparing the model's predicted fear levels under the learned policies against independently collected community fear surveys from a new hurricane event where similar agency coordination occurred.

Figures

Figures reproduced from arXiv: 2604.08802 by Almuatazbellah M. Boker, Hoda Eldardiry, Lamine Mili, Michael von Spakovsky, Yashodhan D. Hakke.

Figure 1. Ten CPSS state trajectories under three conditions.
Figure 2. Extended rollouts (2× horizon). Beyond the data window, exogenous drivers are held constant at their last observed values. (a) Harvey, 34 steps. (b) Irma, 24 steps. In both cases the learned policies continue to suppress fear and stabilize infrastructure in the extrapolated regime.
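The extrapolation protocol of Figure 2 amounts to a zero-order hold on the exogenous drivers. A minimal sketch, assuming a discrete-time step function and a fixed learned policy; `toy_step` and `toy_policy` are hypothetical stand-ins for the CPSS dynamics and the trained actors, and the 17-step window mirrors the Harvey horizon in the caption.

```python
from typing import Callable, List, Sequence

State = List[float]

def hold_last(drivers: Sequence[float], horizon: int) -> List[float]:
    """Extend an exogenous driver series to `horizon` steps by holding its last value."""
    if not drivers:
        raise ValueError("need at least one observed driver value")
    observed = list(drivers[:horizon])
    return observed + [drivers[-1]] * (horizon - len(observed))

def extended_rollout(x0: State, drivers: Sequence[float],
                     step: Callable[[State, float, float], State],
                     policy: Callable[[State], float],
                     factor: int = 2) -> List[State]:
    """Roll out a fixed policy for factor * len(drivers) steps (drivers held past the data)."""
    horizon = factor * len(drivers)
    d = hold_last(drivers, horizon)
    xs = [list(x0)]
    for t in range(horizon):
        xs.append(step(xs[-1], policy(xs[-1]), d[t]))
    return xs

# Hypothetical scalar "fear" model and feedback policy, for illustration only.
def toy_step(x: State, u: float, d: float) -> State:
    return [0.9 * x[0] + 0.1 * d - 0.05 * u]

def toy_policy(x: State) -> float:
    return 2.0 * x[0]

harvey_like_drivers = [0.5] * 17          # 17 observed steps -> 34-step rollout
traj = extended_rollout([0.0], harvey_like_drivers, toy_step, toy_policy)
print(len(traj) - 1)                      # number of simulated steps
```

Holding the last value is the simplest extrapolation; it tests whether the policy keeps the closed loop stable under a persistent driver, not whether it anticipates a changing storm.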
Figure 3. Control inputs u1 (communication), u2 (power), u3 (EMS). (a) Harvey, 17 steps. (b) Irma, 12 steps. During the exploration phase (t ≤ 12), sinusoidal probing is visible. After exploration ends, controls settle to small values as fear is suppressed and infrastructure deficits diminish. Player 1 (communication) dominates in both hurricanes, reflecting the dual coupling of u1 to fear and fake news.
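The exploration phase in Figure 3 can be illustrated with a minimal sketch of sinusoidal probing: a sum of sinusoids added to each actor output while t ≤ 12 and switched off afterwards, a common way to keep the critic's regressor persistently exciting. The amplitude and frequencies are hypothetical choices, not values from the paper.

```python
import math

def probing_signal(t, t_explore=12.0, amplitude=0.1, freqs=(1.0, 3.0, 7.0)):
    """Sum-of-sinusoids excitation, active only while t <= t_explore (values hypothetical)."""
    if t > t_explore:
        return 0.0
    return amplitude * sum(math.sin(w * t) for w in freqs)

def applied_control(u_actor, t, **probe_kwargs):
    """Control actually sent to the plant: actor output plus probing signal."""
    return u_actor + probing_signal(t, **probe_kwargs)

for t in (0.5, 6.0, 12.0, 12.5):
    print(t, round(applied_control(0.2, t), 4))
```

Several incommensurate frequencies are used so that, over the exploration window, the regressor does not stay aligned with a single direction.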
Figure 4. Bellman (Hamiltonian) residuals ε_c,i for each player. Player 2 (power) converges near zero within 12 steps. Player 3 (EMS) shows the largest initial residual due to its multi-objective cost (x1, x4, x9) but decreases substantially. Player 1 (communication) stabilizes at a small residual.
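A minimal sketch of how a per-player Hamiltonian residual of the kind plotted in Figure 4 can be evaluated, assuming a linear-in-weights critic V_i(x) ≈ W_c^⊤ φ(x), scalar controls, and quadratic effort costs; the function name and argument layout are hypothetical. The sanity check uses a scalar LQR problem whose exact value function is known, so the residual must vanish.

```python
import math
from typing import List, Sequence, Tuple

def hamiltonian_residual(W_c: Sequence[float],
                         grad_phi: Sequence[Sequence[float]],
                         x_dot: Sequence[float],
                         Q_i: float,
                         u: Sequence[float],
                         R_i: Sequence[float]) -> Tuple[float, List[float]]:
    """Hamiltonian (Bellman) residual of player i for a linear-in-weights critic.

    V_i(x) ~ W_c^T phi(x), so dV_i/dt = W_c^T (grad_phi(x) @ x_dot).
    grad_phi: Jacobian of the basis phi (one row per basis function).
    u: one scalar control per player; R_i[j] weights player j's effort in J_i.
    Returns (epsilon_i, sigma) with sigma = grad_phi @ x_dot.
    """
    sigma = [sum(g * xd for g, xd in zip(row, x_dot)) for row in grad_phi]
    r_i = Q_i + sum(rij * uj * uj for rij, uj in zip(R_i, u))
    eps = r_i + sum(w * s for w, s in zip(W_c, sigma))
    return eps, sigma

# Sanity check on a hypothetical scalar LQR problem: x_dot = -x + u, cost x^2 + u^2.
# Exact value V(x) = p x^2 with p = sqrt(2) - 1 and optimal u = -p x, so with
# basis phi(x) = x^2 the residual vanishes at any x (here x = 1).
p = math.sqrt(2.0) - 1.0
x = 1.0
u_opt = -p * x
eps, _ = hamiltonian_residual([p], [[2.0 * x]], [-x + u_opt], x * x, [u_opt], [1.0])
print(f"{eps:.1e}")
```

In the multi-player case, x_dot must be computed with all three players' current controls, which is what couples the three residuals.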
Figure 5. Norms of critic (∥W_c,i∥, solid) and actor (∥W_a,i∥, dashed) weight vectors. All weights start from zero (admissible initialization). The actor weights track the critic weights with a lag set by the ratio α_a,i/α_c,i, consistent with the two-timescale nature of the learning algorithm.
Table IV. Controller comparison (Hurricane Harvey). ΣJ: total integrated cost across all players. Effort: Σ u_i². Method Fea…
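The two-timescale behaviour in Figure 5 can be reproduced qualitatively on a toy problem. The sketch uses the normalized-gradient critic law common in this line of work [10] and, as a deliberate simplification, an actor that relaxes toward the critic weights at rate α_a < α_c; the paper's actual actor law, basis functions, and gains are not given here, and the scalar system, cost, and all numbers are hypothetical.

```python
import math

# Hypothetical scalar problem: x_dot = A x + B u, running cost Q x^2 + R u^2,
# critic and actor share the basis phi(x) = x^2 (so grad_phi(x) = 2x).
A, B, Q, R = -1.0, 1.0, 1.0, 1.0

def critic_step(W_c, sigma, eps, alpha_c, dt):
    """Euler step of dW_c/dt = -alpha_c * sigma * eps / (1 + sigma^T sigma)^2."""
    m = 1.0 + sum(s * s for s in sigma)
    return [w - dt * alpha_c * s * eps / (m * m) for w, s in zip(W_c, sigma)]

def actor_step(W_a, W_c, alpha_a, dt):
    """Simplified actor law (an assumption): dW_a/dt = -alpha_a * (W_a - W_c)."""
    return [wa - dt * alpha_a * (wa - wc) for wa, wc in zip(W_a, W_c)]

def train(alpha_c=10.0, alpha_a=1.0, dt=0.1, iters=2000):
    W_c, W_a = [0.0], [0.0]          # zero start is admissible only because A < 0
    samples = [0.5, 1.0, 1.5]        # cycled states keep sigma nonzero
    for n in range(iters):
        x = samples[n % len(samples)]
        u = -0.5 / R * B * (2.0 * x) * W_a[0]   # u = -1/2 R^-1 g^T grad_phi^T W_a
        x_dot = A * x + B * u
        sigma = [2.0 * x * x_dot]               # grad_phi(x) * x_dot
        eps = Q * x * x + R * u * u + W_c[0] * sigma[0]  # Hamiltonian residual
        W_c = critic_step(W_c, sigma, eps, alpha_c, dt)
        W_a = actor_step(W_a, W_c, alpha_a, dt)
    return W_c, W_a

W_c, W_a = train()
print(W_c[0], W_a[0], math.sqrt(2.0) - 1.0)
```

With α_a/α_c = 0.1 the actor lags the critic, and both weights settle at the optimal value √2 − 1 of this scalar problem. The critic always evaluates the current actor policy, so the pair behaves like a continuous form of policy iteration.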
Figure 6. PE (persistence of excitation) proxy diagnostics. Top: minimum eigenvalue of the running …
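The caption of Figure 6 is truncated, so the exact quantity is unknown. Assuming it is a running (windowed) Gram matrix of the normalized critic regressors σ/(1 + σ^⊤σ), a standard persistence-of-excitation proxy, the diagnostic could be computed as below; the window length and the test regressors are hypothetical.

```python
import numpy as np

def pe_proxy(sigmas, window):
    """Minimum eigenvalue of the windowed Gram matrix of normalized regressors.

    sigmas: array-like of shape (T, m), one critic regressor sigma_t per step.
    Returns a list of T values; values near zero flag a loss of excitation.
    """
    S = np.asarray(sigmas, dtype=float)
    out = []
    for t in range(S.shape[0]):
        block = S[max(0, t - window + 1): t + 1]
        # Row-wise normalization sigma / (1 + sigma^T sigma)
        Z = block / (1.0 + np.sum(block ** 2, axis=1, keepdims=True))
        G = Z.T @ Z                                  # m x m, symmetric PSD
        out.append(float(np.linalg.eigvalsh(G)[0]))  # eigvalsh sorts ascending
    return out

rich = [[1.0, 0.0], [0.0, 1.0]] * 5      # alternating directions: excited
poor = [[1.0, 0.0]] * 10                 # a single direction: not excited
print(pe_proxy(rich, window=4)[-1], pe_proxy(poor, window=4)[-1])
```

A minimum eigenvalue bounded away from zero over the window is what lets the critic weights converge; once probing stops, this proxy would be expected to decay.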
Original abstract

During disasters, cascading failures across power grids, communication networks, and social behavior amplify community fear and undermine cooperation. Existing cyber-physical-social (CPS) models simulate these coupled dynamics but lack mechanisms for active intervention. We extend the CPS resilience model of Valinejad and Mili (2023) with control channels for three agencies (communication, power, and emergency management) and formulate the resulting system as a three-player non-zero-sum differential game solved via online actor-critic reinforcement learning. Simulations based on Hurricane Harvey data show 70% mean fear reduction with improved infrastructure recovery; cross-validation in the case of Hurricane Irma (without refitting) achieves 50% fear reduction, confirming generalizability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper extends the cyber-physical-social (CPS) resilience model of Valinejad and Mili (2023) by adding control channels for three agencies (communication, power, and emergency management). It formulates the extended system as a three-player non-zero-sum differential game and solves it using online actor-critic reinforcement learning. Simulations on Hurricane Harvey data report a 70% mean fear reduction with improved infrastructure recovery; cross-validation on Hurricane Irma data (without refitting) reports 50% fear reduction to support generalizability.

Significance. If the simulation results hold under the stated model extension and RL formulation, the work provides a concrete framework for active multi-agency intervention in disaster CPS systems, with numerical outcomes from real hurricane data and a cross-validation step that offers some independent grounding. The approach of casting agency coordination as a differential game solved online via actor-critic RL is a clear technical contribution that could inform future resilience modeling.

major comments (2)
  1. [Cross-validation experiment] The cross-validation experiment on Irma data without refitting is presented as confirming generalizability, but it is unclear whether the fear metrics, state dynamics, or any RL hyperparameters were fitted using Harvey data or retained strictly from the 2023 base model; this directly affects whether the 50% reduction constitutes an independent test.
  2. [Simulation results] The central claim of 70% mean fear reduction rests on the extended CPS differential game and the online actor-critic solution, yet the abstract and summary provide no explicit reward functions, state-space definitions, or convergence criteria for the RL training; without these, it is impossible to verify that the reported reductions follow from the model rather than from unstated implementation choices.
minor comments (2)
  1. [Abstract] The abstract states numerical outcomes but omits any reference to the specific form of the control inputs, the payoff structure of the non-zero-sum game, or error bars on the 70% and 50% figures.
  2. [Methods] Implementation details such as actor-critic network architecture, learning rates, or how the differential equations are discretized for RL training are absent from the provided text, limiting reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and the recommendation of minor revision. We appreciate the recognition of the technical contribution and the value of the cross-validation step. We address each major comment below and have made revisions to improve clarity.

Point-by-point responses
  1. Referee: The cross-validation experiment on Irma data without refitting is presented as confirming generalizability, but it is unclear whether the fear metrics, state dynamics, or any RL hyperparameters were fitted using Harvey data or retained strictly from the 2023 base model; this directly affects whether the 50% reduction constitutes an independent test.

    Authors: We thank the referee for highlighting this important point. The fear metrics, state dynamics, and all base CPS model parameters are retained strictly from Valinejad and Mili (2023) with no fitting, calibration, or adjustment performed on the Hurricane Harvey data. The actor-critic RL hyperparameters follow standard values from the literature and were not tuned or optimized on Harvey simulation results. For the Irma cross-validation, the identical model, parameters, and hyperparameters are applied without any refitting or retraining, using only Irma-specific initial conditions and inputs. This ensures the 50% fear reduction is an independent test. We have added an explicit clarifying statement in the revised Section 5.2. revision: yes

  2. Referee: The central claim of 70% mean fear reduction rests on the extended CPS differential game and the online actor-critic solution, yet the abstract and summary provide no explicit reward functions, state-space definitions, or convergence criteria for the RL training; without these, it is impossible to verify that the reported reductions follow from the model rather than from unstated implementation choices.

    Authors: We agree that the abstract prioritizes brevity and omits these specifics. The full manuscript defines the state space in Section 3.1, specifies the three-player reward functions in Equations (3)-(5) of Section 3.2, and details the online actor-critic convergence criteria (including learning rates, discount factor of 0.95, and policy gradient norm threshold) in Section 4.2. The reported 70% reduction is obtained directly from solving the differential game under these definitions. To address the concern, we have expanded the abstract with a concise description of the reward structure and RL convergence approach and added a summary table of hyperparameters in the simulation section. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained with independent cross-validation

Full rationale

The paper extends the prior CPS model from Valinejad and Mili (2023) by adding control channels, formulates the system as a three-player differential game, and solves it with online actor-critic RL. Simulation outcomes (70% fear reduction on Harvey data, 50% on Irma without refitting) are produced by running the extended model under the learned policy; these are not equivalent to the inputs by construction. The Irma cross-validation is explicitly out-of-sample and provides independent grounding. No self-definitional loop, fitted parameter renamed as prediction, or load-bearing self-citation chain appears in the abstract or described pipeline. The central results follow from the new extension and RL solution rather than reducing tautologically to the base model or data fits.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract provides insufficient technical detail to enumerate specific free parameters, axioms, or invented entities. The approach relies on the assumptions of the cited 2023 CPS model plus standard reinforcement learning and differential game theory constructs, but none are explicitly listed or justified here.

pith-pipeline@v0.9.0 · 5444 in / 1307 out tokens · 52201 ms · 2026-05-10T17:13:04.848099+00:00 · methodology


Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages

  1. [1] S. M. Rinaldi, J. P. Peerenboom, and T. K. Kelly, “Identifying, understanding, and analyzing critical infrastructure interdependencies,” IEEE Control Systems Magazine, vol. 21, no. 6, pp. 11–25, 2001.

  2. [2] M. Ouyang, “Review of modeling and simulation of interdependent critical infrastructure systems,” Reliability Engineering & System Safety, vol. 121, pp. 43–60, 2014.

  3. [3] S. Vosoughi, D. Roy, and S. Aral, “The spread of true and false news online,” Science, vol. 359, no. 6380, pp. 1146–1151, 2018.

  4. [4] D. M. J. Lazer, M. A. Baum, Y. Benkler, A. J. Berinsky, D. M. Bushman, R. Faris et al., “The science of fake news,” Science, vol. 359, no. 6380, pp. 1094–1096, 2018.

  5. [5] F. H. Norris, S. P. Stevens, B. Pfefferbaum, K. F. Wyche, and R. L. Pfefferbaum, “Community resilience as a metaphor, theory, set of capacities, and strategy for disaster readiness,” American Journal of Community Psychology, vol. 41, no. 1–2, pp. 127–150, 2008.

  6. [6] S. L. Cutter, L. Barnes, M. Berry, C. Burton, E. Evans, E. Tate, and J. Webb, “A place-based model for understanding community resilience to natural disasters,” Global Environmental Change, vol. 18, no. 4, pp. 598–606, 2008.

  7. [7] J. Valinejad and L. Mili, “Cyber-physical-social model of community resilience by considering critical infrastructure interdependencies,” IEEE Internet of Things Journal, vol. 10, no. 24, pp. 21985–22001, 2023.

  8. [8] T. Başar and G. J. Olsder, Dynamic Noncooperative Game Theory, 2nd ed., ser. Classics in Applied Mathematics. SIAM, 1999.

  9. [9] J. C. Engwerda, LQ Dynamic Optimization and Differential Games. John Wiley & Sons, 2005.

  10. [10] K. G. Vamvoudakis and F. L. Lewis, “Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton–Jacobi equations,” Automatica, vol. 47, no. 8, pp. 1556–1569, 2011.

  11. [11] F. L. Lewis and D. Vrabie, “Reinforcement learning and adaptive dynamic programming for feedback control,” IEEE Circuits and Systems Magazine, vol. 9, no. 3, pp. 32–50, 2009.

  12. [12] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. MIT Press, 2018.

  13. [13] V. R. Konda and J. N. Tsitsiklis, “On actor-critic algorithms,” SIAM Journal on Control and Optimization, vol. 42, no. 4, pp. 1143–1166, 2003.

  14. [14] D. L. Alderson, G. G. Brown, and W. M. Carlyle, “Operational models of infrastructure resilience,” Risk Analysis, vol. 35, no. 4, pp. 562–586, 2015.

  15. [15] K. Starbird, J. Maddock, M. Orand, P. Achterman, and R. M. Mason, “Rumors, false flags, and digital vigilantes: Misinformation on Twitter after the 2013 Boston Marathon bombing,” iConference 2014 Proceedings, pp. 654–662, 2014.

  16. [16] Y. Jiang and Z.-P. Jiang, “Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics,” Automatica, vol. 48, no. 10, pp. 2699–2704, 2012.

  17. [17] H. Modares and F. L. Lewis, “Linear quadratic tracking control of partially-unknown continuous-time systems using reinforcement learning,” IEEE Transactions on Automatic Control, vol. 59, no. 11, pp. 3051–3056, 2014.

  18. [18] L. Buşoniu, R. Babuška, and B. De Schutter, “A comprehensive survey of multiagent reinforcement learning,” IEEE Transactions on Systems, Man, and Cybernetics, Part C, vol. 38, no. 2, pp. 156–172, 2008.