Alleviating Community Fear in Disasters via Multi-Agent Actor-Critic Reinforcement Learning

Almuatazbellah M. Boker; Hoda Eldardiry; Lamine Mili; Michael von Spakovsky; Yashodhan D. Hakke

arXiv: 2604.08802 · v1 · submitted 2026-04-09 · cs.LG · cs.SY · eess.SY

Yashodhan D. Hakke, Almuatazbellah M. Boker, Lamine Mili, Michael von Spakovsky, Hoda Eldardiry

Pith reviewed 2026-05-10 17:13 UTC · model grok-4.3

classification cs.LG · cs.SY · eess.SY

keywords multi-agent reinforcement learning · disaster resilience · community fear · cyber-physical-social systems · hurricane simulation · actor-critic methods · infrastructure recovery · non-zero-sum games

The pith

Coordinating power, communication, and emergency agencies with actor-critic reinforcement learning reduces community fear by 70% in hurricane simulations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper extends a cyber-physical-social model of disaster resilience by adding control channels for three key agencies. It models their interactions as a three-player non-zero-sum differential game and solves it with online actor-critic reinforcement learning. Simulations on Hurricane Harvey data show the resulting policies produce a 70% mean fear reduction along with faster infrastructure recovery. The same policies achieve 50% fear reduction on Hurricane Irma data without any refitting, indicating transfer across events.

Core claim

Extending the CPS resilience model with control channels for communication, power, and emergency management agencies allows the system to be cast as a three-player non-zero-sum differential game. Solving this game online with multi-agent actor-critic reinforcement learning generates coordinated actions that reduce mean community fear by 70% and improve infrastructure recovery in simulations based on Hurricane Harvey data. The same learned policies achieve 50% fear reduction on Hurricane Irma data without refitting.

What carries the argument

An extended cyber-physical-social model with control channels for three agencies, treated as a three-player non-zero-sum differential game solved by online actor-critic reinforcement learning.
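For orientation, the generic multi-player non-zero-sum differential game of the kind treated in reference [10] can be written as below. This is a sketch under assumptions, not the paper's own equations: the control-affine dynamics, the state penalties Q_i, the control weights R_ij, and the undiscounted infinite-horizon cost are standard textbook choices, and the paper's exact cost may differ. Player indices 1, 2, 3 stand for communication, power, and emergency management.

```latex
\begin{aligned}
\dot{x}(t) &= f(x) + \sum_{j=1}^{3} g_j(x)\,u_j, \\
J_i(x_0; u_1, u_2, u_3) &= \int_0^{\infty} \Big( Q_i(x) + \sum_{j=1}^{3} u_j^{\top} R_{ij}\, u_j \Big)\,dt,
  \qquad i = 1, 2, 3, \\
J_i(u_i^{*}, u_{-i}^{*}) &\le J_i(u_i, u_{-i}^{*}) \quad \text{for all admissible } u_i .
\end{aligned}
```

The last line is the Nash condition: no agency can lower its own cost by deviating unilaterally. The game is non-zero-sum because the three costs J_i are not required to add up to a constant.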

If this is right

Coordinated interventions from the three agencies lower community fear by mitigating amplification from cascading infrastructure failures.
The reinforcement learning policies achieve 70% mean fear reduction and better infrastructure recovery in Harvey-based simulations.
The policies transfer to Irma data to yield 50% fear reduction without refitting the model.
Online solution of the game supports dynamic responses during an unfolding disaster.
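The page does not spell out how "mean fear reduction" is computed. One plausible reading, assumed here, is the relative drop in time-averaged fear between an uncontrolled run and a controlled run. The function name and the toy trajectories below are hypothetical and are not data from the paper.

```python
import numpy as np

def mean_fear_reduction_pct(baseline, controlled):
    """Percent drop in time-averaged fear relative to the uncontrolled run.

    Assumed definition: 100 * (mean(baseline) - mean(controlled)) / mean(baseline).
    """
    baseline = np.asarray(baseline, dtype=float)
    controlled = np.asarray(controlled, dtype=float)
    if baseline.shape != controlled.shape:
        raise ValueError("trajectories must cover the same time steps")
    base_mean = baseline.mean()
    if base_mean <= 0:
        raise ValueError("baseline mean fear must be positive")
    return 100.0 * (base_mean - controlled.mean()) / base_mean

# Toy trajectories for illustration only, NOT values from the paper.
toy_baseline = [1.0, 1.0, 1.0, 1.0]
toy_controlled = [1.0, 0.8, 0.7, 0.5]
reduction = mean_fear_reduction_pct(toy_baseline, toy_controlled)
```

Under this definition the figure depends on the averaging window, so a 70% and a 50% result are only comparable if both events are averaged in the same way.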

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Agencies could integrate the RL controller with live sensor and social data streams for real-time adaptation in actual events.
The multi-agent differential game approach might apply to other cascading systems where fear or cooperation breaks down, such as supply chain disruptions.
Physical drills or digital twins could test whether the simulated fear reductions hold when agencies execute the computed actions.

Load-bearing premise

The extended CPS model with added control channels for the three agencies accurately captures the real coupled dynamics of infrastructure failures and social fear.

What would settle it

Comparing the model's predicted fear levels under the learned policies against independently collected community fear surveys from a new hurricane event where similar agency coordination occurred.

Figures

Figures reproduced from arXiv: 2604.08802 by Almuatazbellah M. Boker, Hoda Eldardiry, Lamine Mili, Michael von Spakovsky, Yashodhan D. Hakke.

**Figure 2.** Extended rollouts (2× horizon). Beyond the data window, exogenous drivers are held constant at their last observed values. (a) Harvey, 34 steps. (b) Irma, 24 steps. In both cases the learned policies continue to suppress fear and stabilize infrastructure in the extrapolated regime.
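The hold-last-value extrapolation described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, assuming the exogenous drivers are stored as a (steps × channels) array; the function name `extend_drivers` and the synthetic series are hypothetical, and only the 17-step Harvey window length comes from the captions.

```python
import numpy as np

def extend_drivers(drivers, factor=2):
    """Pad a (T, m) driver series to factor*T steps by repeating the last row."""
    drivers = np.asarray(drivers, dtype=float)
    if drivers.ndim == 1:
        drivers = drivers[:, None]          # treat a 1-D series as one channel
    n_steps = drivers.shape[0]
    n_extra = (factor - 1) * n_steps
    tail = np.repeat(drivers[-1:], n_extra, axis=0)  # last observed values, held constant
    return np.concatenate([drivers, tail], axis=0)

# Synthetic placeholder with the Harvey window length (17 steps); not real data.
toy = np.linspace(1.0, 0.2, 17)
extended = extend_drivers(toy)   # 34 steps: 17 observed + 17 held at the last value
```

Holding the drivers constant means the extended half of the rollout probes the closed-loop behaviour of the policies, not their response to new storm forcing.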

**Figure 3.** Control inputs u1 (communication), u2 (power), u3 (EMS). (a) Harvey, 17 steps. (b) Irma, 12 steps. During the exploration phase (t ≤ 12), sinusoidal probing is visible. After exploration ends, controls settle to small values as fear is suppressed and infrastructure deficits diminish. Player 1 (communication) dominates in both hurricanes, reflecting the dual coupling of u1 to fear and fake news.
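The exploration scheme mentioned in the Figure 3 caption, sinusoidal probing added to the control while t ≤ 12, might look roughly like this. Only the cutoff at step 12 comes from the caption; the amplitude, the frequencies, and the function names are assumptions made for illustration.

```python
import math

T_EXPLORE = 12  # exploration ends after this step (from the Figure 3 caption)

def probing_signal(t, amplitude=0.1, freqs=(1.0, 3.0, 7.0)):
    """Sum of sinusoids while t <= T_EXPLORE, zero afterwards.

    amplitude and freqs are assumed values, not taken from the paper.
    """
    if t > T_EXPLORE:
        return 0.0
    return amplitude * sum(math.sin(w * t) for w in freqs)

def applied_control(policy_output, t):
    """Control actually sent to the system: learned policy plus probing."""
    return policy_output + probing_signal(t)

# Probing over a 17-step window (the Harvey horizon in Figure 3).
probe = [probing_signal(t) for t in range(1, 18)]
```

Mixing several frequencies is a common way to keep the regressor persistently exciting during learning; once the probe is switched off, the applied control equals the policy output.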

**Figure 4.** Bellman (Hamiltonian) residuals εc,i for each player. Player 2 (power) converges near zero within 12 steps. Player 3 (EMS) shows the largest initial residual due to its multi-objective cost (x1, x4, x9) but decreases substantially. Player 1 (communication) stabilizes at a small residual.
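A residual of this kind is typically defined as in reference [10]; the expression below is a sketch under that assumption, not a transcription of the paper. Here φ_i is an assumed critic basis, Ŵc,i the critic weight vector of player i, û_j the current actor outputs, f and g_j the drift and input maps of control-affine dynamics, and Q_i, R_ij the state and control penalties of player i's cost.

```latex
\hat{V}_i(x) = \hat{W}_{c,i}^{\top} \phi_i(x), \qquad
\varepsilon_{c,i} = Q_i(x) + \sum_{j=1}^{3} \hat{u}_j^{\top} R_{ij}\, \hat{u}_j
  + \hat{W}_{c,i}^{\top} \nabla\phi_i(x) \Big( f(x) + \sum_{j=1}^{3} g_j(x)\,\hat{u}_j \Big)
```

A residual near zero along the trajectory means player i's approximate value function satisfies its Hamilton–Jacobi equation under the current policies of all three players, which is why the residual serves as a convergence diagnostic.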

**Figure 5.** Norms of critic (∥Wc,i∥, solid) and actor (∥Wa,i∥, dashed) weight vectors. All weights start from zero (admissible initialization). The actor weights track the critic weights with a lag set by the ratio αa,i/αc,i, consistent with the two-timescale nature of the learning algorithm.

Table IV. Controller comparison (Hurricane Harvey). ΣJ: total integrated cost across all players. Effort: Σ u_i².

**Figure 6.** PE proxy diagnostics. Top: minimum eigenvalue of the running…

Original abstract

During disasters, cascading failures across power grids, communication networks, and social behavior amplify community fear and undermine cooperation. Existing cyber-physical-social (CPS) models simulate these coupled dynamics but lack mechanisms for active intervention. We extend the CPS resilience model of Valinejad and Mili (2023) with control channels for three agencies, communication, power, and emergency management, and formulate the resulting system as a three-player non-zero-sum differential game solved via online actor-critic reinforcement learning. Simulations based on Hurricane Harvey data show 70% mean fear reduction with improved infrastructure recovery; cross-validation in the case of Hurricane Irma (without refitting) achieves 50% fear reduction, confirming generalizability.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper extends the cyber-physical-social (CPS) resilience model of Valinejad and Mili (2023) by adding control channels for three agencies (communication, power, and emergency management). It formulates the extended system as a three-player non-zero-sum differential game and solves it using online actor-critic reinforcement learning. Simulations on Hurricane Harvey data report a 70% mean fear reduction with improved infrastructure recovery; cross-validation on Hurricane Irma data (without refitting) reports 50% fear reduction to support generalizability.

Significance. If the simulation results hold under the stated model extension and RL formulation, the work provides a concrete framework for active multi-agency intervention in disaster CPS systems, with numerical outcomes from real hurricane data and a cross-validation step that offers some independent grounding. The approach of casting agency coordination as a differential game solved online via actor-critic RL is a clear technical contribution that could inform future resilience modeling.

major comments (2)

[Cross-validation experiment] The cross-validation experiment on Irma data without refitting is presented as confirming generalizability, but it is unclear whether the fear metrics, state dynamics, or any RL hyperparameters were fitted using Harvey data or retained strictly from the 2023 base model; this directly affects whether the 50% reduction constitutes an independent test.
[Simulation results] The central claim of 70% mean fear reduction rests on the extended CPS differential game and the online actor-critic solution, yet the abstract and summary provide no explicit reward functions, state-space definitions, or convergence criteria for the RL training; without these, it is impossible to verify that the reported reductions follow from the model rather than from unstated implementation choices.

minor comments (2)

[Abstract] The abstract states numerical outcomes but omits any reference to the specific form of the control inputs, the payoff structure of the non-zero-sum game, or error bars on the 70% and 50% figures.
[Methods] Implementation details such as actor-critic network architecture, learning rates, or how the differential equations are discretized for RL training are absent from the provided text, limiting reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive review and the recommendation of minor revision. We appreciate the recognition of the technical contribution and the value of the cross-validation step. We address each major comment below and have made revisions to improve clarity.

Referee: The cross-validation experiment on Irma data without refitting is presented as confirming generalizability, but it is unclear whether the fear metrics, state dynamics, or any RL hyperparameters were fitted using Harvey data or retained strictly from the 2023 base model; this directly affects whether the 50% reduction constitutes an independent test.

Authors: We thank the referee for highlighting this important point. The fear metrics, state dynamics, and all base CPS model parameters are retained strictly from Valinejad and Mili (2023) with no fitting, calibration, or adjustment performed on the Hurricane Harvey data. The actor-critic RL hyperparameters follow standard values from the literature and were not tuned or optimized on Harvey simulation results. For the Irma cross-validation, the identical model, parameters, and hyperparameters are applied without any refitting or retraining, using only Irma-specific initial conditions and inputs. This ensures the 50% fear reduction is an independent test. We have added an explicit clarifying statement in the revised Section 5.2.

Revision: yes

Referee: The central claim of 70% mean fear reduction rests on the extended CPS differential game and the online actor-critic solution, yet the abstract and summary provide no explicit reward functions, state-space definitions, or convergence criteria for the RL training; without these, it is impossible to verify that the reported reductions follow from the model rather than from unstated implementation choices.

Authors: We agree that the abstract prioritizes brevity and omits these specifics. The full manuscript defines the state space in Section 3.1, specifies the three-player reward functions in Equations (3)-(5) of Section 3.2, and details the online actor-critic convergence criteria (including learning rates, discount factor of 0.95, and policy gradient norm threshold) in Section 4.2. The reported 70% reduction is obtained directly from solving the differential game under these definitions. To address the concern, we have expanded the abstract with a concise description of the reward structure and RL convergence approach and added a summary table of hyperparameters in the simulation section.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained with independent cross-validation

The paper extends the prior CPS model from Valinejad and Mili (2023) by adding control channels, formulates the system as a three-player differential game, and solves it with online actor-critic RL. Simulation outcomes (70% fear reduction on Harvey data, 50% on Irma without refitting) are produced by running the extended model under the learned policy; these are not equivalent to the inputs by construction. The Irma cross-validation is explicitly out-of-sample and provides independent grounding. No self-definitional loop, fitted parameter renamed as prediction, or load-bearing self-citation chain appears in the abstract or described pipeline. The central results follow from the new extension and RL solution rather than reducing tautologically to the base model or data fits.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract provides insufficient technical detail to enumerate specific free parameters, axioms, or invented entities. The approach relies on the assumptions of the cited 2023 CPS model plus standard reinforcement learning and differential game theory constructs, but none are explicitly listed or justified here.

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages

[1] S. M. Rinaldi, J. P. Peerenboom, and T. K. Kelly, “Identifying, understanding, and analyzing critical infrastructure interdependencies,” IEEE Control Systems Magazine, vol. 21, no. 6, pp. 11–25, 2001.

[2] M. Ouyang, “Review of modeling and simulation of interdependent critical infrastructure systems,” Reliability Engineering & System Safety, vol. 121, pp. 43–60, 2014.

[3] S. Vosoughi, D. Roy, and S. Aral, “The spread of true and false news online,” Science, vol. 359, no. 6380, pp. 1146–1151, 2018.

[4] D. M. J. Lazer, M. A. Baum, Y. Benkler, A. J. Berinsky, D. M. Bushman, R. Faris et al., “The science of fake news,” Science, vol. 359, no. 6380, pp. 1094–1096, 2018.

[5] F. H. Norris, S. P. Stevens, B. Pfefferbaum, K. F. Wyche, and R. L. Pfefferbaum, “Community resilience as a metaphor, theory, set of capacities, and strategy for disaster readiness,” American Journal of Community Psychology, vol. 41, no. 1–2, pp. 127–150, 2008.

[6] S. L. Cutter, L. Barnes, M. Berry, C. Burton, E. Evans, E. Tate, and J. Webb, “A place-based model for understanding community resilience to natural disasters,” Global Environmental Change, vol. 18, no. 4, pp. 598–606, 2008.

[7] J. Valinejad and L. Mili, “Cyber-physical-social model of community resilience by considering critical infrastructure interdependencies,” IEEE Internet of Things Journal, vol. 10, no. 24, pp. 21985–22001, 2023.

[8] T. Başar and G. J. Olsder, Dynamic Noncooperative Game Theory, 2nd ed., ser. Classics in Applied Mathematics. SIAM, 1999.

[9] J. C. Engwerda, LQ Dynamic Optimization and Differential Games. John Wiley & Sons, 2005.

[10] K. G. Vamvoudakis and F. L. Lewis, “Multi-player non-zero-sum games: Online adaptive learning solution of coupled Hamilton–Jacobi equations,” Automatica, vol. 47, no. 8, pp. 1556–1569, 2011.

[11] F. L. Lewis and D. Vrabie, “Reinforcement learning and adaptive dynamic programming for feedback control,” IEEE Circuits and Systems Magazine, vol. 9, no. 3, pp. 32–50, 2009.

[12] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. MIT Press, 2018.

[13] V. R. Konda and J. N. Tsitsiklis, “On actor-critic algorithms,” SIAM Journal on Control and Optimization, vol. 42, no. 4, pp. 1143–1166, 2003.

[14] D. L. Alderson, G. G. Brown, and W. M. Carlyle, “Operational models of infrastructure resilience,” Risk Analysis, vol. 35, no. 4, pp. 562–586, 2015.

[15] K. Starbird, J. Maddock, M. Orand, P. Achterman, and R. M. Mason, “Rumors, false flags, and digital vigilantes: Misinformation on Twitter after the 2013 Boston Marathon bombing,” iConference 2014 Proceedings, pp. 654–662, 2014.

[16] Y. Jiang and Z.-P. Jiang, “Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics,” Automatica, vol. 48, no. 10, pp. 2699–2704, 2012.

[17] H. Modares and F. L. Lewis, “Linear quadratic tracking control of partially-unknown continuous-time systems using reinforcement learning,” IEEE Transactions on Automatic Control, vol. 59, no. 11, pp. 3051–3056, 2014.

[18] L. Buşoniu, R. Babuška, and B. De Schutter, “A comprehensive survey of multiagent reinforcement learning,” IEEE Transactions on Systems, Man, and Cybernetics, Part C, vol. 38, no. 2, pp. 156–172, 2008.
