Agents Trusting Agents? Restoring Lost Capabilities with Inclusive Healthcare

Ahmed Al-Awah; Alba Aguilera; Georgina Curto; Nardine Osman

arxiv: 2507.23644 · v3 · submitted 2025-07-31 · 💻 cs.MA

Pith reviewed 2026-05-19 02:27 UTC · model grok-4.3

keywords agent-based modeling · reinforcement learning · capability approach · people experiencing homelessness · health equity · Bayesian inverse reinforcement learning · trust calibration · social policy evaluation

The pith

Simulations show that building trust with social workers can help people experiencing homelessness restore their central human capabilities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds an agent-based reinforcement learning model to evaluate policies for better healthcare equity among people experiencing homelessness in Barcelona. Agents representing PEH pursue restoration of key human capabilities inside existing legal and environmental limits, while interacting with social workers. Bayesian inverse reinforcement learning adjusts trust and engagement levels according to different behavioral profiles. This non-invasive simulation method lets organizations test how trust relationships influence policy outcomes on health inequity before any real-world changes.

Core claim

By defining a reinforcement learning environment where agents representing people experiencing homelessness aim to restore their central human capabilities under environmental and legal constraints, and applying Bayesian inverse reinforcement learning to calibrate profile-dependent trust and engagement parameters with social workers, the work shows a path to mitigate health inequity by building relationships of trust between social service workers and PEH.

What carries the argument

Reinforcement learning environment in which PEH agents restore central human capabilities under constraints, with Bayesian inverse reinforcement learning calibrating trust and engagement parameters.
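
To make the setup concrete, here is a minimal sketch of what such an environment could look like. It is not the authors' implementation: the capability names, the actions, the `has_health_card` flag standing in for a legal constraint, and all numeric increments are assumptions chosen only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical capability names and actions; the paper's actual state and
# action spaces are not given in this summary.
CAPABILITIES = ("health", "shelter", "affiliation")
ACTIONS = ("rest", "seek_care", "meet_social_worker")


@dataclass
class ToyCapabilityEnv:
    """Toy single-agent environment; capability levels live in [0, 1]."""
    trust: float = 0.5             # assumed trust in the social worker, in [0, 1]
    has_health_card: bool = False  # assumed stand-in for a legal constraint
    levels: dict = field(default_factory=lambda: {c: 0.2 for c in CAPABILITIES})

    def allowed_actions(self):
        # Legal constraint: care is only accessible with documentation.
        if self.has_health_card:
            return list(ACTIONS)
        return [a for a in ACTIONS if a != "seek_care"]

    def step(self, action):
        if action not in self.allowed_actions():
            raise ValueError(f"action {action!r} is not permitted in this state")
        before = sum(self.levels.values())
        if action == "seek_care":
            self._raise("health", 0.2)
        elif action == "meet_social_worker":
            # Higher trust makes the meeting more productive.
            self._raise("affiliation", 0.1 * self.trust)
            if self.trust >= 0.5:
                self.has_health_card = True
        # "rest" changes nothing in this toy model.
        # Reward is the total capability gained in this step.
        reward = sum(self.levels.values()) - before
        return dict(self.levels), reward

    def _raise(self, capability, amount):
        self.levels[capability] = min(1.0, self.levels[capability] + amount)
```

In this sketch the reward is simply the capability gained per step, and trust acts twice: it scales the benefit of meeting a social worker and it gates access to an otherwise forbidden action. Whether the paper couples trust and constraints in this way is not stated in the summary.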

Load-bearing premise

Bayesian inverse reinforcement learning on an abstract model can produce trust and engagement parameters that meaningfully match real-world behavior and outcomes for people experiencing homelessness.

What would settle it

Direct comparison of the model's predicted improvements in healthcare access and capability restoration against measured results after actual trust-focused policy changes are introduced in Barcelona social services.

Original abstract

Agent-based simulations have an untapped potential to inform social policies on urgent human development challenges in a non-invasive way, before these are implemented in real-world populations. This paper responds to the request from non-profit and governmental organizations to evaluate policies under discussion to improve equity in health care services for people experiencing homelessness (PEH) in the city of Barcelona. With this goal, we integrate the conceptual framework of the capability approach (CA), which is explicitly designed to promote and assess human well-being, to model and evaluate the behaviour of agents who represent PEH and social workers. We define a reinforcement learning environment where agents aim to restore their central human capabilities, under existing environmental and legal constraints. We use Bayesian inverse reinforcement learning (IRL) to calibrate profile-dependent behavioural parameters in PEH agents, modeling the degree of trust and engagement with social workers, which is reportedly a key element for the success of the policies in scope. Our results open a path to mitigate health inequity by building relationships of trust between social service workers and PEH.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This applies established RL and Bayesian IRL to model trust in Barcelona's homelessness services via the capability approach, but the parameters look fitted rather than independently validated.

The core of the paper is an agent-based simulation that puts the capability approach into a reinforcement learning setup for people experiencing homelessness and social workers in Barcelona. Agents try to restore central capabilities under legal and environmental limits, and Bayesian IRL is used to set profile-specific trust and engagement levels that supposedly drive better policy outcomes. The claim is that this non-invasive tool can help test equity policies before real rollout. That framing is straightforward and matches what the abstract promises.

The work is mostly an application rather than a new algorithm or theoretical result, which is fine if the domain fit is useful. It does show how to encode capability restoration as rewards and how to let IRL recover behavioral parameters from that environment. For readers who already work on multi-agent systems for social policy, the Barcelona-specific constraints and the focus on trust as a lever are concrete enough to be worth looking at.

The main weakness is the missing link between the calibrated trust parameters and actual observations. The abstract presents those parameters as key to the policy insight, yet there is no reported check against survey data, field observations, or even a hold-out set of real engagement metrics. Without that, it is hard to tell whether the IRL step is recovering genuine behavioral tendencies or simply tuning the model to its own assumptions. Sensitivity to the reward function or prior choices is also not mentioned. If the full text has those checks, they need to be front and center; if not, the simulation results stay decoupled from the target population.

This is the sort of paper that could interest people working on AI for social services or urban policy simulation. A methods journal or a venue that handles applied RL in public-sector settings might give it a fair read. I would send it to peer review rather than desk-reject, mainly because the setup is explicit and the policy question is timely, but the referees will need to press hard on validation and robustness before any stronger claims are accepted.

Referee Report

2 major / 2 minor

Summary. The manuscript develops an agent-based simulation that combines the capability approach with a reinforcement learning environment in which agents representing people experiencing homelessness (PEH) and social workers attempt to restore central human capabilities under legal and environmental constraints. Bayesian inverse reinforcement learning is used to calibrate profile-dependent trust and engagement parameters for the PEH agents; the authors argue that the resulting simulations can evaluate policies under discussion in Barcelona and that the work opens a path to mitigate health inequity through improved trust relationships.

Significance. If the calibration step can be shown to produce parameters that are not merely fitted but are independently predictive of observed behavior, the framework would constitute a useful non-invasive tool for testing social-service policies before real-world deployment. The integration of the capability approach with modern IRL techniques in a multi-agent setting is a distinctive application that could interest both policy-oriented and technical audiences in multi-agent systems.

major comments (2)

[Methods] Methods section on Bayesian IRL: the calibration of profile-dependent trust and engagement parameters is presented as the key step that enables policy insight, yet the manuscript provides no validation against observational or survey data on actual trust dynamics between PEH and social workers in Barcelona, nor any cross-validation or hold-out tests that would demonstrate the parameters are not simply recovered from the same data used to define the reward function.
[Results] Results and policy-evaluation sections: the reported benefits of trust-building policies are shown only in the calibrated simulation; without sensitivity checks on the choice of reward function, prior distributions, or the abstract RL environment itself, it is impossible to assess whether the policy recommendations remain stable when these modeling choices are varied.
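
The kind of sensitivity check requested in the second major comment can be sketched as a sweep over modeling choices that tests whether the policy ranking survives each one. Everything below is hypothetical: `simulate_outcome` is a one-line stand-in for a full simulation run, and `reward_weight`, `prior_mean`, and `boost` are illustrative names, not quantities taken from the paper.

```python
from itertools import product


def simulate_outcome(trust_boost, reward_weight, prior_mean):
    """Hypothetical stand-in for one simulation run.

    Returns a scalar capability-restoration score. Trust is capped at 1.0,
    so a trust-building policy has no effect once trust is saturated.
    """
    trust = min(1.0, prior_mean + trust_boost)
    return reward_weight * trust


def ranking_is_stable(reward_weights, prior_means, boost=0.2):
    """Check whether the trust-building policy beats the baseline everywhere.

    Returns (stable, gains), where gains maps each (reward_weight, prior_mean)
    setting to the policy's advantage over the baseline.
    """
    gains = {}
    for weight, mean in product(reward_weights, prior_means):
        baseline = simulate_outcome(0.0, weight, mean)
        policy = simulate_outcome(boost, weight, mean)
        gains[(weight, mean)] = policy - baseline
    return all(gain > 0 for gain in gains.values()), gains
```

A call such as `ranking_is_stable([0.5, 1.0, 2.0], [0.2, 0.5])` sweeps six settings. Including a saturated prior mean of `1.0` makes the gain vanish, which is the kind of regime-dependence a sensitivity table would need to surface.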

minor comments (2)

[Abstract] The abstract and introduction would benefit from a short paragraph clarifying the precise data sources (simulated trajectories, expert elicitation, or limited field observations) that feed the Bayesian IRL step.
[Model definition] Notation for the capability-restoration reward function and the trust parameter vector should be introduced once and used consistently; occasional shifts between prose descriptions and symbols reduce readability.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the strengths and limitations of our approach. We address the major comments below and outline the revisions we will make to the manuscript.

Referee: [Methods] Methods section on Bayesian IRL: the calibration of profile-dependent trust and engagement parameters is presented as the key step that enables policy insight, yet the manuscript provides no validation against observational or survey data on actual trust dynamics between PEH and social workers in Barcelona, nor any cross-validation or hold-out tests that would demonstrate the parameters are not simply recovered from the same data used to define the reward function.

Authors: The referee correctly identifies a limitation in the current presentation. The Bayesian IRL calibration draws on prior distributions informed by published studies on trust and engagement in homeless services and consultations with local experts, rather than new observational data from Barcelona. We did not conduct cross-validation or hold-out tests in the submitted manuscript. In the revised version, we will add a new subsection in the Methods detailing the sources of the priors and reward function, include within-sample cross-validation results to show parameter identifiability, and explicitly discuss the absence of external validation as a limitation while proposing it as future work.

revision: yes

Referee: [Results] Results and policy-evaluation sections: the reported benefits of trust-building policies are shown only in the calibrated simulation; without sensitivity checks on the choice of reward function, prior distributions, or the abstract RL environment itself, it is impossible to assess whether the policy recommendations remain stable when these modeling choices are varied.

Authors: We agree that sensitivity analyses are necessary to evaluate the robustness of the policy insights. The submitted manuscript focuses on the main calibrated scenario without reporting variations. We will incorporate sensitivity checks by varying key parameters in the reward function, the hyperparameters of the Bayesian priors, and aspects of the environment such as legal constraints and resource availability. These analyses will be added to the Results section, with figures or tables showing how the policy benefits change under different assumptions.

revision: yes

Circularity Check

0 steps flagged

Derivation chain is self-contained; calibration feeds simulation without reduction by construction

The paper defines an RL environment grounded in the capability approach, applies Bayesian IRL solely to calibrate profile-dependent trust and engagement parameters from (simulated or limited) data, and then uses the resulting model to evaluate policy scenarios. This is a standard forward simulation workflow in which fitted parameters are inputs rather than outputs that are renamed as independent predictions. No equations or sections are shown that equate a reported result directly to the IRL fit by construction, no load-bearing self-citation chain is invoked to justify uniqueness or the ansatz, and the central claim about trust-building paths is presented as an implication of the simulation outputs rather than a tautological restatement of the calibration step. The derivation therefore remains non-circular on the evidence available.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim depends on the capability approach providing a complete and measurable set of central capabilities, on the RL environment faithfully encoding legal and environmental constraints, and on Bayesian IRL recovering interpretable trust parameters from behavior.

free parameters (1)

profile-dependent trust and engagement parameters
Calibrated via Bayesian IRL; these are the load-bearing fitted values that determine agent behavior toward social workers.
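
As a rough illustration of how a single trust parameter could be calibrated in a Bayesian way, the sketch below computes a grid posterior from observed engage/avoid choices under a Boltzmann (softmax) choice model. It is a one-step simplification, not full Bayesian IRL over a sequential decision process, and not the authors' procedure: the `benefit`, `cost`, and `beta` values, the uniform prior, and the two-action choice are all assumptions.

```python
import math


def boltzmann_prob_engage(trust, benefit=1.0, cost=0.5, beta=5.0):
    """P(engage) under a softmax over two one-step action values.

    Assumed values: engaging is worth trust * benefit - cost, avoiding is
    worth 0. beta controls how deterministic the choice is.
    """
    q_engage = trust * benefit - cost
    q_avoid = 0.0
    return 1.0 / (1.0 + math.exp(-beta * (q_engage - q_avoid)))


def trust_posterior(observations, grid_size=101):
    """Grid posterior over trust in [0, 1] with a uniform prior.

    observations: list of booleans, True meaning the agent engaged.
    Returns (grid, posterior), with posterior normalized to sum to 1.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    log_post = []
    for trust in grid:
        p = boltzmann_prob_engage(trust)
        log_post.append(sum(math.log(p if o else 1.0 - p) for o in observations))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]


def posterior_mean(grid, posterior):
    """Expected trust under the grid posterior."""
    return sum(g * p for g, p in zip(grid, posterior))
```

Running `trust_posterior` separately on the observations of each behavioral profile would yield one posterior per profile, which is one simple reading of "profile-dependent". Replacing the uniform prior with one informed by published studies or expert consultation would only require adding a log-prior term to each entry of `log_post`.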

axioms (2)

domain assumption: The capability approach supplies an explicit, policy-relevant definition of human well-being that can be operationalized as agent reward functions.
Invoked in the abstract when agents are defined to aim at restoring central human capabilities.
domain assumption: Existing environmental and legal constraints in Barcelona can be encoded as fixed rules in the RL environment without further empirical validation.
Stated as the setting in which agents operate.

pith-pipeline@v0.9.0 · 5718 in / 1327 out tokens · 28236 ms · 2026-05-19T02:27:02.318349+00:00 · methodology
