Toward an AI-Powered Computational Testbed for Workforce Policy

Ashley V. Whillans; Sumer S. Vaid

arxiv: 2605.19064 · v1 · pith:SXJCTSDMnew · submitted 2026-05-18 · 💻 cs.HC · cs.AI

Pith reviewed 2026-05-20 08:17 UTC · model grok-4.3

Classification: cs.HC, cs.AI

Keywords: workforce simulation, organizational change, generative agents, AI integration, employee forecasting, computational testbed, workforce policy, LLM applications in HR

The pith

Organizations can forecast how employees will respond to AI-driven workplace changes by running simulations with AI agents built from real worker data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes a computational testbed that combines large language model agents with established management science to create dynamic employee agents. These agents draw on HR records, psychometric measures, and digital activity data to generate day-by-day simulations of how workers think, feel, and behave during planned organizational shifts such as AI integration. A sympathetic reader would care because workforce transformations are hard to predict and expensive when mismanaged, yet current practice lacks tools to anticipate individual psychological and behavioral reactions. The authors outline the architecture needed to run these simulations and the safeguards required for privacy, accuracy, and representativeness. They present the resulting forecasting infrastructure as necessary for steering the global realignment of knowledge work around AI.

Core claim

The central claim is that dynamic employee agents can be constructed by merging LLM-powered generative agents with foundational organizational behavior research. When seeded with consenting employees' HR records, validated psychometric measures, and digital activity data, these agents simulate cognitive, emotional, and behavioral trajectories across successive workdays during organizational changes. The paper details the required computational architecture and specifies privacy, accuracy, and representativeness safeguards for responsible deployment, arguing that such prospective forecasting is a critical requirement for managing AI-related workforce realignment.

What carries the argument

Dynamic employee agents: LLM-powered generative agents seeded with HR records, psychometric measures, and activity data to simulate daily cognitive, emotional, and behavioral trajectories during organizational change.
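
To make the seeding-and-stepping idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' architecture: the names SeedProfile, DayState, EmployeeAgent, simulate, and stub_llm are hypothetical, the mood scale is invented for the example, and the language model is replaced by a fixed stub so the block runs without any external service.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SeedProfile:
    """Hypothetical container for the three seeding sources named in the paper."""
    hr_record: Dict[str, str]           # e.g. role, tenure
    psychometrics: Dict[str, float]     # e.g. scores on validated scales
    activity_summary: Dict[str, float]  # e.g. aggregated digital activity


@dataclass
class DayState:
    """One simulated workday: a thought, a mood value, and an action."""
    day: int
    thought: str
    mood: float  # illustrative scale from -1 (negative) to 1 (positive)
    action: str


# Assumption: a model call is any function from prompt text to a dict of fields.
LLMFn = Callable[[str], Dict[str, object]]


@dataclass
class EmployeeAgent:
    profile: SeedProfile
    llm: LLMFn
    memory: List[DayState] = field(default_factory=list)

    def build_prompt(self, day: int, event: str) -> str:
        # Only the three most recent days are recalled in this sketch.
        recent = "; ".join(f"day {s.day}: {s.action}" for s in self.memory[-3:])
        return (
            f"Profile: {self.profile.hr_record} {self.profile.psychometrics} "
            f"{self.profile.activity_summary}\n"
            f"Recent memory: {recent or 'none'}\n"
            f"Day {day} event: {event}\n"
            "Report thought, mood, and action."
        )

    def step(self, day: int, event: str) -> DayState:
        out = self.llm(self.build_prompt(day, event))
        state = DayState(day, str(out["thought"]), float(out["mood"]), str(out["action"]))
        self.memory.append(state)  # today's state becomes tomorrow's context
        return state


def simulate(agent: EmployeeAgent, events: List[str]) -> List[DayState]:
    """Run one agent through a sequence of daily events, starting at day 1."""
    return [agent.step(day, event) for day, event in enumerate(events, start=1)]


def stub_llm(prompt: str) -> Dict[str, object]:
    """Placeholder for a real model; always returns the same neutral response."""
    return {"thought": "processing the change", "mood": 0.0, "action": "continues usual tasks"}
```

Calling simulate(EmployeeAgent(profile, stub_llm), ["AI tool announced", "AI tool rollout"]) returns one DayState per event. Keeping the model behind a plain function makes it easy to swap the stub for a real model call, and it keeps the consent-dependent seed data in one clearly bounded object.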

If this is right

The testbed would enable prospective forecasting of workforce responses to AI integration before changes are implemented.
Simulations could capture evolving employee reactions across multiple successive workdays rather than single snapshots.
Responsible deployment requires explicit safeguards for privacy, accuracy, and representativeness of the seeded data.
The infrastructure would reduce costs from mismanaging difficult workforce transformations.
Such a platform addresses a technical gap in managing the current global workforce realignment around AI.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Policymakers could run controlled comparisons of alternative change-management strategies inside the simulation before committing resources.
The same seeding and simulation approach might extend to modeling responses in non-AI organizational interventions such as restructuring or remote-work shifts.
Ongoing validation against fresh real-world data would be required to maintain predictive value as agent capabilities evolve.
Team-level interactions could be modeled by running multiple employee agents in parallel to study collective dynamics during change.
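
The controlled-comparison idea in the first extension can be sketched as follows. This is a toy illustration only: compare_policies and toy_sim are hypothetical names, the policy labels are made up, and the effect sizes inside toy_sim are arbitrary placeholders rather than empirical estimates. The one real technique shown is re-seeding the random generator for each policy so that every arm sees the same simulated population (common random numbers), which isolates the policy effect from sampling noise.

```python
import random
from statistics import mean
from typing import Callable, Dict, List

# Assumption: a per-agent simulator maps (policy name, rng) to a final mood value.
AgentSim = Callable[[str, random.Random], float]


def compare_policies(
    simulate_agent: AgentSim, policies: List[str], n_agents: int, seed: int
) -> Dict[str, float]:
    """Return the mean simulated outcome per policy over the same random population."""
    results: Dict[str, float] = {}
    for policy in policies:
        # Same seed for every arm, so each policy faces identical random draws.
        rng = random.Random(seed)
        results[policy] = mean(simulate_agent(policy, rng) for _ in range(n_agents))
    return results


def toy_sim(policy: str, rng: random.Random) -> float:
    """Toy stand-in with invented numbers; not a claim about real employees."""
    baseline = rng.uniform(-0.5, 0.5)  # placeholder for individual differences
    shift = {"gradual_rollout": 0.1, "abrupt_rollout": -0.1}[policy]  # arbitrary
    return baseline + shift
```

In a real testbed, toy_sim would be replaced by a full agent run, and the comparison would only be as trustworthy as the agents' validated fidelity.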

Load-bearing premise

That LLM-powered generative agents seeded with HR records, validated psychometric measures, and digital activity data will produce trajectories that accurately reflect real human psychological and behavioral responses to organizational change.

What would settle it

A side-by-side comparison of agent-generated daily trajectories against actual employee survey responses, mood logs, and performance metrics collected during a real AI integration initiative in a consenting organization.
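
A side-by-side comparison of this kind could start from two simple agreement measures. The sketch below is an assumption about how one might score it, not a procedure from the paper: it takes one simulated and one observed daily series (for example, daily mood on a shared scale) and reports the Pearson correlation and the mean absolute error. The function names are hypothetical.

```python
from math import sqrt
from typing import Sequence


def pearson_r(simulated: Sequence[float], observed: Sequence[float]) -> float:
    """Pearson correlation between two equally long daily series."""
    n = len(simulated)
    if n != len(observed) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_s = sum(simulated) / n
    mean_o = sum(observed) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(simulated, observed))
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    if var_s == 0 or var_o == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_s * var_o)


def mean_abs_error(simulated: Sequence[float], observed: Sequence[float]) -> float:
    """Average absolute gap between simulated and observed values."""
    if len(simulated) != len(observed) or not simulated:
        raise ValueError("series must be non-empty and of equal length")
    return sum(abs(s - o) for s, o in zip(simulated, observed)) / len(simulated)
```

The two measures answer different questions: correlation checks whether the simulated trajectory moves in the same direction as the real one, while mean absolute error checks whether it lands at the right level. A trajectory can score well on one and poorly on the other, so reporting both is a reasonable default.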

Figures

Figures reproduced from arXiv: 2605.19064 by Ashley V. Whillans, Sumer S. Vaid.

Original abstract

Workforce transformations are difficult to forecast and costly to mismanage. In particular, the integration of artificial intelligence into knowledge work currently affects a substantial share of the global workforce, yet this transition proceeds without tools to forecast how individual employees will respond psychologically and behaviorally. We combine recent advances in LLM-powered generative agents with foundational management science and organizational behavior research to propose dynamic employee agents. Among consenting populations, these agents can be seeded with HR records, validated psychometric measures, and digital activity data to simulate employees' cognitive, emotional, and behavioral trajectories across successive workdays during planned organizational changes. In this article, we detail the computational architecture required to construct this simulation platform and define the privacy, accuracy, and representativeness safeguards necessary for responsible deployment. We argue that establishing this prospective forecasting infrastructure is a critical technical requirement for managing the current global workforce realignment around AI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

Private letter to a colleague

This is a forward-looking proposal for an LLM-agent workforce simulator that flags a real forecasting gap but offers no implementation, validation plan, or results.

The core takeaway is that the authors sketch a testbed where generative agents, seeded with HR records, psychometrics, and digital traces, would simulate daily employee responses to AI-driven org changes. They outline modules for prompting, memory, and interaction plus privacy and representativeness safeguards. That combination is framed as new, though it mainly extends existing agent-based modeling and organizational behavior work rather than introducing a fresh mechanism or dataset.

The paper does a clean job naming the practical problem (workforce AI shifts lack prospective tools), and it stays concrete on the architecture components without overclaiming current capabilities. Credit for pulling in management science references to ground the agent behaviors.

The soft spots are exactly where the stress-test note lands. The central claim that these agents will produce trajectories reflecting real cognitive and emotional responses has no described calibration procedure, no hold-out checks against longitudinal records, and no error bounds. Without that grounding step the whole forecasting value stays speculative. The manuscript is a high-level proposal only; there are no experiments, code, or even toy runs to assess fidelity. Citation pattern looks standard and not circular.

This piece is aimed at HCI and policy researchers who think about simulation infrastructure for labor issues. A reader already working on generative agents or workforce analytics might pick up useful framing or safeguard ideas, but anyone expecting empirical grounding will find it thin. It deserves a serious referee because the problem is timely and the architecture sketch could usefully structure follow-on work, even if the current version needs substantial additions on validation before it can be taken as more than a concept note. I would recommend sending it out for review with explicit instructions to add a concrete validation roadmap.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a computational testbed for workforce policy that uses LLM-powered generative agents, combined with management science and organizational behavior research, to simulate individual employees' cognitive, emotional, and behavioral responses to organizational changes such as AI integration. Agents are to be seeded with HR records, validated psychometric measures, and digital activity data to generate day-to-day trajectories, with the paper outlining the required architecture, privacy safeguards, accuracy considerations, and representativeness requirements for responsible use.

Significance. If the proposed platform could be implemented and validated to produce trajectories that reliably match real employee data, it would offer a novel prospective forecasting tool for managing workforce transformations, addressing a gap in current policy analysis that relies on retrospective or aggregate methods. The integration of generative agents with established OB constructs is a promising direction, but the manuscript provides no empirical grounding or implementation to assess feasibility.

major comments (3)

[Abstract and dynamic employee agents section] The central claim that LLM-powered agents seeded with HR/psychometric/digital data will produce trajectories reflecting real human psychological and behavioral responses lacks any described validation procedure, calibration method, or hold-out testing against longitudinal employee records. This is load-bearing for the forecasting utility asserted in the abstract.
[Architecture section] The description of prompting, memory, and interaction modules provides no concrete metrics for simulation fidelity (e.g., error bounds on emotional or behavioral outputs) or a proposed grounding process against observed human data, leaving the accuracy safeguards mentioned in the abstract without operational detail.
[Safeguards and deployment section] While privacy and representativeness are addressed, there is no discussion of how to quantify or mitigate divergence between simulated trajectories and real employee responses, which undermines the claim that the testbed can serve as a reliable policy tool.

minor comments (2)

The manuscript would benefit from explicit section headings and a figure or diagram illustrating the data flow from seeding sources through agent modules to output trajectories.
Clarify whether the proposal assumes access to real-time digital activity data or relies on historical aggregates, as this affects the day-to-day simulation claim.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed report. The manuscript is a conceptual proposal that outlines the architecture, privacy safeguards, accuracy considerations, and representativeness requirements for a future computational testbed. We address each major comment below, indicating where revisions will be made to improve clarity without altering the proposal's scope.

Referee: [Abstract and dynamic employee agents section] The central claim that LLM-powered agents seeded with HR/psychometric/digital data will produce trajectories reflecting real human psychological and behavioral responses lacks any described validation procedure, calibration method, or hold-out testing against longitudinal employee records. This is load-bearing for the forecasting utility asserted in the abstract.

Authors: We agree that the manuscript does not describe a specific validation procedure, calibration method, or hold-out testing protocol. As a proposal paper, the focus is on defining the necessary architecture and safeguards rather than reporting an implemented system. The forecasting utility is framed as contingent on future validation against real employee data. We will revise the abstract and dynamic employee agents section to explicitly state that such validation procedures are a required next step for any deployment, thereby clarifying the load-bearing assumptions.
Revision: yes

Referee: [Architecture section] The description of prompting, memory, and interaction modules provides no concrete metrics for simulation fidelity (e.g., error bounds on emotional or behavioral outputs) or a proposed grounding process against observed human data, leaving the accuracy safeguards mentioned in the abstract without operational detail.

Authors: The architecture section provides a high-level description of the required modules. We acknowledge the absence of concrete fidelity metrics or a detailed grounding process, which would necessarily be implementation-specific. We will partially revise this section to propose example metrics (such as correlation thresholds with observed behavioral indicators) and a high-level outline of a grounding process, while noting that full operationalization depends on pilot data and will be developed in follow-on work.
Revision: partial

Referee: [Safeguards and deployment section] While privacy and representativeness are addressed, there is no discussion of how to quantify or mitigate divergence between simulated trajectories and real employee responses, which undermines the claim that the testbed can serve as a reliable policy tool.

Authors: We agree that explicit discussion of quantifying and mitigating divergence is needed to support reliability claims. The current section emphasizes privacy and representativeness as core safeguards. We will revise the safeguards and deployment section to include methods for monitoring divergence (e.g., periodic benchmarking against available longitudinal records) and mitigation approaches such as iterative model updating, thereby strengthening the case for responsible use as a policy tool.
Revision: yes
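
The divergence monitoring mentioned in the last response (periodic benchmarking followed by model updating) could take a shape like the sketch below. This is one possible reading, not something the paper or the rebuttal specifies: the class name DivergenceMonitor, the rolling-window design, and the threshold rule are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque


class DivergenceMonitor:
    """Track the gap between simulated and observed values over a rolling window."""

    def __init__(self, window: int, threshold: float) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        # deque with maxlen silently drops the oldest error once the window is full
        self.errors: Deque[float] = deque(maxlen=window)
        self.threshold = threshold

    def rolling_error(self) -> float:
        """Mean absolute error over the benchmarks currently in the window."""
        if not self.errors:
            raise ValueError("no benchmarks recorded yet")
        return sum(self.errors) / len(self.errors)

    def record(self, simulated: float, observed: float) -> bool:
        """Add one benchmark comparison; return True if recalibration is suggested."""
        self.errors.append(abs(simulated - observed))
        window_full = len(self.errors) == self.errors.maxlen
        # Wait for a full window so a single early outlier does not trigger an update.
        return window_full and self.rolling_error() > self.threshold
```

Each time a fresh observation becomes available (for instance, a periodic survey wave), record is called with the matching simulated value. A True result is only a signal to revisit the agents; what the update itself looks like is left open here, as it is in the rebuttal.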

Circularity Check

0 steps flagged

Proposal for LLM-based workforce simulation testbed contains no derivations or fitted predictions

The paper is a forward-looking conceptual proposal that outlines an architecture for dynamic employee agents using LLM-powered generative agents seeded with HR records, psychometric measures, and digital activity data. It describes prompting, memory, and interaction modules along with privacy safeguards but presents no equations, mathematical derivations, fitted parameters, or empirical predictions. No load-bearing steps reduce any claim to prior inputs by construction, self-citation chains, or ansatzes. The central suggestion for a forecasting infrastructure stands as an independent proposal without circular reduction to its own assumptions or data.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The proposal rests on the untested assumption that current LLM agents can faithfully model human responses when seeded with the listed data types; no free parameters or new physical entities are introduced.

axioms (1)

Domain assumption: LLM-powered generative agents can be seeded with HR records, psychometric measures, and digital activity data to produce realistic simulations of employee trajectories.
This is the core premise invoked when defining dynamic employee agents.

invented entities (1)

dynamic employee agents (no independent evidence)
Purpose: To simulate cognitive, emotional, and behavioral trajectories of real employees during organizational change.
New term and concept introduced to describe the proposed simulation entities.

pith-pipeline@v0.9.0 · 5675 in / 1321 out tokens · 41058 ms · 2026-05-20T08:17:05.165463+00:00 · methodology

Reference graph

Works this paper leans on

2 extracted references · 2 canonical work pages

[1]

Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs

E. Bonabeau, Agent-based modeling: Methods and techniques for simulating human systems. Proc. Natl. Acad. Sci. U.S.A. 99, 7280–7287 (2002). 15. E. Bruch, J. Atwell, Agent-based models in empirical social research. Sociol. Methods Res. 44, 186–221 (2015). 16. D. Demszky, D. Yang, D. S. Yeager, C. J. Bryan, M. Clapper, S. Chandhok, J. C. Eichstaedt, C. Hech...

work page 2002
[2]

Brynjolfsson, D

E. Brynjolfsson, D. Li, L. R. Raymond, Generative AI at work. Q. J. Econ. 140, 889–942 (2025). 33. National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, The Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research (U.S. Department of Health, Education, and Welfare, 1979); ...

work page arXiv 2025
