This human study did not involve human subjects: Validating LLM simulations as behavioral evidence

Jessica Hullman, David Broska, Huaman Sun, Aaron Shaw · 2026 · arXiv 2602.15785

6 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 1

citation-polarity summary

support 1

representative citing papers

Mitigating LLM-based p-Hacking by Preregistering for the Next LLM

cs.CL · 2026-06-26 · conditional · novelty 7.0

Preregistering LLM experiments to run on the first future eligible model blocks p-hacking transfer in roughly 73% of cases across 20 models and 11 configurations on two tasks with known ground truth.

Adaptive Querying with AI Persona Priors

stat.ML · 2026-05-01 · unverdicted · novelty 7.0

A persona-induced latent variable model with LLM response distributions enables closed-form Bayesian updates and finite-mixture predictions for scalable adaptive querying of user-dependent quantities.

Post-training makes large language models less human-like

cs.CL · 2026-05-08 · unverdicted · novelty 6.0

Post-training reduces LLMs' behavioral alignment with humans across families and sizes, with the misalignment increasing in newer generations while persona induction fails to improve individual-level predictions.

When Reasoning Models Hurt Behavioral Simulation: A Solver-Sampler Mismatch in Multi-Agent LLM Negotiation

cs.LG · 2026-04-12 · unverdicted · novelty 6.0

Stronger reasoning models in LLMs reduce behavioral negotiation by defaulting to authority outcomes in multi-agent settings, unlike structured scaffolds that enable concessions.

When simulations look right but causal effects go wrong: Large language models as behavioral simulators

cs.CY · 2026-04-02 · unverdicted · novelty 6.0

LLMs reproduce observed attitudinal patterns in climate interventions reasonably well but diverge on causal effect estimates, with descriptive fit failing to predict causal accuracy across interventions and outcomes.

An Algebraic Exposition of the Theory of Dyadic Morality

cs.AI · 2026-05-15 · unverdicted · novelty 4.0

Algebraic formalization of dyadic morality via SCM with operators for moral judgment and applications to AI policy design.

citing papers explorer

Showing 6 of 6 citing papers.

Mitigating LLM-based p-Hacking by Preregistering for the Next LLM
cs.CL · 2026-06-26 · conditional · none · ref 8

Adaptive Querying with AI Persona Priors
stat.ML · 2026-05-01 · unverdicted · none · ref 41

Post-training makes large language models less human-like
cs.CL · 2026-05-08 · unverdicted · none · ref 12

When Reasoning Models Hurt Behavioral Simulation: A Solver-Sampler Mismatch in Multi-Agent LLM Negotiation
cs.LG · 2026-04-12 · unverdicted · none · ref 13

When simulations look right but causal effects go wrong: Large language models as behavioral simulators
cs.CY · 2026-04-02 · unverdicted · none · ref 27

An Algebraic Exposition of the Theory of Dyadic Morality
cs.AI · 2026-05-15 · unverdicted · none · ref 11
