This human study did not involve human subjects: Validating LLM simulations as behavioral evidence
6 Pith papers cite this work.
citing papers explorer
- Mitigating LLM-based p-Hacking by Preregistering for the Next LLM
  Preregistering LLM experiments to run on the first future eligible model blocks p-hacking transfer in roughly 73% of cases across 20 models and 11 configurations on two tasks with known ground truth.
- Adaptive Querying with AI Persona Priors
  A persona-induced latent variable model with LLM response distributions enables closed-form Bayesian updates and finite-mixture predictions for scalable adaptive querying of user-dependent quantities.
- Post-training makes large language models less human-like
  Post-training reduces LLMs' behavioral alignment with humans across families and sizes, with the misalignment increasing in newer generations while persona induction fails to improve individual-level predictions.
- When Reasoning Models Hurt Behavioral Simulation: A Solver-Sampler Mismatch in Multi-Agent LLM Negotiation
  Stronger reasoning models in LLMs reduce behavioral negotiation by defaulting to authority outcomes in multi-agent settings, unlike structured scaffolds that enable concessions.
- When simulations look right but causal effects go wrong: Large language models as behavioral simulators
  LLMs reproduce observed attitudinal patterns in climate interventions reasonably well but diverge on causal effect estimates, with descriptive fit failing to predict causal accuracy across interventions and outcomes.
- An Algebraic Exposition of the Theory of Dyadic Morality
  Algebraic formalization of dyadic morality via SCM with operators for moral judgment and applications to AI policy design.