Minor perturbations in persona format, instruction framing, and network structure shift cooperation by up to 76 percentage points and polarization metrics consistently, showing that LLM social simulations require per-claim robustness audits via the new TRAILS taxonomy.
Benchmarking prompt sensitivity in large language models
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
years
2026 3representative citing papers
A new MTMM-geometric framework unifies LLM evaluation metrics into three latent dimensions to separate method variance from true capabilities.
A small set of sparse autoencoder features in LLMs drives shifts between generous and selfish allocations in dictator games, with causal patching and steering confirming their role and generalization to other social games.
citing papers explorer
-
Understanding the Mechanism of Altruism in Large Language Models
A small set of sparse autoencoder features in LLMs drives shifts between generous and selfish allocations in dictator games, with causal patching and steering confirming their role and generalization to other social games.