Prosocial Persuasion at Scale? Large Language Models Outperform Humans in Donation Appeals Across Levels of Personalization
Pith reviewed 2026-05-13 17:57 UTC · model grok-4.3
The pith
LLM-generated donation appeals produced more donations, higher engagement, and stronger persuasiveness ratings than human-written ones in two experiments.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LLM-authored donation appeals outperform human-authored appeals: participants gave more of their bonus to the charities, showed greater engagement with the material, and judged the appeals more persuasive when the text came from an LLM rather than a human author. This advantage held across generic, personalized, and falsely personalized versions of the appeals.
What carries the argument
Experimental contrast between human-written and LLM-generated donation appeals, crossed with three levels of personalization, measured by actual bonus allocation, reading time, and self-reported persuasiveness.
If this is right
- Charities and nonprofits could generate large volumes of donation appeals with LLMs and expect higher response rates than from equivalent human writing.
- Accurate personalization increases giving while inaccurate personalization decreases it, so organizations must verify data before using it in appeals.
- LLMs shift from being viewed primarily as risks for misinformation to tools that can promote positive social behaviors such as charitable giving.
- The same generation approach may apply to other prosocial requests such as volunteering or blood donation appeals.
Where Pith is reading between the lines
- If LLM performance holds in real-world campaigns, organizations might reduce reliance on human copywriters for routine fundraising materials.
- Similar experiments could test whether the advantage extends to other domains like environmental appeals or health-behavior messages.
- Long-term use of LLM appeals might change donor expectations, making highly polished text the new baseline for what feels authentic.
Load-bearing premise
Allocating a small experimental bonus accurately reflects how people donate their own money in real charitable situations, and that human and LLM texts were produced under comparable effort and quality standards.
What would settle it
A field study in which participants use their own funds to donate after seeing either human or LLM appeals and show equal or lower giving for the LLM versions.
read the original abstract
Large Language Models (LLMs) are increasingly regarded as having the potential to generate persuasive content at scale. While previous studies have focused on the risks associated with LLM-generated misinformation, the role of LLMs in enabling prosocial persuasion is still underexplored. We investigate whether donation appeals authored by LLMs are as effective as those written by humans across degrees of personalization. Two preregistered online experiments (Study 1: N = 658; Study 2: N = 642) manipulated Personalization (generic vs. personalized vs. falsely personalized) and Content source (human vs. LLM) and presented participants with donation appeals for charities. We assessed how participants distributed their bonus money across the charities, how they engaged with the donation appeals, and how persuasive they found them. In both experiments, LLM-generated content yielded more donations, resulted in higher engagement, and was rated as more persuasive than human-authored content. There was a gain associated with personalization (Study 2) and a penalty for false personalization (Study 1). Our results suggest that LLMs may be a suitable technology for generating content that can encourage prosocial behavior.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper reports two preregistered online experiments (Study 1: N=658; Study 2: N=642) that manipulate content source (human vs. LLM) and personalization level (generic vs. personalized vs. falsely personalized) in donation appeals for charities. Participants allocated a bonus across charities while measures of actual donations, engagement, and perceived persuasiveness were collected. The central finding is that LLM-generated appeals produced higher donations, greater engagement, and higher persuasiveness ratings than human-authored appeals in both studies, with a personalization benefit in Study 2 and a false-personalization penalty in Study 1.
Significance. If the results hold after clarifying production constraints, the work supplies preregistered evidence that LLMs can generate prosocial persuasive content at scale more effectively than humans under the tested conditions. This has direct implications for charitable fundraising and scalable prosocial messaging. The preregistration and sample sizes are strengths that support the empirical contribution, though external validity hinges on whether the human-LLM comparison isolates model capability rather than differences in authoring effort.
major comments (1)
- [Methods] Methods section on content generation: the procedures for producing human-authored appeals provide only high-level information and omit details on writer expertise, time allocated per appeal, compensation, and whether iterative editing was allowed. In contrast, LLM prompting is described in detail. Because the headline claim treats the comparison as direct evidence of LLM superiority, this missing information on effort parity is load-bearing and must be addressed to support attribution of the donation and persuasiveness differences to the content source itself.
minor comments (1)
- [Abstract] Abstract: no statistical details, effect sizes, exclusion criteria, or checks for confounds such as prompt quality or content length are supplied, which weakens the immediate readability of the central claim.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify the methodological details necessary to support our claims. We address the major comment below and will revise the manuscript accordingly.
read point-by-point responses
-
Referee: [Methods] Methods section on content generation: the procedures for producing human-authored appeals provide only high-level information and omit details on writer expertise, time allocated per appeal, compensation, and whether iterative editing was allowed. In contrast, LLM prompting is described in detail. Because the headline claim treats the comparison as direct evidence of LLM superiority, this missing information on effort parity is load-bearing and must be addressed to support attribution of the donation and persuasiveness differences to the content source itself.
Authors: We agree that the current Methods section provides only high-level information on human-authored appeal generation and that greater transparency is needed to evaluate effort parity. In the revised manuscript, we will expand this section to specify that the human writers were experienced freelance content creators recruited via a standard platform, allocated up to 30 minutes per appeal, compensated at prevailing market rates for similar tasks, and instructed to produce a single draft without iterative editing or external feedback. This protocol was designed to approximate the one-shot generation used for the LLM condition. We will also add a brief discussion of how these constraints align with typical real-world production differences between human and automated content. These additions will allow readers to better assess whether the observed differences are attributable to content source rather than unequal authoring effort. revision: yes
Circularity Check
No circularity: straightforward empirical comparison with no derivations or self-referential reductions
full rationale
The paper reports results from two preregistered experiments (N=658 and N=642) that directly measure donation amounts, engagement metrics, and persuasiveness ratings for LLM-generated versus human-authored appeals under varying personalization conditions. No equations, fitted parameters, predictive models, or derivation chains appear in the abstract or described methods. Claims rest on experimental data collection rather than any self-definition, fitted-input prediction, or load-bearing self-citation of uniqueness theorems. The comparison is presented as an empirical test without reduction to prior author work or ansatzes.
Axiom & Free-Parameter Ledger
axioms (1)
- standard math Standard assumptions of experimental design and statistical inference in behavioral science (e.g., random assignment, normality for t-tests/ANOVA)
Reference graph
Works this paper leans on
-
[1]
Voelkel, Shane Muldowney, Johannes C
https://doi.org/10.1038/s41467-025-61345-5 Bajpai, S., Sameer, A., & Fatima, R. (2025). Insights into Moral Reasoning of AI: A Comparative Study Between Humans and Large Language Models. Journal of Media Ethics, 1–15. Barclay, P., & Barker, J. L. (2020). Greener than thou: People who protect the environment are more cooperative, compete to be environmenta...
-
[2]
https://doi.org/10.1177/0093650220961965 Zettler, I., & Strandsbjerg, C. F. (2025). Personalized interventions. Current Opinion in Psychology, 66, 102147. https://doi.org/10.1016/j.copsyc.2025.102147 Zhu, Q., Chong, L., Yang, M., & Luo, J. (2024). Reading users’ minds from what they say: An investigation into llm-based empathic mental inference. Internati...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.