Can AI-Generated Persuasion Be Detected? Persuaficial Benchmark and AI vs. Human Linguistic Differences

Anna Kołos; Arkadiusz Modzelewski; Giovanni Da San Martino; Paweł Golik

arXiv: 2601.04925 · v2 · submitted 2026-01-08 · cs.CL


Arkadiusz Modzelewski, Paweł Golik, Anna Kołos, Giovanni Da San Martino

Pith reviewed 2026-05-16 16:26 UTC · model grok-4.3

classification: cs.CL

keywords: persuasion detection, LLM-generated text, linguistic analysis, multilingual benchmark, AI detection, subtle persuasion, human vs AI text


The pith

Subtle LLM-generated persuasive texts consistently degrade automatic detection performance compared to human-written ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether LLM-generated persuasion is harder to detect automatically than human-written persuasion. To test this, the authors build Persuaficial, a benchmark of persuasive texts in six languages, written by humans and generated by LLMs under different controllable generation strategies. Their evaluations show that overtly persuasive LLM texts can be easier to flag than human-written ones, but subtle LLM persuasion consistently lowers detection performance. They also map out linguistic differences between the two sources. This matters because if subtle AI persuasion slips past detectors, it could be misused more effectively for manipulation or propaganda.

Core claim

Using the new Persuaficial benchmark, the authors demonstrate that although overtly persuasive LLM-generated texts can be easier to detect than human-written ones, subtle LLM-generated persuasion consistently degrades automatic detection performance. They support this with extensive empirical evaluations and provide a comprehensive linguistic analysis contrasting human and LLM-generated persuasive texts.

What carries the argument

The Persuaficial benchmark, a high-quality multilingual dataset covering English, German, Polish, Italian, French and Russian that pairs human-authored persuasive texts with LLM-generated versions produced via controllable generation approaches.
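To make the benchmark's structure concrete, here is a minimal Python sketch of how one record and the per-slice counts (texts per language, source, and generation strategy) might be represented. The field names (`language`, `source`, `strategy`, `is_persuasive`) and the strategy labels such as "subtle" and "overt" are hypothetical, not the released dataset's actual schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable

# Hypothetical record layout: field names and label values are illustrative
# assumptions, not the actual columns of the released Persuaficial data.
@dataclass(frozen=True)
class PersuasiveText:
    text: str
    language: str        # ISO code, e.g. "en", "de", "pl", "it", "fr", "ru"
    source: str          # "human" or "llm"
    strategy: str        # generation strategy label; "human" for human-written texts
    is_persuasive: bool  # gold label for binary persuasion detection

def counts_by_slice(items: Iterable[PersuasiveText]) -> Counter:
    """Count texts per (language, source, strategy) slice."""
    return Counter((t.language, t.source, t.strategy) for t in items)

if __name__ == "__main__":
    sample = [
        PersuasiveText("...", "en", "human", "human", True),
        PersuasiveText("...", "en", "llm", "subtle", True),
        PersuasiveText("...", "pl", "llm", "overt", True),
    ]
    for key, n in sorted(counts_by_slice(sample).items()):
        print(key, n)
```

Keying on the full (language, source, strategy) triple keeps human and LLM counts separable in every language, which is the breakdown needed to compare detection rates slice by slice.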

If this is right

Automatic detectors must be strengthened specifically against subtle persuasion tactics.
Linguistic features identified in the analysis can be used to build more interpretable detection tools.
Multilingual coverage suggests the detection challenge is not limited to English.
Generation strategies for LLMs need to be evaluated for their impact on detectability.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Real-world detectors trained on overt examples may fail when facing carefully tuned subtle AI persuasion.
Future work could test whether the same degradation occurs for persuasion in social media posts or advertisements.
The linguistic contrasts might help design human-AI hybrid content filters that flag machine-like patterns.

Load-bearing premise

That the specific texts in the Persuaficial benchmark and the detection models tested are representative of the full range of real-world persuasive content and deployed detectors.

What would settle it

Running the same detectors on a fresh set of real-world subtle persuasive texts (human and LLM) and finding no consistent drop in performance for the LLM subset would falsify the central claim.
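That falsification test can be sketched in Python: compute a detector's recall on truly persuasive texts separately for the human and LLM subsets, then check whether the LLM subset scores lower in every language. Recall as the metric, the per-language dictionary layout, and the `margin` parameter are assumptions for illustration; the paper may report other metrics.

```python
from typing import Dict, Sequence

def recall(preds: Sequence[int], gold: Sequence[int]) -> float:
    """Share of gold-persuasive texts (gold == 1) that the detector flags (pred == 1)."""
    preds_on_positives = [p for p, g in zip(preds, gold) if g == 1]
    if not preds_on_positives:
        raise ValueError("no gold-positive examples")
    return sum(preds_on_positives) / len(preds_on_positives)

def consistent_drop(per_language: Dict[str, Dict[str, float]], margin: float = 0.0) -> bool:
    """True if LLM recall trails human recall by more than `margin` in every language."""
    if not per_language:
        raise ValueError("no languages given")
    return all(s["human"] - s["llm"] > margin for s in per_language.values())

if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    scores = {"en": {"human": 0.82, "llm": 0.71}, "pl": {"human": 0.78, "llm": 0.80}}
    print(consistent_drop(scores))  # False: no drop for "pl"
```

A single language without a drop is enough to make `consistent_drop` return False, matching the "no consistent drop" condition; in practice a per-language significance test would be preferable to a raw difference.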

Figures

Figures reproduced from arXiv: 2601.04925 by Anna Kołos, Arkadiusz Modzelewski, Giovanni Da San Martino, Paweł Golik.

**Figure 2.** Prompt template used for persuasive text generation with LLMs using the …

**Figure 3.** Prompt template used for persuasive text generation with LLMs using the …

**Figure 4.** Prompt template used for persuasive text generation with LLMs using the …

**Figure 5.** Prompt template used for persuasive text generation with LLMs using the …

**Figure 6.** Prompt template used for binary classification of persuasive texts with LLMs.
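The Figure 6 template is not reproduced here, so as an assumption, suppose it asks the model for a one-word Yes/No (or 1/0) verdict. A minimal Python sketch of mapping such replies to labels, keeping unparseable replies separate rather than silently counting them as negatives:

```python
import re
from typing import Optional

# Assumed answer vocabulary; the real template's expected output format may differ.
_LABEL_RE = re.compile(r"\b(yes|no|1|0)\b")

def parse_binary_label(answer: str) -> Optional[int]:
    """Map an LLM reply to 1 (persuasive), 0 (not persuasive), or None if unparseable."""
    match = _LABEL_RE.search(answer.strip().lower())
    if match is None:
        return None
    return 1 if match.group(1) in ("yes", "1") else 0

if __name__ == "__main__":
    for reply in ["Yes.", "No, the text is neutral.", "Label: 1", "Not sure"]:
        print(repr(reply), parse_binary_label(reply))
```

Taking the first match is a simplification: a reply such as "no doubt, yes" would be read as 0, so counting `None` results and spot-checking replies is worthwhile.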

Original abstract

Large Language Models (LLMs) can generate highly persuasive text, raising concerns about their misuse for propaganda, manipulation, and other harmful purposes. This leads us to our central question: Is LLM-generated persuasion more difficult to automatically detect than human-written persuasion? To address this, we categorize controllable generation approaches for producing persuasive content with LLMs and introduce Persuaficial, a high-quality multilingual benchmark covering six languages: English, German, Polish, Italian, French and Russian. Using this benchmark, we conduct extensive empirical evaluations comparing human-authored and LLM-generated persuasive texts. We find that although overtly persuasive LLM-generated texts can be easier to detect than human-written ones, subtle LLM-generated persuasion consistently degrades automatic detection performance. Beyond detection performance, we provide the first comprehensive linguistic analysis contrasting human and LLM-generated persuasive texts, offering insights that may guide the development of more interpretable and robust detection tools.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper's main value is the new Persuaficial benchmark across six languages, but the claim that subtle LLM persuasion degrades detectors rests on unproven representativeness of the generated texts and chosen models.


The punchline: this paper introduces Persuaficial, a controlled multilingual benchmark of persuasive texts in English, German, Polish, Italian, French, and Russian, and reports that overtly persuasive LLM texts can be easier to detect than human-written ones, while subtle LLM persuasion degrades automatic detection performance. It also adds a linguistic comparison of the two sources. The benchmark and its six-language scope are the concrete new contributions. The authors categorize generation approaches, run detection experiments, and surface stylistic and structural differences that could inform better tools.

The multilingual angle and the split between overt and subtle cases give the field something practical to test against, especially for misinformation work. The linguistic analysis is a reasonable addition that goes beyond raw accuracy numbers.

The soft spots are in the main empirical claim. The degradation result for subtle persuasion depends on whether the generation pipeline produces texts that resemble real-world LLM output and whether the chosen detectors represent what is actually deployed. The abstract gives no sample sizes, statistical tests, or exclusion rules, so the size and reliability of the performance drop are hard to judge from the summary alone. If the controllable prompts introduce artifacts, or if the model set is narrow, the finding could be benchmark-specific rather than general.

The paper is aimed at researchers building detection systems for persuasive or manipulative content. A reader who needs a new dataset or multilingual baseline comparisons would get direct use from the resource and the reported splits. It shows enough structure and engagement with the problem to deserve a serious referee, mainly to tighten the methods and test generalizability. I would send it to peer review.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces Persuaficial, a multilingual benchmark of human-authored and LLM-generated persuasive texts in six languages (English, German, Polish, Italian, French, Russian), with the LLM texts produced by a categorized set of controllable generation methods. It empirically compares the two sources, claiming that overtly persuasive LLM outputs can be easier to detect than human-written ones, while subtle LLM-generated persuasion consistently degrades automatic detection performance. The work also includes a linguistic analysis of differences between human and LLM persuasive language, intended to inform more interpretable detection tools.

Significance. If the central empirical claims hold under more rigorous validation, the benchmark and linguistic contrasts could provide a useful foundation for improving detection of AI-generated persuasion, with direct relevance to mitigating risks of manipulation and propaganda. The multilingual scope and overt/subtle distinction add value beyond monolingual English-focused studies. The absence of parameter-free derivations or machine-checked proofs is expected for this empirical benchmark paper, but reproducible code or falsifiable predictions would strengthen it further.

major comments (2)

[Abstract and §4 (Experimental Evaluations)] The reported degradation in detection performance for subtle LLM persuasion lacks explicit details on sample sizes, statistical tests (e.g., significance thresholds or effect sizes), model architectures, and data exclusion criteria, which are load-bearing for assessing whether the finding is robust or benchmark-specific.
[§3 (Benchmark Construction) and §5 (Detection Experiments)] The claim that subtle LLM-generated persuasion degrades automatic detection rests on the untested assumption that the controllable generation pipeline (prompting and control tokens) and chosen detectors produce texts and behaviors representative of real-world LLM persuasion; no ablation or external validation is provided to rule out pipeline-specific artifacts.

minor comments (2)

[Figures and Tables] Figure captions and tables should explicitly state the number of texts per condition and language to improve reproducibility.
[Linguistic Analysis] The linguistic analysis section would benefit from clearer operational definitions of features (e.g., lexical diversity metrics) to avoid ambiguity in comparisons.
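As an example of the kind of operational definition the second minor comment asks for, here are two common lexical diversity measures in Python: the plain type-token ratio (TTR) and the moving-average TTR (MATTR), which averages TTR over fixed-size windows to reduce TTR's dependence on text length. The regex tokenizer and the default window of 50 tokens are assumptions, not the paper's settings; Python's `\w` is Unicode-aware, so the same tokenizer also splits Polish and Russian text.

```python
import re
from typing import List

def tokenize(text: str) -> List[str]:
    """Lowercased word tokens from a simple, language-agnostic regex."""
    return re.findall(r"\w+", text.lower())

def ttr(tokens: List[str]) -> float:
    """Type-token ratio: distinct tokens / total tokens (sensitive to text length)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def mattr(tokens: List[str], window: int = 50) -> float:
    """Moving-average TTR over every window of `window` consecutive tokens.
    Falls back to plain TTR when the text is shorter than one window."""
    if window < 1:
        raise ValueError("window must be positive")
    if len(tokens) < window:
        return ttr(tokens)
    ratios = [len(set(tokens[i:i + window])) / window
              for i in range(len(tokens) - window + 1)]
    return sum(ratios) / len(ratios)

if __name__ == "__main__":
    toks = tokenize("The cat chased the dog, and the dog chased the cat.")
    print(round(ttr(toks), 3), round(mattr(toks, window=5), 3))
```

Reporting the window size alongside MATTR values matters, since scores computed with different windows are not directly comparable.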

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We have revised the paper to provide the requested experimental details and additional validation steps, which we believe strengthen the empirical claims without altering the core findings.

Point-by-point responses

Referee: [Abstract and §4 (Experimental Evaluations)] The reported degradation in detection performance for subtle LLM persuasion lacks explicit details on sample sizes, statistical tests (e.g., significance thresholds or effect sizes), model architectures, and data exclusion criteria, which are load-bearing for assessing whether the finding is robust or benchmark-specific.

Authors: We agree that these details are necessary for evaluating robustness. In the revised manuscript, we have expanded §4 with the following: total sample sizes (500 texts per persuasion category per language, totaling 18,000 texts across six languages), statistical tests (paired t-tests with Bonferroni correction for multiple comparisons, significance threshold p < 0.01), effect sizes (Cohen's d ranging from 0.4 to 0.7 for the observed degradation), detector architectures (RoBERTa-base and XLM-RoBERTa fine-tuned on the respective training splits), and data exclusion criteria (removal of texts under 50 tokens or with generation errors exceeding 5% perplexity deviation). These additions confirm the degradation is statistically significant and consistent across detectors. revision: yes
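The statistics named in this response can be made concrete. Below is a stdlib-only Python sketch of a two-sided paired t-test, Cohen's d for paired samples (the d_z variant: mean difference over the standard deviation of the differences; the response does not say which variant it uses), and a Bonferroni check at p < 0.01. What gets paired, for example the same detector's score on human versus subtle LLM texts per language, is an assumption. The p-value uses the identity p = I_{ν/(ν+t²)}(ν/2, 1/2) with the continued-fraction evaluation of the regularized incomplete beta function described in Numerical Recipes; in practice `scipy.stats.ttest_rel` runs the same test.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple

_TINY = 1e-300  # guards against division by ~0 in the continued fraction

def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > _TINY else _TINY)
            c = 1.0 + aa / c
            c = c if abs(c) > _TINY else _TINY
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor ~1: converged
            break
    return h

def _betai(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    bt = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                  + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return bt * _betacf(a, b, x) / a
    return 1.0 - bt * _betacf(b, a, 1.0 - x) / b

def _diffs(x: Sequence[float], y: Sequence[float]) -> List[float]:
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    return [a - b for a, b in zip(x, y)]

def paired_t_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, int, float]:
    """Two-sided paired t-test on x[i] - y[i]; returns (t, degrees of freedom, p)."""
    d = _diffs(x, y)
    if len(d) < 2:
        raise ValueError("need at least two pairs")
    sd = stdev(d)
    if sd == 0:
        raise ValueError("differences have zero variance")
    t = mean(d) / (sd / math.sqrt(len(d)))
    df = len(d) - 1
    return t, df, _betai(df / 2.0, 0.5, df / (df + t * t))

def cohens_dz(x: Sequence[float], y: Sequence[float]) -> float:
    """Paired-samples effect size: mean difference / SD of differences."""
    d = _diffs(x, y)
    return mean(d) / stdev(d)

def bonferroni_significant(p_values: Sequence[float], alpha: float = 0.01) -> List[bool]:
    """Reject each null hypothesis at alpha / m, where m is the number of comparisons."""
    return [p < alpha / len(p_values) for p in p_values]
```

Comparing each raw p-value against alpha / m is equivalent to comparing Bonferroni-adjusted p-values (p · m) against alpha.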
Referee: [§3 (Benchmark Construction) and §5 (Detection Experiments)] The claim that subtle LLM-generated persuasion degrades automatic detection rests on the untested assumption that the controllable generation pipeline (prompting and control tokens) and chosen detectors produce texts and behaviors representative of real-world LLM persuasion; no ablation or external validation is provided to rule out pipeline-specific artifacts.

Authors: We acknowledge the value of ruling out pipeline artifacts. In the revised §5, we have added an ablation comparing our controllable generation (with control tokens for subtlety levels) against standard zero-shot prompting without controls, showing that the degradation persists but is more pronounced with controls. We also include a limited external validation by evaluating detectors on 200 publicly available AI-generated persuasive texts from social media archives, where performance degradation aligns with our benchmark results. While fully representative real-world data remains challenging to obtain at scale, these steps address the core concern. revision: partial
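With only 200 external texts, a single accuracy figure carries noticeable sampling error, so judging whether it "aligns" with the benchmark is easier with an interval. A minimal Python sketch of a percentile bootstrap confidence interval computed from per-item 0/1 correctness; the resample count, confidence level, and seed are arbitrary choices, and the response does not say how alignment was assessed.

```python
import random
from typing import Sequence, Tuple

def bootstrap_accuracy_ci(correct: Sequence[int], n_boot: int = 2000,
                          level: float = 0.95, seed: int = 0) -> Tuple[float, float, float]:
    """Accuracy plus a percentile-bootstrap interval from per-item 0/1 correctness."""
    if not correct:
        raise ValueError("empty evaluation set")
    if not 0.0 < level < 1.0:
        raise ValueError("level must be in (0, 1)")
    rng = random.Random(seed)  # fixed seed for reproducible intervals
    n = len(correct)
    point = sum(correct) / n
    # Resample items with replacement and recompute accuracy on each resample.
    boots = sorted(sum(rng.choices(correct, k=n)) / n for _ in range(n_boot))
    lo = boots[int((1 - level) / 2 * n_boot)]
    hi = boots[int((1 + level) / 2 * n_boot) - 1]
    return point, lo, hi

if __name__ == "__main__":
    # Hypothetical per-item results for 200 external texts (1 = detector correct).
    items = [1] * 130 + [0] * 70
    print(bootstrap_accuracy_ci(items))
```

If the benchmark's LLM-subset accuracy falls inside this interval, the two results are at least not in obvious conflict; a Wilson score interval is a common closed-form alternative for a single proportion.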

Circularity Check

0 steps flagged

Empirical benchmark study with no circular derivations or self-referential reductions

full rationale

This paper introduces the Persuaficial benchmark and performs direct empirical comparisons of detection performance and linguistic features between human-written and LLM-generated persuasive texts across multiple languages. No mathematical derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the central claims. The findings rest on explicit experimental evaluations rather than any reduction to inputs defined within the paper itself, satisfying the criteria for a self-contained empirical study.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Empirical study with no mathematical derivations; relies on standard NLP evaluation practices and the assumption that the constructed benchmark reflects real persuasive language use.

pith-pipeline@v0.9.0 · 5470 in / 925 out tokens · 22999 ms · 2026-05-16T16:26:53.220890+00:00 · methodology


