Authorship Attribution in Multilingual Machine-Generated Texts

Andrea Tagarelli; Dominik Macko; Ivan Srba; Lucio La Cava; Róbert Móro

arxiv: 2508.01656 · v2 · pith:NJJ4FHDInew · submitted 2025-08-03 · cs.CL · cs.AI · cs.CY · cs.HC · physics.soc-ph

keywords: multilingual, attribution, llms, authorship, generators, monolingual, across, diverse

Abstract

As Large Language Models (LLMs) have reached human-like fluency and coherence, distinguishing machine-generated text (MGT) from human-written content has become increasingly difficult. While early efforts in MGT detection focused on binary classification, the growing number and diversity of LLMs call for a more fine-grained and more challenging task, authorship attribution (AA), i.e., identifying the precise generator (LLM or human) behind a text. However, AA currently remains confined to monolingual settings, with English the most investigated language, overlooking the multilingual nature and usage of modern LLMs. In this work, we introduce the problem of Multilingual Authorship Attribution, which involves attributing texts to human or multiple LLM generators across diverse languages. Focusing on 18 languages (covering multiple families and writing scripts) and 8 generators (7 LLMs and the human-authored class), we investigate the multilingual suitability of monolingual AA methods in terms of their cross-lingual transferability, and the impact of generators on attribution performance. Our results reveal that while certain monolingual AA methods can be adapted to multilingual settings, significant limitations and challenges remain, particularly in transferring across diverse language families. This underscores the complexity of multilingual AA and the need for more robust approaches that better match real-world scenarios.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Luminol-AIDetect: Fast Zero-shot Machine-Generated Text Detection based on Perplexity under Text Shuffling
cs.CL 2026-04 unverdicted novelty 7.0

Luminol-AIDetect detects machine-generated text zero-shot by extracting perplexity-based features from original and shuffled text versions, using density estimation and ensemble prediction to exploit greater structura...

When New Generators Arrive: Lifelong Machine-Generated Text Attribution via Ridge Feature Transfer
cs.CL 2026-06 unverdicted novelty 6.0

RidgeFT enables replay-free lifelong MGT attribution via frozen encoder, class-wise sufficient statistics, covariance calibration, and closed-form ridge regression updates, outperforming baselines on macro-F1 and rete...

Luminol-AIDetect: Fast Zero-shot Machine-Generated Text Detection based on Perplexity under Text Shuffling
cs.CL 2026-04 unverdicted novelty 6.0

Luminol-AIDetect detects machine-generated text zero-shot by extracting perplexity-based features from an input and its shuffled version, using density estimation to exploit greater dispersion in MGT perplexity under ...