DeTox-Fed: Detecting Toxic Conversations in the Fediverse with Federated Graph Neural Networks
1 Pith paper cites this work. Polarity classification is still indexing.
abstract
The rise of decentralized social networks (DSNs), and in particular the rapid uptake of the Fediverse (e.g., Pleroma, Mastodon, Lemmygrad), introduces new challenges in content moderation. Independent instances host their own data, follow different moderation policies, and often observe only partial views of conversations. We present DeTox-Fed, a federated graph-learning framework for detecting toxic conversations in DSNs without requiring instances to share raw conversations or moderation labels. Each instance constructs a local conversation graph, where nodes represent conversation trees and edges capture shared user participation across conversations. A Graph Neural Network (GNN) is then trained in a federated learning setup, allowing instances to collaboratively learn a toxicity classifier while preserving data locality. Unlike text-only moderation approaches, DeTox-Fed combines conversational structure, user-interaction patterns, conversation-level statistics, and aggregate sentiment signals. We evaluate the framework on a large Pleroma conversation dataset and show that it achieves stable toxic conversation detection under limited local labels, partial client participation, and varying moderation thresholds. Our results indicate that federated graph-based moderation is a promising direction for semi-automated moderation in decentralized social networks.
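The two mechanisms the abstract names, a local conversation graph and federated training, can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the toy data, the choice of edge weight (number of shared users), and the use of sample-size-weighted parameter averaging (FedAvg-style) are all assumptions made for illustration; the abstract does not specify the aggregation rule or the edge weighting.

```python
from collections import defaultdict
from itertools import combinations


def build_conversation_graph(participants):
    """Build a local conversation graph for one instance.

    participants: dict mapping a conversation id to the users who took part.
    Returns a dict mapping (conv_a, conv_b), with conv_a < conv_b, to the
    number of users the two conversations share (assumed edge weight).
    """
    # Invert the mapping: for each user, which conversations did they join?
    user_to_convs = defaultdict(set)
    for conv_id, users in participants.items():
        for user in set(users):
            user_to_convs[user].add(conv_id)

    # Every pair of conversations joined by the same user gets an edge.
    edges = defaultdict(int)
    for convs in user_to_convs.values():
        for a, b in combinations(sorted(convs), 2):
            edges[(a, b)] += 1
    return dict(edges)


def federated_average(client_weights, client_sizes):
    """Average flat parameter vectors, weighted by local sample count.

    Only parameters and sample counts leave a client; raw conversations
    and moderation labels stay local. (Assumed FedAvg-style aggregation.)
    """
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


# Hypothetical toy data for a single instance.
local_participants = {
    "c1": ["alice", "bob"],
    "c2": ["bob", "carol"],
    "c3": ["dave"],
    "c4": ["alice", "bob", "carol"],
}
edges = build_conversation_graph(local_participants)

# Two hypothetical clients with 1 and 3 labeled conversations respectively.
global_weights = federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

In a fuller pipeline, each node would also carry the conversation-level statistics and aggregate sentiment features mentioned in the abstract, and the averaged vector would stand in for the GNN's parameters after one communication round.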
fields
physics.soc-ph 1
years
2026 1
verdicts
UNVERDICTED 1
representative citing papers
On the Effects of Decentralized Moderation on Network Robustness and Information Diffusion in Mastodon
Decentralized moderation on Mastodon produces stable signed network structure that allows information from moderating instances to spread more efficiently while constraining flow from banned domains.