MASH: Evading Black-Box AI-Generated Text Detectors via Style Humanization

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

The increasing misuse of AI-generated texts (AIGT) has motivated the rapid development of AIGT detection methods. However, the reliability of these detectors remains fragile against adversarial evasions. Existing attack strategies often rely on white-box assumptions or demand prohibitively high computational and interaction costs, rendering them ineffective under practical black-box scenarios. In this paper, we propose Multi-stage Alignment for Style Humanization (MASH), a novel framework that evades black-box detectors based on style transfer. MASH sequentially employs style-injection supervised fine-tuning, direct preference optimization, and inference-time refinement to shape the distributions of AI-generated texts to resemble those of human-written texts. Experiments across 6 datasets and 5 detectors demonstrate the superior performance of MASH over 11 baseline evaders. Specifically, MASH achieves an average Attack Success Rate (ASR) of 92%, surpassing the strongest baselines by an average of 24%, while maintaining superior linguistic quality.

fields

cs.CL 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

Base Models Look Human To AI Detectors

cs.CL · 2026-05-19 · unverdicted · novelty 7.0

Base model text evades AI detectors better than instruction-tuned text, and the HIP method strengthens this trade-off across model sizes.

citing papers explorer

Showing 1 of 1 citing paper.

  • Base Models Look Human To AI Detectors cs.CL · 2026-05-19 · unverdicted · none · ref 21 · internal anchor