Recognition: unknown
Feature-Augmented Transformers for Robust AI-Text Detection Across Domains and Generators
Pith reviewed 2026-05-07 03:46 UTC · model grok-4.3
The pith
Augmenting transformers with attention-based linguistic features improves their robustness to shifts in domain and text generator for AI-generated text detection.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Feature augmentation through attention-based fusion of linguistic features enables transformer-based detectors to achieve better balanced accuracy under cross-dataset and cross-generator shift, reaching 85.9% balanced accuracy on the multi-domain, multi-generator M4 benchmark and outperforming zero-shot baselines by up to 7.22 points, all under a single fixed threshold calibrated on the training validation set.
What carries the argument
Attention-based linguistic feature fusion, which integrates explicit linguistic signals like readability and vocabulary into the transformer's attention layers to enhance generalization across shifts.
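The review does not spell out the FeatAttn module's equations, so the sketch below is one plausible form of attention-based feature fusion, not the paper's actual implementation: the pooled transformer hidden state acts as the query over linguistic feature vectors (assumed already projected to the hidden dimension), and the attended summary is concatenated for the classification head. The function name `feature_attention_fusion` and the query/key assignment are our assumptions.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def feature_attention_fusion(hidden, features):
    # `hidden` is the pooled transformer representation (the query);
    # `features` are explicit linguistic feature vectors (keys/values),
    # assumed here to be projected to the same dimension as `hidden`.
    d = len(hidden)
    scores = [sum(h * f for h, f in zip(hidden, feat)) / math.sqrt(d)
              for feat in features]
    weights = softmax(scores)
    # Attention-weighted summary of the linguistic features.
    attended = [sum(w * feat[i] for w, feat in zip(weights, features))
                for i in range(d)]
    # Concatenate hidden state and summary for the classification head.
    return hidden + attended

# Toy example: a 2-d hidden state attending over two 2-d feature vectors;
# the fused output has twice the hidden dimension.
fused = feature_attention_fusion([0.5, -0.2], [[1.0, 0.0], [0.0, 1.0]])
```

Because the fusion is a weighted average over feature vectors, the module can downweight feature categories that are uninformative for a given input, which is one way explicit signals could survive distribution shift better than learned representations alone.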
If this is right
- Base transformer models reach near-perfect scores on the data they were trained on but show large drops and model-dependent errors when applied to shifted distributions.
- Readability and vocabulary features provide the largest gains in robustness according to category ablations.
- The results hold stably across multiple random seeds.
- Using a fixed threshold rather than per-test-set tuning gives a realistic view of how detectors would perform in practice.
- The combination of a modern transformer backbone and feature augmentation surpasses earlier models under shift.
Where Pith is reading between the lines
- If the linguistic features continue to add value, this method could allow detectors to handle emerging AI generators without full retraining.
- Similar feature fusion techniques might apply to other tasks involving distribution shift in text classification.
- The fixed-threshold protocol highlights the need to account for error asymmetries when deploying detectors in varied environments.
Load-bearing premise
Linguistic features like readability and vocabulary remain useful and do not cause overfitting when the domain or the text generator changes, allowing a single threshold to work across different test distributions.
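To make the premise concrete, features of this kind are cheap to compute from raw text. The paper's exact readability and vocabulary feature set is not given in this review, so the three quantities below (average sentence length, average word length, type-token ratio) are purely illustrative stand-ins for the two feature categories the ablations single out.

```python
import re

def linguistic_features(text):
    # Illustrative proxies only: average sentence length and average word
    # length as crude readability signals, type-token ratio as a
    # vocabulary-richness signal. Not the paper's actual feature set.
    words = re.findall(r"[A-Za-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words:
        return {"avg_sentence_len": 0.0, "avg_word_len": 0.0,
                "type_token_ratio": 0.0}
    return {
        "avg_sentence_len": len(words) / max(len(sentences), 1),
        "avg_word_len": sum(len(w) for w in words) / len(words),
        "type_token_ratio": len(set(words)) / len(words),
    }

feats = linguistic_features("The cat sat. The cat sat again.")
```

Features like these depend only on surface statistics of the text, which is why they plausibly transfer across generators; the open question the premise raises is whether they stay discriminative rather than merely stable.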
What would settle it
If the feature-augmented model achieved lower balanced accuracy than a plain transformer on a new dataset with unseen domains and generators, the claimed robustness improvement would be falsified.
Figures
Original abstract
AI-generated text is nowadays produced at scale across domains and heterogeneous generation pipelines, making robustness to distribution shift a central requirement for supervised binary detectors. We train transformer-based detectors on HC3 PLUS and calibrate a single decision threshold by maximising balanced accuracy on held-out validation; this threshold is then kept fixed for all downstream test distributions, revealing domain- and generator-dependent error asymmetries under shift. We evaluate in-domain on HC3 PLUS, under cross-dataset transfer to the multi-domain, multi-generator M4 benchmark, and on the external AI-Text-Detection-Pile. Although base models achieve near-ceiling in-domain performance (up to 99.5% balanced accuracy), performance under shift is brittle and strongly model-dependent. Feature augmentation via attention-based linguistic feature fusion improves transfer, with our best model (DeBERTa-v3-base+FeatAttn) achieving 85.9% balanced accuracy on M4. Multi-seed experiments confirm high stability. Under the same fixed-threshold protocol, our model outperforms strong zero-shot baselines by up to +7.22 points. Category-level ablations further show that readability and vocabulary features contribute most to robustness under shift. Overall, these results demonstrate that feature augmentation and a modern DeBERTa backbone significantly outperform earlier BERT/RoBERTa models, while the fixed-threshold protocol provides a more realistic and informative assessment of practical detector robustness.
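The abstract's calibration step (pick one threshold maximising balanced accuracy on held-out validation, then freeze it for every test distribution) can be sketched as follows. The grid-over-unique-scores search and the toy validation scores are our illustrative assumptions; the paper may calibrate differently.

```python
def balanced_accuracy(labels, preds):
    # Mean of true-positive rate and true-negative rate.
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    return 0.5 * (tp / pos + tn / neg)

def calibrate_threshold(scores, labels, grid=None):
    # Choose the single decision threshold that maximises balanced
    # accuracy on held-out validation; it is then kept fixed at test time.
    grid = grid or sorted(set(scores))
    best_t, best_ba = None, -1.0
    for t in grid:
        preds = [1 if s >= t else 0 for s in scores]
        ba = balanced_accuracy(labels, preds)
        if ba > best_ba:
            best_t, best_ba = t, ba
    return best_t, best_ba

# Hypothetical validation scores (label 1 = AI-generated).
val_scores = [0.1, 0.2, 0.4, 0.6, 0.8, 0.9]
val_labels = [0, 0, 0, 1, 1, 1]
t, ba = calibrate_threshold(val_scores, val_labels)
```

The point of the protocol is that `t` is never retuned on M4 or AI-Text-Detection-Pile, so any error asymmetry under shift shows up directly in the test-time balanced accuracy.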
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes training transformer-based AI-text detectors on HC3 PLUS, calibrating a single decision threshold by maximizing balanced accuracy on held-out validation data, and then applying this fixed threshold uniformly to in-domain HC3 PLUS evaluation, cross-dataset transfer on the multi-domain/multi-generator M4 benchmark, and the external AI-Text-Detection-Pile. The central claim is that attention-based fusion of linguistic features (readability and vocabulary) with a DeBERTa-v3-base backbone yields 85.9% balanced accuracy on M4, with multi-seed stability and category ablations showing these features drive robustness under shift; under the fixed-threshold protocol the best model outperforms strong zero-shot baselines by up to +7.22 points.
Significance. If the evaluation protocol is shown to be fair, the work offers a concrete, practical advance in supervised detection by demonstrating that lightweight linguistic feature augmentation can improve transferability where base transformers degrade. The fixed-threshold protocol, multi-seed checks, and feature-category ablations are genuine strengths that move beyond in-domain ceiling performance. The result would be useful for practitioners needing detectors that generalize across generators and domains without per-distribution retuning.
major comments (2)
- [M4 results paragraph] M4 results paragraph (reporting 85.9% balanced accuracy and +7.22 point gain): the outperformance claim over zero-shot baselines (perplexity, watermark, etc.) rests on applying the single HC3-validation-derived threshold to all models. Because zero-shot detectors produce scores whose location and scale can differ from the supervised logit distribution, this shared threshold may place the baselines at a non-optimal operating point. The manuscript should report baseline performance when each is given its own threshold (either standard or calibrated on a small M4 hold-out) to confirm the reported gain is not an artifact of the protocol.
- [Methods section describing the fixed-threshold protocol] Methods section describing the fixed-threshold protocol: the claim that the protocol provides a 'more realistic and informative assessment' is load-bearing for the robustness narrative, yet no analysis is given of how the threshold interacts with the score distributions of the zero-shot baselines. A short sensitivity study (e.g., sweeping the threshold around the HC3 optimum and plotting balanced-accuracy curves for each baseline) would directly address whether the +7.22 point margin is stable.
minor comments (2)
- [Methods] The notation and integration details for the FeatAttn module (attention-based linguistic feature fusion) are not fully specified; a diagram or explicit equations showing how readability/vocabulary vectors are projected and attended with the transformer hidden states would improve reproducibility.
- [Ablation tables/figures] Table or figure captions for the category-level ablations should explicitly state the number of seeds and whether error bars represent standard deviation or standard error.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which help clarify the strengths and potential limitations of our evaluation protocol. We address each major comment point by point below and commit to revisions that directly respond to the concerns while preserving the core contribution of the fixed-threshold assessment.
Point-by-point responses
-
Referee: [M4 results paragraph] M4 results paragraph (reporting 85.9% balanced accuracy and +7.22 point gain): the outperformance claim over zero-shot baselines (perplexity, watermark, etc.) rests on applying the single HC3-validation-derived threshold to all models. Because zero-shot detectors produce scores whose location and scale can differ from the supervised logit distribution, this shared threshold may place the baselines at a non-optimal operating point. The manuscript should report baseline performance when each is given its own threshold (either standard or calibrated on a small M4 hold-out) to confirm the reported gain is not an artifact of the protocol.
Authors: We appreciate the referee's point on score distribution differences. The fixed-threshold protocol is deliberately chosen to reflect realistic deployment, where a detector is calibrated once on source validation data and deployed to unknown target distributions without access to target labels for recalibration. Calibrating zero-shot baselines on M4 would violate this constraint and overstate their practical performance. Nevertheless, we agree that an auxiliary comparison with per-baseline optimal thresholds on M4 would be informative. In the revision we will add a new paragraph and table in the M4 results section reporting balanced accuracy for each zero-shot baseline when its threshold is calibrated on a small M4 hold-out (e.g., 10% split). This will allow readers to see both the fixed-threshold results (our primary protocol) and the per-model upper-bound results, thereby confirming that the reported gains are not solely an artifact of the shared threshold. revision: yes
-
Referee: [Methods section describing the fixed-threshold protocol] Methods section describing the fixed-threshold protocol: the claim that the protocol provides a 'more realistic and informative assessment' is load-bearing for the robustness narrative, yet no analysis is given of how the threshold interacts with the score distributions of the zero-shot baselines. A short sensitivity study (e.g., sweeping the threshold around the HC3 optimum and plotting balanced-accuracy curves for each baseline) would directly address whether the +7.22 point margin is stable.
Authors: We agree that a sensitivity analysis would strengthen the justification for the fixed-threshold protocol. We will add a new figure and accompanying text (in the Methods or a dedicated subsection of Results) that sweeps the decision threshold in a neighborhood of the HC3 optimum (e.g., 0.3 to 0.7 in 0.05 increments) and plots balanced-accuracy curves for our feature-augmented model as well as the zero-shot baselines on M4. This will directly illustrate the stability of the performance margin and the interaction between threshold choice and each method's score distribution, addressing the concern about whether the +7.22 point advantage is robust. revision: yes
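The sensitivity study the authors commit to could look like the sketch below. The threshold range (0.3 to 0.7 in 0.05 increments) follows the rebuttal's example; the two score sets, the model names, and the resulting margins are invented for illustration and are not the paper's actual outputs.

```python
def ba_at(labels, scores, threshold):
    # Balanced accuracy of thresholded scores at one operating point.
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    pos = sum(labels)
    neg = len(labels) - pos
    return 0.5 * (tp / pos + tn / neg)

def threshold_sweep(labels, model_scores, lo=0.3, hi=0.7, step=0.05):
    # Balanced-accuracy curve for each model over a threshold grid.
    n = round((hi - lo) / step)
    thresholds = [round(lo + i * step, 2) for i in range(n + 1)]
    curves = {name: [ba_at(labels, scores, t) for t in thresholds]
              for name, scores in model_scores.items()}
    return curves, thresholds

# Hypothetical test-set scores (label 1 = AI-generated).
labels = [0, 0, 0, 1, 1, 1]
curves, ts = threshold_sweep(labels, {
    "featattn": [0.1, 0.2, 0.3, 0.7, 0.8, 0.9],
    "zero_shot": [0.2, 0.5, 0.3, 0.6, 0.4, 0.9],
})
# Margin of the feature-augmented model over the baseline per threshold.
margins = [a - b for a, b in zip(curves["featattn"], curves["zero_shot"])]
```

If the margin stays positive across the whole sweep, the +7.22-point advantage is not an artifact of the particular threshold chosen on HC3 validation; a margin that changes sign within the grid would support the referee's concern.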
Circularity Check
No circularity: empirical results measured on distinct external test sets
full rationale
The paper trains models on HC3 PLUS, calibrates a single threshold by maximizing balanced accuracy on a held-out validation split from the same corpus, and then applies that fixed threshold to entirely separate external benchmarks (M4 and AI-Text-Detection-Pile). The reported 85.9% balanced accuracy and +7.22-point gains are direct measurements on these held-out distributions; they are not quantities defined by construction from the training data or the calibration step. No equations, self-citations, ansatzes, or uniqueness theorems are present that would collapse any claimed result back to its inputs. The fixed-threshold protocol is an explicit methodological choice for realism under distribution shift and does not create a self-referential loop. The evaluation therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- decision threshold
axioms (1)
- domain assumption Linguistic features such as readability and vocabulary statistics remain informative for distinguishing AI-generated text even under domain and generator distribution shifts.