SV-Detect: AI-generated Text Detection with Steering Vectors

Mikhail Vishnyakov; Tatiana Gaintseva

arxiv: 2606.07313 · v1 · pith:JCD7LUCNnew · submitted 2026-06-05 · 💻 cs.CL · cs.AI

Pith reviewed 2026-06-27 22:14 UTC · model grok-4.3

keywords AI-generated text detection · steering vectors · distribution shift · hidden representations · language model probing · machine text classification · editing robustness

The pith

Steering vectors extracted layer by layer from a frozen language model separate human-written from machine-generated text and support accurate detection even after domain changes, model switches, or editing attacks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that at each layer of a frozen language model one can find a direction that best separates human text from machine text. Each input is then summarized by how strongly it aligns with these directions across layers, and a small classifier on those alignments produces the detection score. This approach maintains high performance when the test text comes from new domains, new source models, or has been polished or rewritten by another model. The directions turn out to capture stylistic signals plus additional information not visible from surface features alone. The work therefore frames fake-text detection as a representation-space probing task solved by steering vectors.

Core claim

A detector is built by computing, at every layer of a frozen language model, the direction that best separates hidden representations of human-written text from machine-generated text; each new input is represented by its projection onto these layer-wise directions, and a lightweight classifier trained on the resulting feature vectors yields the final detection score. This construction achieves strong accuracy both in-distribution and under distribution shift across domains, source models, and machine-editing operations such as polishing and rewriting.
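The construction described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the activations are synthetic stand-ins for pooled hidden states, and the function names `steering_vectors`, `projection_features`, and `fit_logreg` are hypothetical. The directions are built here by a difference of class means purely for brevity; the Figure 5 caption suggests the paper prefers a logistic-regression-based construction. Cosine similarity, standardization, and a logistic-regression classifier follow the Figure 2 caption.

```python
import numpy as np

def steering_vectors(acts, labels):
    """acts: (n_texts, n_layers, d) pooled activations; labels: 1 = machine, 0 = human.
    Returns one unit direction per layer, shape (n_layers, d).
    Difference of class means is an ASSUMED construction, used for brevity."""
    diff = acts[labels == 1].mean(axis=0) - acts[labels == 0].mean(axis=0)
    return diff / np.linalg.norm(diff, axis=-1, keepdims=True)

def projection_features(acts, directions):
    """Cosine similarity between each text's layer activation and that layer's direction."""
    norms = np.linalg.norm(acts, axis=-1)              # (n_texts, n_layers)
    dots = np.einsum("nld,ld->nl", acts, directions)   # (n_texts, n_layers)
    return dots / np.maximum(norms, 1e-12)

def fit_logreg(X, y, lr=0.5, steps=500):
    """Plain gradient-descent logistic regression; returns (weights, bias)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

# Synthetic stand-in for pooled activations (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
n, n_layers, d = 200, 4, 16
labels = np.repeat([0, 1], n // 2)
shift = rng.normal(size=(n_layers, d))  # per-layer human-vs-machine offset
acts = rng.normal(size=(n, n_layers, d)) + labels[:, None, None] * shift

dirs = steering_vectors(acts, labels)
feats = projection_features(acts, dirs)

# Standardize each layer's score before the classifier.
mu, sigma = feats.mean(axis=0), feats.std(axis=0)
X = (feats - mu) / sigma

w, b = fit_logreg(X, labels.astype(float))
scores = 1.0 / (1.0 + np.exp(-(X @ w + b)))
train_acc = np.mean((scores > 0.5) == labels)
```

Each text ends up as a vector with one entry per layer, so the classifier has only `n_layers + 1` fitted parameters. In a real run the statistics `mu` and `sigma` and the directions would be fitted on training data only and reused unchanged at test time.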

What carries the argument

Steering vectors: the set of layer-specific directions in hidden representation space that separate human from machine text, used as projection features for classification.

If this is right

The same layer-wise directions remain informative when the source model or domain changes.
Performance holds after polishing or rewriting edits performed by another model.
The directions capture stylistic cues plus signal beyond surface-level features.
Fake-text detection reduces to finding and using these representation-space directions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the directions are stable across many shifts, they may also help detect text generated by entirely unseen future models.
The method could be tested on distinguishing text from two different machine sources rather than human versus machine.
Layer-wise projections might serve as a lightweight probe for other generation-related properties such as factual consistency.

Load-bearing premise

Stable separating directions exist in the hidden space of the frozen model and remain useful across the distribution shifts tested.
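One hedged way to probe this premise is to estimate the layer-wise directions separately on two domains and compare them with cosine similarity. The sketch below is an illustration on synthetic data, not an experiment from the paper: the helper names, the difference-of-means construction, and the data generator are all assumptions.

```python
import numpy as np

def mean_diff_direction(acts, labels):
    """Unit direction per layer from the difference of class means (assumed construction)."""
    diff = acts[labels == 1].mean(axis=0) - acts[labels == 0].mean(axis=0)
    return diff / np.linalg.norm(diff, axis=-1, keepdims=True)

def layerwise_cosine(dirs_a, dirs_b):
    """Cosine similarity between matching layers of two sets of unit directions."""
    return np.sum(dirs_a * dirs_b, axis=-1)

rng = np.random.default_rng(1)
n, n_layers, d = 400, 3, 32
labels = np.repeat([0, 1], n // 2)
shared = rng.normal(size=(n_layers, d))  # human-vs-machine offset common to both domains

def synthetic_domain(domain_offset):
    """Hypothetical domain: its own offset plus the shared class offset plus noise."""
    noise = rng.normal(size=(n, n_layers, d))
    return noise + domain_offset + labels[:, None, None] * shared

acts_a = synthetic_domain(rng.normal(size=(n_layers, d)))
acts_b = synthetic_domain(rng.normal(size=(n_layers, d)))

cos = layerwise_cosine(mean_diff_direction(acts_a, labels),
                       mean_diff_direction(acts_b, labels))
```

Values of `cos` close to 1 at a layer would indicate that the two domains agree on the separating direction there; values near 0 would indicate that the direction does not transfer. The synthetic data are built so that the directions agree, which real activations need not do.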

What would settle it

A new test collection that applies a strong editing transformation or domain shift, and on which the layer-wise projection features yield near-random classification accuracy, would falsify the claim.
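To make "near-random" concrete: the figure captions on this page report AUROC, for which chance level is 0.5. The following is a small sketch of AUROC computed directly from its pairwise definition; the function name `auroc` is illustrative, and a library routine would normally be used instead.

```python
import numpy as np

def auroc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels == 1], scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
print(auroc([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5]))   # 0.5
```

A detector that assigns the same score to every text lands at exactly 0.5, which is the reference point for the falsification test described above.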

Figures

Figures reproduced from arXiv: 2606.07313 by Mikhail Vishnyakov, Tatiana Gaintseva.

**Figure 2.** Overview of SV-Detect. A frozen LLM is used to extract mean-pooled hidden activations from each layer. These activations are projected onto layer-wise steering vectors, and the resulting cosine-similarity scores are standardized and passed to a logistic-regression classifier for fake-text detection.

**Figure 3.** Performance on DetectRL.

Surrounding text: … et al., 2025; Chen et al., 2024; Bao et al., 2023; Mitchell et al., 2023), for fair comparison with baseline methods, in all our experiments, the reference model used for activation extraction is a frozen GPT-Neo-2.7B (Black et al., 2021). Texts are tokenized with truncation to a maximum length of 2048 tokens. For each text, we extract the mean-pooled hidden representation from ev…

**Figure 4.** Results on MIRAGE and cross-benchmark transfer.

**Figure 5.** Ablation summary on DetectRL MultiDomain transfer. Bars show mean in-distribution AUROC, mean transfer AUROC, and worst-case transfer AUROC. Logistic regression performs best both as the downstream classifier and as the steering vector construction method, while alternatives often degrade toward chance under transfer.

Surrounding text: …tion C, we compare several alternative backbones of similar scale. The results show t…

**Figure 6.** Consensus tokens across the four LMs. A colored dot indicates that the token appears in that LM’s pooled top-token set.

Surrounding text: … the signal is explained by simple stylistic features, we train a logistic regression on interpretable regex-based counts derived from the logit-lens analysis. The full setup is given in the Appendix Sec. D. These features achieve 76-82 AUROC on MIRAGE and 88-91 AUROC on DetectRL, but the …

**Figure 7.** Steering vector interpretation.

Surrounding text (Section 6, Conclusion): We introduced SV-Detect, a fake-text detector based on steering vectors extracted from the hidden representations of a frozen language model. By representing each text through its alignment with layer-wise real-vs-fake directions, SV-Detect provides a simple, interpretable alternative to text-level score-based and fully supervised detectors. Experiments show t…

**Figure 9.** Top-pooled tokens per reference LM, sepa…
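The captions for Figures 2 and 3 mention truncation to 2048 tokens and mean-pooled hidden activations from each layer. The sketch below shows one plausible reading of those two steps on plain arrays. The helper names `truncate` and `mean_pool` are hypothetical, and excluding padding positions from the mean is an assumption, since the page does not say how padding is handled.

```python
import numpy as np

def truncate(token_ids, max_len=2048):
    """Keep at most max_len tokens (2048 is the limit quoted in the Figure 3 text)."""
    return token_ids[:max_len]

def mean_pool(hidden, mask):
    """hidden: (n_layers, seq_len, d) hidden states for one text.
    mask: (seq_len,) with 1 for real tokens and 0 for padding.
    Returns (n_layers, d): the average over real tokens at each layer.
    Masking out padding is an ASSUMPTION about the pooling step."""
    mask = mask.astype(hidden.dtype)
    summed = (hidden * mask[None, :, None]).sum(axis=1)
    return summed / max(mask.sum(), 1.0)

# Tiny worked example: 2 layers, 3 positions (the last one is padding), width 2.
hidden = np.arange(12, dtype=float).reshape(2, 3, 2)
mask = np.array([1, 1, 0])
pooled = mean_pool(hidden, mask)
# pooled == [[1., 2.], [7., 8.]]
```

The pooled array has one row per layer, which is the shape the layer-wise projection step expects for a single text.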

Original abstract

Detecting machine-generated text is especially difficult under distribution shift, such as transfer across domains, source models, and editing attacks. We propose a fake-text detector based on steering vectors extracted from the hidden representations of a frozen language model. At each layer, we construct a direction that separates human-written from machine-generated text, and represent each input by its layer-wise alignment with these directions. A lightweight classifier trained on these projection features yields the final detection score. Our method achieves strong performance both in-distribution and under distribution shift, including across domains, source models, and machine-editing transformations such as polishing and rewriting. Interpretation analyses show that the learned directions align with recognizable stylistic cues while capturing substantial additional signal beyond surface features. These results position fake-text detection as a representation-space probing problem and show that steering vectors provide a simple and effective solution.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The layer-wise steering vector idea for shift-robust detection is a clean framing, but the abstract gives no numbers or setup details by which to judge whether it actually works.

The main contribution is extracting a separating direction at each layer of a frozen LM, then feeding the projection features into a lightweight classifier. This is presented as a way to handle domain shifts, model shifts, and editing attacks without retraining the base model. The interpretation section on stylistic cues plus extra signal is a reasonable addition if the analyses are there.

What stands out is the explicit focus on distribution shift as the core problem rather than just in-distribution accuracy. That matches real deployment needs. The method stays simple and avoids heavy fine-tuning, which is practical.

The soft spot is obvious from the abstract alone: no datasets, no baselines, no error bars, no description of how the directions are computed, and no quantitative results. The claim of strong cross-shift performance is the load-bearing one, yet nothing is shown to support it. Without the full experiments it is impossible to tell whether the directions generalize or whether the classifier just fits the training distribution.

The full manuscript was not available for this read, so I cannot check the actual tables, the exact construction of the vectors, or whether the cross-editing results hold up. If those sections exist and look clean, the paper is worth referee time. If they are missing or weak, it is not.

This is for people building detectors that need to survive new generators and edits. A reader already working on representation-based methods would see the most value. I would send it to review if the experiments are solid and reproducible; otherwise desk reject.

Referee Report

1 major / 0 minor

Summary. The paper proposes SV-Detect, a detector for AI-generated text that extracts steering vectors from the hidden representations of a frozen language model. At each layer a separating direction between human and machine text is constructed; each input is represented by its layer-wise alignment with these directions; and a lightweight classifier is trained on the resulting projection features to yield the detection score. The central claim is that this yields strong performance both in-distribution and under distribution shift (domains, source models, polishing/rewriting edits), while the learned directions align with stylistic cues yet capture additional signal.

Significance. If the results and the existence of consistent cross-shift separating directions are substantiated, the work would be significant: it reframes fake-text detection as a representation-probing task and supplies a simple, potentially generalizable alternative to standard supervised classifiers that often fail under shift. The interpretability analysis is a secondary strength if it is shown to go beyond surface features.

major comments (1)

[Abstract] The claim of 'strong performance both in-distribution and under distribution shift' is stated without any metrics, baselines, datasets, error bars, or experimental protocol, rendering the central claim unevaluable from the supplied text.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback and the positive assessment of the work's potential significance. We address the single major comment below.

Referee: [Abstract] Abstract: the claim of 'strong performance both in-distribution and under distribution shift' is stated without any metrics, baselines, datasets, error bars, or experimental protocol, rendering the central claim unevaluable from the supplied text.

Authors: We agree that the abstract would be strengthened by including concrete quantitative details to support the performance claims. In the revised manuscript we will update the abstract to report key metrics (e.g., accuracy or AUC on in-distribution and out-of-distribution settings), name the primary datasets and source models, reference the main baselines, and briefly note the experimental protocol and error-bar reporting. This change will make the central claim directly evaluable while preserving the abstract's length and readability.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity

The described approach extracts separating directions from hidden representations of a frozen LM (standard mean-difference or similar construction on labeled data), projects inputs onto those directions, and trains a lightweight classifier on the resulting features. This is a conventional supervised pipeline on derived representations with no equations or steps that reduce the final detection score to the inputs by definition, no fitted parameters renamed as predictions, and no load-bearing self-citations or uniqueness theorems invoked. The central performance claims under distribution shift are empirically testable against external benchmarks and do not collapse into self-referential constructions.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Based on abstract only; the approach assumes existence of stable separating directions and relies on standard ML training without detailing free parameters or new entities.

free parameters (1)

lightweight classifier parameters
Parameters of the final classifier are fitted to the projection features from training data.

axioms (1)

Domain assumption: existence of layer-wise directions separating human and machine text
Method constructs and uses these directions at each layer as the core representation.

Reference graph

Works this paper leans on

33 extracted references · 19 canonical work pages · 1 internal anchor

[1] Andy Zou, Long Phan, Sarah Li Chen, James Campbell, Phillip Guo, Richard Ren, Alexander Pan, Xuwang Yin, Mantas Mazeika, Ann… Representation Engineering: A Top-Down Approach to AI Transparency. CoRR, 2023. doi:10.48550/arxiv.2310.01405

[2] Tanzila Kehkashan, Raja Adil Riaz, Ahmad Sami Al… AI-generated text detection: … Comput. Sci. Rev., 2025. doi:10.1016/j.cosrev.2025.100793

[3] Junchao Wu, Runzhe Zhan, Derek F. Wong, Shu Yang, Xinyi Yang, Yulin Yuan, Lidia S. Chao. CoRR, 2024. doi:10.48550/arxiv.2410.23746

[4] Jiachen Fu, Chun… DetectAnyLLM: Towards Generalizable and Robust Detection of Machine-Generated Text Across Domains and Models. 2025. doi:10.48550/arxiv.2509.14268

[5] Yuxia Wang, Artem Shelmanov, Jonibek Mansurov, Akim Tsvigun, Vladislav Mikhailov, Rui Xing, Zhuohan Xie, Jiahui Geng, Giovanni Puccetti, Ekaterina Artemova, Jinyan Su, Minh Ngoc Ta, Mervat Abassy, Kareem Ashraf Elozeiri, Saad El Dine Ahmed El Etter, Maiya Goloburda, Tarek Mahmoud, Raj Vardhan Tomar, Nu… CoRR, 2025. doi:10.48550/arxiv.2501.11012

[6] Jiaqi Chen, Xiaoye Zhu, Tianyang Liu, Ying Chen, Xinhui Chen, Yiwen Yuan, Chak Tou Leong, Zuchao Li, Tang Long, Lei Zhang, Chenyu Yan, Guanghao Mei, Jie Zhang, Lefei Zhang. CoRR, 2024. doi:10.48550/arxiv.2412.10432

[7] Guangsheng Bao, Yanbin Zhao, Zhiyang Teng, Linyi Yang, Yue Zhang. CoRR, 2023. doi:10.48550/arxiv.2310.05130

[8] Eric Mitchell, Yoonho Lee, Alexander Khazatsky, Christopher D. Manning, Chelsea Finn. CoRR, 2023. doi:10.48550/arxiv.2301.11305

[9] Abhimanyu Hans, Avi Schwarzschild, Valeriia Cherepanova, Hamid Kazemi, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein. CoRR, 2024. arXiv:2401.12070. doi:10.48550/arxiv.2401.12070

[10] Xianjun Yang, Wei Cheng, Linda R. Petzold, William Yang Wang, Haifeng Chen. CoRR, 2023. doi:10.48550/arxiv.2305.17359

[11] Jinyan Su, Terry Yue Zhuo, Di Wang, Preslav Nakov. CoRR, 2023. doi:10.48550/arxiv.2306.05540

[12] Laida Kushnareva, Daniil Cherniavskii, Vladislav Mikhailov, Ekaterina Artemova, Serguei Barannikov, Alexander Bernstein, Irina Piontkovskaya, Dmitri Piontkovski, Evgeny Burnaev. CoRR, 2021. arXiv:2109.04825

[13] Kristian Kuznetsov, Eduard Tulchinskii, Laida Kushnareva, German Magai, Serguei Barannikov, Sergey I. Nikolenko, Irina Piontkovskaya. CoRR, 2024. doi:10.48550/arxiv.2410.08113

[14] Junchao Wu, Shu Yang, Runzhe Zhan, Yulin Yuan, Lidia S. Chao, Derek Fai Wong. Comput. Linguistics, 2025. doi:10.1162/coli_a_00549

[15] Debora Weber… Testing of Detection Tools for AI-Generated Text. 2023. doi:10.48550/arxiv.2306.15666

[16] Ganesh Jawahar, Muhammad Abdul… Automatic Detection of Machine Generated Text: … Proceedings of the 28th International Conference on Computational Linguistics, 2020. doi:10.18653/v1/2020.coling-main.208

[17] Yafu Li, Qintong Li, Leyang Cui, Wei Bi, Zhilin Wang, Longyue Wang, Linyi Yang, Shuming Shi, Yue Zhang. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2024. doi:10.18653/v1/2024.acl-long.3

[18] Brian Tufts, Xuandong Zhao, Lei Li. A Practical Examination of AI-Generated Text Detectors for Large Language Models. 2025. doi:10.18653/v1/2025.findings-naacl.271

[19] AI-generated text boundary detection with RoFT. 2024.

[20] Yuxia Wang, Jonibek Mansurov, Petar Ivanov, Jinyan Su, Artem Shelmanov, Akim Tsvigun, Chenxi Whitehouse, Osama Mohammed Afzal, Tarek Mahmoud, Toru Sasaki, Thomas Arnold, Alham Fikri Aji, Nizar Habash, Iryna Gurevych, Preslav Nakov. Proceedings of the 18th Conference of the European Chapter of the Associ… 2024.

[21] nostalgebraist. August 2020.

[22] Yuxia Wang, Artem Shelmanov, Jonibek Mansurov, Akim Tsvigun, Vladislav Mikhailov, Rui Xing, Zhuohan Xie, Jiahui Geng, Giovanni Puccetti, Ekaterina Artemova, Jinyan Su, Minh Ngoc Ta, Mervat Abassy, Kareem Ashraf Elozeiri, Saad El Dine Ahmed El Etter, Maiya Goloburda, Tarek Mahmoud, Tomar… GenAI Content Detection Task 1: English and Multilingual Machine-Generated Text Detection: AI vs… 2025.

[23] Irene Solaiman, Miles Brundage, Jack Clark, Amanda Askell, Ariel Herbert… Release Strategies and the Social Impacts of Language Models. 2019. arXiv:1908.09203

[24] Thomas Lavergne, Tanguy Urvoy, Fran… Detecting Fake Content with Relative Entropy Scoring. 2008.

[25] Sebastian Gehrmann, Hendrik Strobelt, Alexander M. Rush. CoRR, 2019. arXiv:1906.04043

[26] Biru Zhu, Lifan Yuan, Ganqu Cui, Yangyi Chen, Chong Fu, Bingxiang He, Yangdong Deng, Zhiyuan Liu, Maosong Sun, Ming Gu. Beat LLMs at Their Own Game: Zero-Shot LLM-Generated Text Detection via Querying ChatGPT. 2023. doi:10.18653/v1/2023.emnlp-main.463

[27] Sungjoon Park, Jihyung Moon, Sungdong Kim, Won… CoRR, 2021. arXiv:2105.09680

[28] Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzm… Unsupervised Cross-lingual Representation Learning at Scale. 2019. arXiv:1911.02116

[29] Daphne Ippolito, Daniel Duckworth, Chris Callison… Automatic Detection of Generated Text is Easiest when Humans are Fooled. 2020. doi:10.18653/v1/2020.acl-main.164

[30] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, Veselin Stoyanov. CoRR, 2019. arXiv:1907.11692

[31] GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-Tensorflow. 2021.

[32] Qwen3 Technical Report. 2025.

[33] Gemma 3 Technical Report. 2025.
