pith. machine review for the scientific record.

arxiv: 2604.16058 · v1 · submitted 2026-04-17 · 💻 cs.SE · cs.CL

Recognition: unknown

LLMSniffer: Detecting LLM-Generated Code via GraphCodeBERT and Supervised Contrastive Learning

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 08:24 UTC · model grok-4.3

classification 💻 cs.SE cs.CL
keywords LLM-generated code detection · GraphCodeBERT · supervised contrastive learning · code classification · software security · academic integrity · AI ethics

The pith

LLMSniffer detects LLM-generated code more accurately by fine-tuning GraphCodeBERT with a two-stage supervised contrastive learning pipeline.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces LLMSniffer, a system that distinguishes code written by large language models from code written by humans. It adapts GraphCodeBERT through comment removal followed by two stages of supervised contrastive learning, then feeds the resulting embeddings into a multilayer perceptron classifier. On the GPTSniffer benchmark the method raises accuracy from 70 percent to 78 percent and the F1 score from 68 percent to 78 percent. On the Whodunit benchmark accuracy moves from 91 percent to 94.65 percent, with F1 rising from 91 percent to 94.64 percent. The authors release model checkpoints, datasets, code, and an interactive demo so others can test the embeddings directly.

Core claim

LLMSniffer improves detection of LLM-generated code by fine-tuning GraphCodeBERT with a two-stage supervised contrastive learning pipeline after comment removal preprocessing and applying an MLP classifier. This produces accuracy of 78 percent on GPTSniffer (F1 78 percent) and 94.65 percent on Whodunit (F1 94.64 percent), exceeding prior baselines. t-SNE visualizations show that the contrastive stage creates compact, well-separated clusters for human-written and LLM-generated code.

What carries the argument

The two-stage supervised contrastive learning pipeline on GraphCodeBERT, which pulls embeddings of code snippets from the same class closer together while pushing embeddings from different classes farther apart.
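That pull/push behavior can be made concrete. A minimal NumPy sketch of a supervised contrastive (SupCon-style) batch loss follows; the normalization, temperature value, and any example batch are illustrative assumptions, not the paper's actual settings:

```python
import numpy as np

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over one batch (Khosla et al., 2020).

    Same-label samples act as positives (pulled together); all other
    samples act as negatives (pushed apart). Assumes each label appears
    at least twice in the batch.
    """
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = (z @ z.T) / temperature                  # scaled cosine similarities
    n = len(labels)
    not_self = ~np.eye(n, dtype=bool)
    # log-softmax of each similarity against all *other* samples
    denom = np.log(np.where(not_self, np.exp(sim), 0.0).sum(axis=1, keepdims=True))
    log_prob = sim - denom
    positives = (labels[:, None] == labels[None, :]) & not_self
    # mean log-probability over each anchor's positives, averaged over anchors
    per_anchor = (log_prob * positives).sum(axis=1) / positives.sum(axis=1)
    return float(-per_anchor.mean())
```

The loss is small when same-class embeddings sit closer together than cross-class ones; the paper's Stage 1 optimizes an objective of this family over GraphCodeBERT [CLS] embeddings.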

If this is right

  • Accuracy on the GPTSniffer dataset rises by 8 percentage points over earlier detectors.
  • Accuracy on the Whodunit dataset rises by 3.65 percentage points over earlier detectors.
  • t-SNE plots confirm tighter intra-class clusters and clearer separation between human and LLM code embeddings.
  • Public release of the trained model and demo allows direct inspection of the learned representations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Detection performance is likely to decline on outputs from newer LLMs unless the model is periodically retrained on fresh examples.
  • Comment removal succeeds because comments often carry distinctive human phrasing or LLM-specific artifacts that the contrastive stage can exploit.
  • The same contrastive pipeline could be applied to detect AI assistance in non-code artifacts such as documentation or test cases.
  • Embedding-based detection might be inserted into continuous-integration pipelines to flag potential AI contributions during code review.
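The comment-removal step these points lean on can be illustrated in miniature. The paper's exact preprocessing rules are not given in this summary, so the regex-based stripper below for Java/C-style source is an assumed stand-in, not the authors' implementation:

```python
import re

def strip_comments(code: str) -> str:
    """Remove // line comments and /* ... */ block comments from
    Java/C-style source, leaving comment markers inside string
    literals untouched. Illustrative only: char literals and more
    exotic cases are not handled."""
    pattern = re.compile(
        r'("(?:\\.|[^"\\])*")'          # group 1: string literal (kept)
        r"|(/\*.*?\*/|//[^\n]*)",       # group 2: block or line comment
        re.DOTALL,
    )
    # keep string literals verbatim, drop matched comments
    return pattern.sub(lambda m: m.group(1) or "", code)
```

A stripper of this shape removes exactly the channel (human phrasing, LLM artifacts) that the second bullet above credits for part of the detector's signal.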

Load-bearing premise

That the GPTSniffer and Whodunit datasets sufficiently represent real-world distributions of LLM-generated code and that the reported gains will hold for new LLMs or coding styles without retraining.

What would settle it

Running LLMSniffer on code samples produced by an LLM released after the training data or written in a programming style absent from the benchmarks and finding that accuracy drops to or below the prior baseline levels.

Figures

Figures reproduced from arXiv: 2604.16058 by Abir Muhtasim, Mahir Labib Dihan.

Figure 1
Figure 1. Overview of LLMSniffer. Stage 1 trains the encoder and projection head with supervised contrastive loss. Stage 2 freezes the encoder and trains an MLP classifier with binary cross-entropy loss. view at source ↗
Figure 2
Figure 2. t-SNE of test set [CLS] embeddings (GPTSniffer benchmark). Left (GPTSniffer baseline): the two classes (human = blue, AI = orange) are highly interleaved with no clear decision boundary. Right (LLMSniffer): contrastive fine-tuning produces a compact, isolated AI-generated cluster that is linearly separable from the human-written cluster, directly explaining the improved classification performance. view at source ↗
read the original abstract

The rapid proliferation of Large Language Models (LLMs) in software development has made distinguishing AI-generated code from human-written code a critical challenge with implications for academic integrity, code quality assurance, and software security. We present LLMSniffer, a detection framework that fine-tunes GraphCodeBERT using a two-stage supervised contrastive learning pipeline augmented with comment removal preprocessing and an MLP classifier. Evaluated on two benchmark datasets - GPTSniffer and Whodunit - LLMSniffer achieves substantial improvements over prior baselines: accuracy increases from 70% to 78% on GPTSniffer (F1: 68% to 78%) and from 91% to 94.65% on Whodunit (F1: 91% to 94.64%). t-SNE visualizations confirm that contrastive fine-tuning yields well-separated, compact embeddings. We release our model checkpoints, datasets, codes and a live interactive demo to facilitate further research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents LLMSniffer, a detection framework that fine-tunes GraphCodeBERT via a two-stage supervised contrastive learning pipeline, augmented by comment removal preprocessing and an MLP classifier. It reports concrete accuracy and F1 improvements over baselines on the GPTSniffer (70% to 78% accuracy, 68% to 78% F1) and Whodunit (91% to 94.65% accuracy, 91% to 94.64% F1) benchmarks, supported by t-SNE evidence of improved embedding separation, and releases model checkpoints, datasets, code, and a live demo.

Significance. If the reported gains prove robust, the work advances practical methods for distinguishing LLM-generated code, relevant to academic integrity, code quality, and security in software engineering. The explicit release of model checkpoints, datasets, code, and an interactive demo is a clear strength that supports reproducibility and follow-on research.

major comments (2)
  1. [Evaluation] Evaluation section: The accuracy and F1 lifts are reported as single point estimates (e.g., 70% to 78% on GPTSniffer) with no error bars, standard deviations across runs, ablation studies isolating the contrastive loss or comment removal, or statistical significance tests; this directly undermines confidence that the gains are reliable rather than dataset-specific.
  2. [Methods] Methods section: The two-stage supervised contrastive learning pipeline is described at a high level but lacks concrete details on the contrastive loss formulation, temperature and margin hyperparameters, the exact integration of the MLP classifier, and training schedules; without these, the central claim of improvement via this specific architecture cannot be fully assessed or replicated.
minor comments (1)
  1. [Results] The t-SNE visualizations are presented as qualitative evidence of separation but would benefit from quantitative metrics (e.g., silhouette score or inter-cluster distance) to strengthen the supporting claim.
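The suggested metric is easy to compute directly from embeddings and labels. A pure-NumPy sketch of the silhouette score follows; the definition is standard, and any data fed to it would come from the released checkpoints rather than anything reported here:

```python
import numpy as np

def silhouette_score(X, labels):
    """Mean silhouette coefficient. For each point: a = mean distance to
    its own cluster, b = mean distance to the nearest other cluster,
    s = (b - a) / max(a, b). Values near 1 indicate compact,
    well-separated clusters -- the property the t-SNE plots show only
    qualitatively. Assumes >= 2 clusters, each with >= 2 points."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    idx = np.arange(len(X))
    scores = []
    for i, li in enumerate(labels):
        a = D[i, (labels == li) & (idx != i)].mean()            # intra-cluster
        b = min(D[i, labels == lj].mean()                       # nearest other cluster
                for lj in set(labels) if lj != li)
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))
```

Reporting this number for the baseline and LLMSniffer embeddings would turn Figure 2's visual argument into a quantitative one.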

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The approach rests on standard assumptions of transfer learning and contrastive objectives rather than new axioms or invented entities.

free parameters (2)
  • contrastive loss temperature and margin
    Typical hyperparameters in supervised contrastive learning that must be chosen or tuned; not enumerated in abstract.
  • MLP hidden sizes and learning rate schedule
    Standard fine-tuning choices that affect final accuracy.
axioms (2)
  • domain assumption Pre-trained GraphCodeBERT embeddings contain transferable features for distinguishing human vs. LLM code after contrastive adaptation.
    Invoked by the choice to fine-tune rather than train from scratch.
  • domain assumption Comment removal preprocessing does not discard task-relevant signals.
    Stated as part of the pipeline without justification in abstract.
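The MLP-head free parameters above can be grounded with a sketch of Stage 2: a one-hidden-layer classifier trained with binary cross-entropy on frozen embeddings. The synthetic blobs, hidden width, and learning rate are all assumptions for illustration; the real inputs would be GraphCodeBERT [CLS] vectors from the frozen Stage-1 encoder.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen Stage-1 embeddings: two separable 8-d blobs
# (hypothetical data, not the paper's).
X = np.vstack([rng.normal(-1.0, 0.3, (50, 8)), rng.normal(1.0, 0.3, (50, 8))])
y = np.repeat([0.0, 1.0], 50)            # 0 = human, 1 = LLM-generated

H, lr = 16, 0.5                          # hidden width and step size (assumed)
W1 = rng.normal(0.0, 0.5, (8, H)); b1 = np.zeros(H)
W2 = rng.normal(0.0, 0.5, (H, 1)); b2 = np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(300):                     # plain gradient descent on BCE
    h = np.tanh(X @ W1 + b1)             # hidden layer
    p = sigmoid(h @ W2 + b2).ravel()     # predicted P(LLM-generated)
    g_logit = ((p - y) / len(y))[:, None]    # dBCE/dlogit for sigmoid + BCE
    gW2, gb2 = h.T @ g_logit, g_logit.sum(0)
    gh = (g_logit @ W2.T) * (1.0 - h ** 2)   # back through tanh
    gW1, gb1 = X.T @ gh, gh.sum(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

p = sigmoid(np.tanh(X @ W1 + b1) @ W2 + b2).ravel()
accuracy = float(((p > 0.5) == (y > 0.5)).mean())
```

Since the encoder is frozen, all of Stage 2's capacity lives in W1/W2 and the two listed hyperparameters, which is why the ledger flags them as the tunable degrees of freedom.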

pith-pipeline@v0.9.0 · 5472 in / 1445 out tokens · 56392 ms · 2026-05-10T08:24:25.912017+00:00 · methodology


Reference graph

Works this paper leans on

27 extracted references · 16 canonical work pages · 5 internal anchors

  1. [1] T. Chen, S. Kornblith, M. Norouzi, and G. Hinton. A simple framework for contrastive learning of visual representations. In International Conference on Machine Learning (ICML), pages 1597--1607, 2020. URL https://arxiv.org/abs/2002.05709
  2. [2] Z. Ding et al. Towards understanding the capability of large language models on code clone detection: A survey. In Proceedings of the International Conference on Software Engineering (ICSE), 2022.
  3. [3] Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, L. Shou, B. Qin, T. Liu, D. Jiang, and M. Zhou. CodeBERT: A pre-trained model for programming and natural languages. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1536--1547, 2020. doi:10.18653/v1/2020.findings-emnlp.139
  4. [4] S. Gehrmann, H. Strobelt, and A. M. Rush. GLTR: Statistical detection and visualization of generated text. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 111--116, 2019. doi:10.18653/v1/P19-3019
  5. [5] GitHub. GitHub Copilot: Your AI pair programmer, 2021. URL https://github.com/features/copilot
  6. [6] B. Gunel, J. Du, A. Conneau, and V. Stoyanov. Supervised contrastive learning for pre-trained language model fine-tuning. In International Conference on Learning Representations (ICLR), 2021. URL https://arxiv.org/abs/2011.01403
  7. [7] D. Guo, S. Ren, S. Lu, Z. Feng, D. Tang, S. Liu, L. Zhou, N. Duan, A. Svyatkovskiy, S. Fu, M. Tufano, S. K. Deng, C. Clement, D. Drain, N. Sundaresan, J. Yin, D. Jiang, and M. Zhou. GraphCodeBERT: Pre-training code representations with data flow. In International Conference on Learning Representations (ICLR), 2021. URL https://openreview.net/forum?id=jLoC4ez43PZ
  8. [8] C.-Y. Hung et al. Exploring the feasibility of automated detection of LLM-generated code through explainable AI. arXiv preprint, 2024.
  9. [9] H. Husain, H.-H. Wu, T. Gazit, M. Allamanis, and M. Brockschmidt. CodeSearchNet challenge: Evaluating the state of semantic code search. arXiv preprint arXiv:1909.09436, 2019. URL https://arxiv.org/abs/1909.09436
  10. [10] J. Idialu et al. Our students are using ChatGPT: Detecting AI-generated code in programming assignments. arXiv preprint, 2024.
  11. [11] P. Khosla, P. Teterwak, C. Wang, A. Sarna, Y. Tian, P. Isola, A. Maschinot, C. Liu, and D. Krishnan. Supervised contrastive learning. In Advances in Neural Information Processing Systems (NeurIPS), volume 33, pages 18661--18673, 2020. URL https://arxiv.org/abs/2004.11362
  12. [12] J. Kirchenbauer, J. Geiping, Y. Wen, J. Katz, I. Miers, and T. Goldstein. A watermark for large language models. In International Conference on Machine Learning (ICML), 2023. URL https://arxiv.org/abs/2301.10226
  13. [13] I. Loshchilov and F. Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations (ICLR), 2019. URL https://arxiv.org/abs/1711.05101
  14. [14] E. Mitchell, Y. Lee, A. Khazatsky, C. D. Manning, and C. Finn. DetectGPT: Zero-shot machine-generated text detection using probability curvature. arXiv preprint arXiv:2301.11305, 2023. URL https://arxiv.org/abs/2301.11305
  15. [15] P. T. Nguyen et al. Whodunit? Classifying code as human authored or GPT-4 generated: A case study on CodeChef problems. arXiv preprint arXiv:2403.04013, 2024a. URL https://arxiv.org/abs/2403.04013
  16. [16] P. T. Nguyen et al. GPTSniffer: Detecting ChatGPT-generated code via fine-tuned language models. Journal of Systems and Software, 2024b. doi:10.1016/j.jss.2024.112031. URL https://www.sciencedirect.com/science/article/pii/S0164121224001043
  17. [17] OpenAI. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023. URL https://arxiv.org/abs/2303.08774
  18. [18] R. Puri, D. Kung, G. Janssen, W. Zhang, G. Domeniconi, V. Zolotov, J. Dolby, J. Chen, M. Choudhury, L. Decker, et al. CodeNet: A large-scale AI for code dataset for learning a diversity of coding tasks, 2021. URL https://arxiv.org/abs/2105.12655
  19. [19] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, and I. Sutskever. Language models are unsupervised multitask learners. OpenAI Blog, 1(8):9, 2019. URL https://openai.com/blog/better-language-models
  20. [20] B. Rozière, J. Gehring, F. Gloeckle, S. Sootla, I. Gat, X. E. Tan, Y. Adi, J. Liu, T. Remez, J. Rapin, et al. Code Llama: Open foundation models for code. arXiv preprint arXiv:2308.12950, 2023. URL https://arxiv.org/abs/2308.12950
  21. [21] H. Shi et al. Zero-shot detection of LLM-generated code. arXiv preprint, 2024.
  22. [22] I. Solaiman, M. Brundage, J. Clark, A. Askell, A. Herbert-Voss, J. Wu, A. Radford, and J. Wang. Release strategies and the social impacts of language models. arXiv preprint arXiv:1908.09203, 2019. URL https://arxiv.org/abs/1908.09203
  23. [23] L. van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9:2579--2605, 2008. URL https://www.jmlr.org/papers/v9/vandermaaten08a.html
  24. [24] Y. Wang, W. Wang, S. Joty, and S. C. Hoi. CodeT5: Identifier-aware unified pre-trained encoder-decoder models for code understanding and generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 8696--8708, 2021. doi:10.18653/v1/2021.emnlp-main.685
  25. [25] J. Ye et al. Detecting LLM-generated code: A zero-shot approach. arXiv preprint, 2024.