
arxiv: 2510.01801 · v2 · submitted 2025-10-02 · 💻 cs.CL

Detecting LLM-Generated Spam Reviews by Integrating Language Model Embeddings and Graph Neural Network

Pith reviewed 2026-05-18 10:38 UTC · model grok-4.3

classification 💻 cs.CL
keywords: spam detection · LLM-generated reviews · graph neural networks · review fraud · hybrid detection · synthetic datasets · online platform security

The pith

FraudSquad detects LLM-generated spam reviews by combining language model embeddings with a gated graph transformer to capture semantic and behavioral signals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper first builds three new datasets of spam reviews produced by different large language models, each prompted with real product details and genuine reviews to mimic realistic deception. It then presents FraudSquad as a hybrid classifier that feeds text embeddings from a pre-trained language model into a gated graph transformer, allowing the model to classify individual reviews as spam by jointly modeling content and the review network structure. This design avoids manual feature engineering and large labeled sets while still delivering large gains over prior detectors. A sympathetic reader would care because persuasive machine-written reviews now threaten the reliability of product feedback on major platforms, and a lightweight method that works on both synthetic and human spam could offer a practical defense.

Core claim

The central claim is that FraudSquad, by integrating pre-trained language model embeddings with a gated graph transformer, classifies spam nodes effectively because it captures both the semantic content of reviews and the behavioral connections in the review graph, without manual features or massive training resources. The paper reports up to 44.22 percent higher precision and 43.01 percent higher recall than state-of-the-art baselines on three LLM-generated spam datasets, and says the model remains effective on human-written spam data.

What carries the argument

FraudSquad, the hybrid model that integrates text embeddings from a pre-trained language model with a gated graph transformer for spam node classification in review graphs.
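
As a rough illustration of the general idea (not the authors' implementation), the sketch below performs one attention-weighted neighbor aggregation step and then uses a scalar gate to mix the aggregated message with the node's own embedding. The toy embeddings, the fixed gate weights, and the function names are all assumptions made for illustration; this page does not specify the paper's actual layer, its learned projections, or the embedding model.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def gated_attention_step(h, neighbors, gate_w, gate_b=0.0):
    """One simplified gated attention update (hypothetical, single head).

    h:         dict node -> embedding (list of floats, all the same length d)
    neighbors: dict node -> list of neighbor nodes
    gate_w:    2*d weights applied to [own ; message] to produce a scalar gate
    """
    d = len(next(iter(h.values())))
    out = {}
    for v, own in h.items():
        nbrs = neighbors.get(v, [])
        if not nbrs:
            # Isolated node: keep its own embedding unchanged.
            out[v] = list(own)
            continue
        # Scaled dot-product attention over neighbors (no learned projections here).
        attn = softmax([dot(own, h[u]) / math.sqrt(d) for u in nbrs])
        msg = [sum(a * h[u][i] for a, u in zip(attn, nbrs)) for i in range(d)]
        # Sigmoid gate decides how much of the node's own embedding to keep.
        g = 1.0 / (1.0 + math.exp(-(dot(gate_w, own + msg) + gate_b)))
        out[v] = [g * o + (1.0 - g) * m for o, m in zip(own, msg)]
    return out

# Toy graph: three reviews with 2-dimensional stand-in embeddings.
h = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 0.0]}
neighbors = {"a": ["b", "c"], "b": [], "c": ["a"]}
updated = gated_attention_step(h, neighbors, gate_w=[0.0, 0.0, 0.0, 0.0])
```

In a real model the gate weights and attention projections would be learned, and the input embeddings would come from the pre-trained language model rather than hand-written vectors.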

If this is right

  • FraudSquad outperforms state-of-the-art baselines by up to 44.22% in precision and 43.01% in recall on the three LLM-generated datasets.
  • The model also produces promising results when tested on two separate human-written spam datasets.
  • FraudSquad requires only a modest model size and minimal labeled training data for practical deployment.
  • The new synthetic datasets demonstrate high persuasion and deceptive potential according to GPT-4.1 evaluations.
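
For readers who want to check such figures against their own runs, here is a minimal sketch of how precision and recall are computed for the spam class. It also shows the difference between an absolute gain (percentage points) and a relative gain (percent). This page does not state which of the two the "up to 44.22%" and "43.01%" figures refer to, so both are shown. The labels and predictions are made-up toy data, not results from the paper.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Precision and recall for the `positive` (spam) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def gains(new, old):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = (new - old) * 100
    relative = (new - old) / old * 100 if old else float("inf")
    return absolute, relative

# Toy labels: 1 = spam, 0 = genuine. Both prediction lists are hypothetical.
y_true        = [1, 1, 1, 1, 0, 0, 0, 0]
baseline_pred = [1, 1, 0, 0, 1, 1, 0, 0]
model_pred    = [1, 1, 1, 0, 1, 0, 0, 0]

base_p, base_r = precision_recall(y_true, baseline_pred)
model_p, model_r = precision_recall(y_true, model_pred)
precision_gain = gains(model_p, base_p)
```

The same improvement can look very different depending on which convention is used, which is one reason the exact metric definition matters when comparing against baselines.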

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The graph component may allow the detector to identify coordinated spam campaigns even when individual reviews appear human-like.
  • Similar embedding-plus-graph architectures could extend to detecting LLM-generated fake news or social media comments.
  • If behavioral signals remain useful against machine-generated text, platforms might shift focus from pure linguistic analysis toward network patterns.

Load-bearing premise

The three synthetic datasets, built by guiding LLMs with product metadata and genuine reference reviews, accurately represent the deceptive and persuasive qualities of real-world LLM-generated spam that would appear on platforms.

What would settle it

Testing FraudSquad on a collection of actual LLM-generated spam reviews scraped from live e-commerce sites and verifying whether the reported precision and recall gains hold.

Figures

Figures reproduced from arXiv: 2510.01801 by Jason Liao, Jiao Sun, Ling Huang, Rongwu Xu, Wei Xu, Xin Liu, Xinyi Jia.

Figure 1: Workflow of LLM-generated review spamming.
Figure 2: Evaluation results of LLM-generated and human
Figure 3: The overall architecture of FraudSquad by integrating LM-enhanced node embedding and graph neural network
Figure 4: Ablation study on node embeddings (different bars)
Original abstract

The rise of large language models (LLMs) has enabled the generation of highly persuasive spam reviews that closely mimic human writing. These reviews pose significant challenges for existing detection systems and threaten the credibility of online platforms. In this work, we first create three realistic LLM-generated spam review datasets using three distinct LLMs, each guided by product metadata and genuine reference reviews. Evaluations by GPT-4.1 confirm the high persuasion and deceptive potential of these reviews. To address this threat, we propose FraudSquad, a hybrid detection model that integrates text embeddings from a pre-trained language model with a gated graph transformer for spam node classification. FraudSquad captures both semantic and behavioral signals without relying on manual feature engineering or massive training resources. Experiments show that FraudSquad outperforms state-of-the-art baselines by up to 44.22% in precision and 43.01% in recall on three LLM-generated datasets, while also achieving promising results on two human-written spam datasets. Furthermore, FraudSquad maintains a modest model size and requires minimal labeled training data, making it a practical solution for real-world applications. Our contributions include new synthetic datasets, a practical detection framework, and empirical evidence highlighting the urgency of adapting spam detection to the LLM era. Our code and datasets are available at: https://anonymous.4open.science/r/FraudSquad-5389/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces three synthetic datasets of LLM-generated spam reviews created by prompting three different LLMs with product metadata and genuine reference reviews; these are validated for persuasiveness via GPT-4.1. It proposes FraudSquad, a hybrid architecture that fuses embeddings from a pre-trained language model with a gated graph transformer to perform node classification on a review graph, thereby capturing both semantic content and behavioral connectivity signals. Experiments report that FraudSquad outperforms existing baselines by up to 44.22% in precision and 43.01% in recall on the three synthetic datasets and yields promising results on two human-written spam datasets. The method is presented as practical due to its modest size and low labeled-data requirements. Code and datasets are released via an anonymous repository.

Significance. If the empirical claims are substantiated, the work is significant because it supplies the first publicly available synthetic benchmarks specifically targeting LLM-generated spam and demonstrates a lightweight, non-feature-engineered detector that jointly models textual semantics and review-graph structure. The open release of code and data supports reproducibility and future benchmarking in an area where realistic evaluation resources have been scarce. The results underscore the practical urgency of adapting spam-detection pipelines to generative models that can produce human-like deceptive text.

major comments (3)
  1. [§3] Dataset Construction: The generation procedure, which guides LLMs with product metadata plus genuine reference reviews, receives no ablation or control experiments that isolate generation-specific artifacts (e.g., consistent lexical or embedding biases traceable to the source LLMs) from deception-specific signals. Because the central performance claims rest on these datasets accurately proxying real-world LLM spam, the absence of such controls leaves open the possibility that reported gains exploit generation artifacts rather than genuine deceptive patterns.
  2. [§5] Experiments: The paper states improvements of up to 44.22% precision and 43.01% recall but provides no statistical significance tests, no standard deviations across multiple runs, and no detailed descriptions of baseline re-implementations and hyper-parameter choices. Without these, it is impossible to determine whether the gains are robust or partly attributable to implementation differences, undermining the load-bearing claim of consistent outperformance.
  3. [§4.2] Model Architecture: The gated graph transformer component is described at a high level, yet the precise definition of the review graph (node features, edge construction criteria, and connectivity rules) is not specified. This detail is essential for verifying that behavioral signals are genuinely captured rather than being an artifact of how the synthetic data were assembled.
minor comments (2)
  1. [Abstract] The abstract and introduction could more explicitly list the three LLMs used for dataset generation and the two human-written spam datasets referenced in the results.
  2. [Figures/Tables] Figure captions and table headers would benefit from additional detail on the exact metrics and dataset splits being reported.
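
The statistical reporting requested in major comment 2 can be sketched as follows: a mean and sample standard deviation over repeated runs, plus an exact two-sided McNemar test comparing two classifiers on the same test items. The per-run scores and predictions are invented for illustration, and the helper names are hypothetical. Nothing here reproduces the paper's experiments.

```python
import math
import statistics

def mean_std(scores):
    """Mean and sample standard deviation (n - 1 denominator) across runs."""
    return statistics.mean(scores), statistics.stdev(scores)

def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions.

    Returns (b, c, p_value), where b counts items model A gets right and
    model B gets wrong, and c counts the reverse. Under the null hypothesis
    the discordant pairs split like a fair coin.
    """
    triples = list(zip(y_true, pred_a, pred_b))
    b = sum(1 for t, a, m in triples if a == t and m != t)
    c = sum(1 for t, a, m in triples if a != t and m == t)
    n = b + c
    if n == 0:
        return b, c, 1.0  # No discordant pairs: no evidence of a difference.
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)

# Hypothetical F1 scores from five runs with different random seeds.
run_scores = [0.90, 0.92, 0.91, 0.93, 0.89]
avg, sd = mean_std(run_scores)

# Hypothetical paired predictions on ten spam reviews.
y_true = [1] * 10
pred_a = [1] * 10            # model A: all correct
pred_b = [0] * 8 + [1] * 2   # model B: eight errors
b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)
```

McNemar's test suits this setting because both models are scored on the same items. A paired t-test over per-seed scores is the usual alternative when several independent runs are available.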

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback, which has helped us identify areas for improvement in the manuscript. We address each major comment below and commit to substantial revisions that will strengthen the empirical rigor, clarity, and reproducibility of the work while preserving its core contributions.

  1. Referee: [§3] Dataset Construction: The generation procedure, which guides LLMs with product metadata plus genuine reference reviews, receives no ablation or control experiments that isolate generation-specific artifacts (e.g., consistent lexical or embedding biases traceable to the source LLMs) from deception-specific signals. Because the central performance claims rest on these datasets accurately proxying real-world LLM spam, the absence of such controls leaves open the possibility that reported gains exploit generation artifacts rather than genuine deceptive patterns.

    Authors: We acknowledge the validity of this concern. Although the datasets were constructed to simulate realistic LLM-generated spam and validated for persuasiveness by GPT-4.1, and although consistent gains across three distinct source LLMs provide some indirect evidence against pure artifact exploitation, we agree that explicit controls are needed. In the revised manuscript we will add ablation studies that (i) compare performance on reference-guided vs. metadata-only generations, (ii) quantify lexical and embedding biases across the three LLMs, and (iii) evaluate FraudSquad on a held-out set of human-written deceptive reviews to further separate generation artifacts from deception signals.
    Revision: yes

  2. Referee: [§5] Experiments: The paper states improvements of up to 44.22% precision and 43.01% recall but provides no statistical significance tests, no standard deviations across multiple runs, and no detailed descriptions of baseline re-implementations and hyper-parameter choices. Without these, it is impossible to determine whether the gains are robust or partly attributable to implementation differences, undermining the load-bearing claim of consistent outperformance.

    Authors: We fully agree that statistical rigor and implementation transparency are required to substantiate the performance claims. In the revised version we will (i) report mean and standard deviation over at least five independent runs with different random seeds, (ii) include paired statistical significance tests (e.g., McNemar's test or paired t-tests) against each baseline, and (iii) provide an expanded appendix detailing baseline re-implementations, hyper-parameter search ranges, and the exact training/validation splits used for all experiments.
    Revision: yes

  3. Referee: [§4.2] Model Architecture: The gated graph transformer component is described at a high level, yet the precise definition of the review graph (node features, edge construction criteria, and connectivity rules) is not specified. This detail is essential for verifying that behavioral signals are genuinely captured rather than being an artifact of how the synthetic data were assembled.

    Authors: We appreciate this request for greater precision. In the revised Section 4.2 we will explicitly define: node features as the concatenation of the pre-trained language-model embedding with reviewer and product metadata; edges as undirected connections between reviews that share the same product or exhibit cosine similarity above a tunable threshold on their embeddings; and the connectivity rules that govern graph construction from the raw review metadata. These additions will clarify how behavioral signals are extracted independently of the synthetic generation process.
    Revision: yes
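
The edge rule described in the simulated response to comment 3 can be sketched in a few lines. Note that this rule comes from the simulated rebuttal, not from the paper itself, so it should be read as one plausible construction rather than the authors' confirmed method. The review records, the field names (`id`, `product`, `embedding`), and the 0.9 threshold are assumptions chosen for illustration.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)

def build_review_graph(reviews, threshold=0.9):
    """Undirected edges between reviews that share a product OR whose
    embeddings have cosine similarity above `threshold`.

    Returns a set of (id_a, id_b) tuples with id_a < id_b.
    Pairwise comparison is O(n^2), so this only suits small examples.
    """
    edges = set()
    for a, b in combinations(reviews, 2):
        same_product = a["product"] == b["product"]
        similar = cosine(a["embedding"], b["embedding"]) > threshold
        if same_product or similar:
            edges.add(tuple(sorted((a["id"], b["id"]))))
    return edges

# Hypothetical reviews with 2-dimensional stand-in embeddings.
reviews = [
    {"id": "r1", "product": "p1", "embedding": [1.0, 0.0]},
    {"id": "r2", "product": "p1", "embedding": [0.0, 1.0]},
    {"id": "r3", "product": "p2", "embedding": [1.0, 0.01]},
    {"id": "r4", "product": "p3", "embedding": [0.0, -1.0]},
]
edges = build_review_graph(reviews)
```

Here `r1` and `r2` are linked by a shared product and `r1` and `r3` by near-identical embeddings, while `r4` stays isolated. At platform scale the similarity edges would need an approximate nearest-neighbor search instead of the all-pairs loop.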

Circularity Check

0 steps flagged

No circularity: empirical evaluation with independent metrics and baselines


The paper creates synthetic LLM-generated spam datasets via metadata and reference reviews, proposes FraudSquad as a hybrid LM-embedding + gated graph transformer model, and reports empirical precision/recall gains against external baselines on both synthetic and human-written datasets. No derivation chain, equations, or first-principles results exist that could reduce to inputs by construction. No self-citations, fitted parameters renamed as predictions, or ansatz smuggling appear in the abstract or described contributions. The central claims rest on reported performance numbers and code/dataset release, which are externally verifiable and do not rely on self-referential definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the realism of the LLM-generated datasets and the assumption that graph structure encodes useful behavioral signals for classification; standard supervised learning assumptions apply but are not enumerated.

axioms (1)
  • domain assumption: Graph constructed from reviews encodes behavioral signals relevant to spam detection
    The gated graph transformer component depends on this modeling choice for capturing non-textual patterns.

pith-pipeline@v0.9.0 · 5785 in / 1182 out tokens · 34435 ms · 2026-05-18T10:38:17.328008+00:00 · methodology



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. JARVIS: An Evidence-Grounded Retrieval System for Interpretable Deceptive Reviews Adjudication

    cs.IR 2026-02 unverdicted novelty 5.0

    JARVIS combines hybrid retrieval and evidence graphs with LLMs to raise deceptive-review detection precision from 0.953 to 0.988 and recall from 0.830 to 0.901 on a custom dataset while cutting manual inspection time ...
