Recognition: no theorem link
UniDetect: LLM-Driven Universal Fraud Detection across Heterogeneous Blockchains
Pith reviewed 2026-05-10 15:57 UTC · model grok-4.3
The pith
LLMs guided by domain knowledge generate general transaction summaries to detect fraud across different blockchains.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
UniDetect is a multi-chain cryptocurrency fraud account detection method based on large language models. Domain knowledge guides the LLM to generate general transaction summary texts applicable to heterogeneous blockchain accounts, which serve as evidence for fraud account detection. A two-stage alternating training strategy continuously and dynamically enhances the multimodal joint reasoning for detecting fraudulent accounts based on both the textual evidence and the transaction graph patterns.
What carries the argument
A domain-knowledge-guided LLM that produces general transaction summary texts, paired with a two-stage training procedure that alternates between textual-evidence and graph-pattern objectives for multimodal fraud classification.
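The alternating scheme described above can be sketched as a simple training loop. This is an illustrative assumption about the control flow only; the function names, step counts, and toy losses are not from the paper:

```python
def alternating_train(text_step, graph_step, n_rounds=3, steps_per_stage=2):
    """Run n_rounds of alternation; each stage performs steps_per_stage
    updates. text_step / graph_step are callables that execute one
    optimization step on their modality and return a loss value."""
    history = []
    for _ in range(n_rounds):
        for _ in range(steps_per_stage):   # stage A: textual evidence
            history.append(("text", text_step()))
        for _ in range(steps_per_stage):   # stage B: graph patterns
            history.append(("graph", graph_step()))
    return history

# Toy stand-ins: losses shrink by 10% per call, mimicking optimization steps.
losses = {"text": 1.0, "graph": 1.0}
def make_step(name):
    def step():
        losses[name] *= 0.9
        return losses[name]
    return step

log = alternating_train(make_step("text"), make_step("graph"))
# 3 rounds x (2 text + 2 graph) steps = 12 logged updates
```

In a real system each `*_step` would backpropagate through the corresponding encoder while the shared fusion head receives gradients from both stages.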
If this is right
- Outperforms existing methods by 5.57% to 7.58% in Kolmogorov-Smirnov statistic on multiple blockchains.
- Identifies over 94.58% of fraudulent accounts in cross-chain zero-shot detection scenarios.
- Delivers a 6.06% F1 improvement when applied to non-blockchain data.
- Supports monitoring of DeFi protocols that reorganize illicit funds into uniform assets across chains.
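The headline metric in the first bullet is the two-sample Kolmogorov-Smirnov statistic: the maximum gap between the empirical CDFs of model scores for fraudulent versus normal accounts. A minimal pure-Python sketch with toy scores (not the paper's evaluation code):

```python
def ks_statistic(pos_scores, neg_scores):
    """Max |CDF_pos(t) - CDF_neg(t)| over all observed score thresholds t."""
    thresholds = sorted(set(pos_scores) | set(neg_scores))

    def cdf(xs, t):
        # Fraction of scores at or below threshold t (empirical CDF).
        return sum(x <= t for x in xs) / len(xs)

    return max(abs(cdf(pos_scores, t) - cdf(neg_scores, t)) for t in thresholds)

# Perfectly separated score distributions give KS = 1.0;
# identical distributions give KS = 0.0.
assert ks_statistic([0.9, 0.8, 0.95], [0.1, 0.2, 0.05]) == 1.0
assert ks_statistic([0.5, 0.6], [0.5, 0.6]) == 0.0
```

A "5.57% to 7.58% KS gain" then means the maximum CDF separation achieved by UniDetect's scores exceeds the baselines' by that margin.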
Where Pith is reading between the lines
- The summaries could serve as a bridge for integrating fraud signals from blockchains with traditional financial transaction logs.
- If the alternating training stabilizes, similar LLM-plus-graph pipelines might apply to other heterogeneous network fraud problems such as social media account abuse.
- Broader deployment would require testing whether performance holds when the underlying LLM is updated or replaced.
Load-bearing premise
Domain-knowledge-guided LLM prompts can reliably produce general, unbiased transaction summary texts that serve as effective evidence for fraud detection across truly heterogeneous blockchains without hallucinations or chain-specific artifacts.
What would settle it
Running the method on a previously unseen blockchain and checking whether the generated summaries contain chain-specific phrasing or errors that push detection accuracy below baseline levels.
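One cheap, automatable version of this test is to scan generated summaries for vocabulary tied to a different chain than the one being described. The token lists below are illustrative assumptions, not the paper's lexicon:

```python
# Hypothetical chain-specific vocabularies; a real check would use a
# curated lexicon per supported chain.
CHAIN_TERMS = {
    "ethereum": {"eth", "gwei", "erc-20", "gas"},
    "bitcoin": {"btc", "satoshi", "utxo"},
}

def chain_leakage(summary, target_chain):
    """Return tokens from *other* chains' vocabularies found in a summary
    that is supposed to describe target_chain generically."""
    tokens = set(summary.lower().replace(",", " ").split())
    leaked = set()
    for chain, terms in CHAIN_TERMS.items():
        if chain != target_chain:
            leaked |= tokens & terms
    return leaked

# A Bitcoin-account summary mentioning "gas" has leaked Ethereum phrasing.
assert chain_leakage("High gas spend across many counterparties", "bitcoin") == {"gas"}
assert chain_leakage("Net receiver with balanced counterparties", "bitcoin") == set()
```

A nonzero leakage rate on a held-out chain, correlated with accuracy drops, would directly undercut the universality claim.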
read the original abstract
As cross-chain interoperability advances, decentralized finance (DeFi) protocols enable illicit funds to be reorganized into uniform liquid assets that flow throughout the cryptocurrency market. Such operations can bypass monitoring targeted at individual blockchains and thereby weaken current regulatory frameworks. Motivated by these, we introduce UniDetect, a multi-chain cryptocurrency fraud account detection method based on large language models (LLMs). Specifically, we use domain knowledge to guide the LLM to generate general transaction summary texts applicable to heterogeneous blockchain accounts, which serve as evidence for fraud account detection. Furthermore, we introduce a two-stage alternating training strategy to continuously and dynamically enhance the multimodal joint reasoning for detecting fraudulent accounts based on both the textual evidence and the transaction graph patterns. Experiments on multiple blockchains show that UniDetect outperforms existing methods by 5.57% to 7.58% in Kolmogorov-Smirnov (KS). For cross-chain zero-shot detection, UniDetect identifies over 94.58% of fraudulent accounts. It also generalizes well to non-blockchain data, delivering a 6.06% improvement in F1 over existing methods. The dataset and source code are available at https://github.com/msy0513/UniDetect.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents UniDetect, an LLM-driven framework for universal fraud account detection across heterogeneous blockchains. Domain knowledge guides LLMs to produce general transaction summary texts as evidence; these are fused with graph patterns via a two-stage alternating training procedure. Reported results include 5.57–7.58% KS gains over baselines on multiple chains, >94.58% recall in cross-chain zero-shot settings, and 6.06% F1 improvement on non-blockchain data, with code and data released.
Significance. If validated, the approach would offer a practical step toward cross-chain monitoring in DeFi by leveraging textual evidence to bridge heterogeneous transaction graphs. The open release of dataset and code strengthens reproducibility and enables follow-on work on multimodal fraud detection.
major comments (3)
- [Section 3] The central performance claims rest on LLM-generated transaction summaries serving as reliable, general evidence. No quantitative validation is provided (hallucination rates, inter-chain consistency metrics, human evaluation, or automated checks), nor is there an ablation that isolates the contribution of the text modality versus graph patterns alone. Without these, the reported KS and zero-shot gains cannot be attributed to the claimed universal mechanism rather than to chain-specific artifacts or LLM biases.
- [Experiments] The abstract and results report concrete KS/F1 improvements and zero-shot recall, yet the manuscript provides insufficient detail on baseline implementations, data splits, statistical significance testing (e.g., paired t-tests or multiple-run variance), and whether prompt or hyperparameter choices were tuned on the test sets. This undermines confidence that the 5.57–7.58% margins are robust.
- [Section 3.2] Two-stage training: the alternating optimization between textual and graph modalities introduces additional hyperparameters (alternation schedule, fusion weights) whose sensitivity is not analyzed. If these are tuned per chain, the universality claim is weakened.
minor comments (2)
- [Section 3.1] Notation for the multimodal fusion module is introduced without a clear equation or diagram showing how textual embeddings are aligned with graph node features.
- [Figures 4-6] Figure captions and axis labels in the results section could be expanded to indicate exact dataset sizes and number of runs per bar.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to strengthen validation, experimental details, and analysis of the training procedure.
read point-by-point responses
-
Referee: [Section 3] The central performance claims rest on LLM-generated transaction summaries serving as reliable, general evidence. No quantitative validation is provided (hallucination rates, inter-chain consistency metrics, human evaluation, or automated checks), nor is there an ablation that isolates the contribution of the text modality versus graph patterns alone. Without these, the reported KS and zero-shot gains cannot be attributed to the claimed universal mechanism rather than to chain-specific artifacts or LLM biases.
Authors: We agree that additional quantitative validation would strengthen attribution of the gains to the universal textual mechanism. In the revised manuscript, we will add a human evaluation of a sampled set of LLM-generated summaries, reporting hallucination rates and inter-chain consistency metrics. We will also include an ablation study comparing graph-only performance against the full multimodal model to isolate the text modality's contribution. revision: yes
-
Referee: [Experiments] The abstract and results report concrete KS/F1 improvements and zero-shot recall, yet the manuscript provides insufficient detail on baseline implementations, data splits, statistical significance testing (e.g., paired t-tests or multiple-run variance), and whether prompt or hyperparameter choices were tuned on the test sets. This undermines confidence that the 5.57–7.58% margins are robust.
Authors: We will expand the experimental section with full details on baseline implementations and their adaptations to our multi-chain setting. Data splits will be explicitly described, and we will report results with means and standard deviations over multiple runs along with paired t-test significance. We confirm and will clarify that all prompt and hyperparameter tuning occurred on held-out validation sets with no test-set access. revision: yes
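The promised paired t-test can be sketched in a few lines of stdlib Python: pair each run of the method with the baseline run on the same seed/split and test whether the mean difference is nonzero. The metric values below are invented for illustration:

```python
from statistics import mean, stdev
from math import sqrt

def paired_t(method_runs, baseline_runs):
    """t statistic for paired samples. With 5 runs (4 degrees of freedom),
    |t| > ~2.78 is significant at p < 0.05, two-sided."""
    diffs = [m - b for m, b in zip(method_runs, baseline_runs)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# Hypothetical KS values over 5 matched seeds (not the paper's numbers).
method   = [0.71, 0.73, 0.72, 0.74, 0.70]
baseline = [0.66, 0.67, 0.66, 0.68, 0.65]
t = paired_t(method, baseline)   # large |t| => difference unlikely by chance
```

Reporting t (or the associated p-value) alongside per-run means and standard deviations is what would make the 5.57–7.58% margins verifiable.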
-
Referee: [Section 3.2] Two-stage training: the alternating optimization between textual and graph modalities introduces additional hyperparameters (alternation schedule, fusion weights) whose sensitivity is not analyzed. If these are tuned per chain, the universality claim is weakened.
Authors: We will add a sensitivity analysis subsection evaluating performance stability across ranges of the alternation schedule and fusion weights. The hyperparameters follow a single validation-based selection procedure applied uniformly to all chains; they are not tuned individually per chain. The added analysis will demonstrate robustness and support the universality claim. revision: partial
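The promised sensitivity analysis amounts to a grid sweep over the alternation schedule and fusion weight, with the max-min spread of a validation metric as the robustness summary. A minimal sketch; `evaluate` is a stand-in for a real validation run and the toy metric is an assumption:

```python
from itertools import product

def sweep(evaluate, schedules, fusion_weights):
    """Evaluate every (schedule, weight) combination; return the grid and
    the max-min spread of the metric (small spread = robust choice)."""
    grid = {(s, w): evaluate(s, w)
            for s, w in product(schedules, fusion_weights)}
    scores = grid.values()
    return grid, max(scores) - min(scores)

# Toy metric that barely depends on the hyperparameters, i.e. a "robust"
# configuration space; a real evaluate() would train and validate a model.
toy = lambda s, w: 0.72 + 0.001 * s - 0.002 * w
grid, spread = sweep(toy, schedules=[1, 2, 4], fusion_weights=[0.3, 0.5, 0.7])
```

If the same single configuration chosen on validation data sits near the top of this grid for every chain, the universality claim is supported; per-chain winners would weaken it.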
Circularity Check
No circularity: empirical performance claims rest on held-out experiments, not self-referential derivations
full rationale
The paper introduces an LLM-guided method for generating transaction summaries followed by two-stage multimodal training, then reports concrete empirical metrics (KS gains of 5.57–7.58%, 94.58% zero-shot recall, 6.06% F1 on external data) evaluated on held-out blockchain and non-blockchain datasets. No equations, fitted parameters, or uniqueness theorems are presented whose outputs reduce by construction to the inputs; the central results are standard supervised-learning performance numbers whose validity depends on data splits and evaluation protocols rather than definitional equivalence or self-citation chains.
Axiom & Free-Parameter Ledger
free parameters (2)
- LLM prompt design and domain knowledge injection
- Two-stage training hyperparameters and alternation schedule
axioms (2)
- domain assumption LLM-generated transaction summaries are accurate, unbiased, and general across heterogeneous blockchains
- domain assumption Transaction graph patterns provide complementary signal to the textual summaries
Reference graph
Works this paper leans on
- [1] Scott Beamer, Krste Asanović, and David Patterson. 2013. Direction-optimizing breadth-first search. Scientific Programming 21, 3–4 (2013), 137–148.
- [2] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. 2014. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 (2014).
- [3] Zhihao Ding, Jieming Shi, Qing Li, and Jiannong Cao. 2024. Effective illicit account detection on large cryptocurrency multigraphs. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. 457–466.
- [4] Yingtong Dou, Zhiwei Liu, Li Sun, Yutong Deng, Hao Peng, and Philip S Yu. 2020. Enhancing graph neural network-based fraud detectors against camouflaged fraudsters. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management. 315–324.
- [5] Mingjiang Duan, Tongya Zheng, Yang Gao, Gang Wang, Zunlei Feng, and Xinyu Wang. 2024. DGA-GNN: Dynamic grouping aggregation GNN for fraud detection. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 11820–11828.
- [6] Elliptic. 2025. Chain-hopping Emerges as the Defining Money Laundering Method of 2025. https://www.elliptic.co/blog/chain-hopping-defining-money-laundering-method-of-2025
- [7] Sam Gilbert. 2022. Crypto, web3, and the Metaverse. Bennett Institute for Public Policy, Cambridge, Policy Brief (2022).
- [8] Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 855–864.
- [9] Will Hamilton, Zhitao Ying, and Jure Leskovec. 2017. Inductive representation learning on large graphs. In Advances in Neural Information Processing Systems, Vol. 30. Curran Associates, Inc.
- [10] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. 2022. LoRA: Low-rank adaptation of large language models. ICLR (2022).
- [11] Sihao Hu, Zhen Zhang, Bingqiao Luo, Shengliang Lu, Bingsheng He, and Ling Liu. 2023. BERT4ETH: A pre-trained transformer for Ethereum fraud detection. In Proceedings of the ACM Web Conference 2023. 2189–2197.
- [12] Xuanwen Huang, Kaiqiao Han, Yang Yang, Dezheng Bao, Quanjin Tao, Ziwei Chai, and Qi Zhu. 2024. Can GNN be good adapter for LLMs? In Proceedings of the ACM Web Conference 2024. 893–904.
- [13] Thomas N. Kipf and Max Welling. 2016. Semi-supervised classification with graph convolutional networks. CoRR abs/1609.02907 (2016). arXiv:1609.02907.
- [14]
- [15]
- [16] Liheng Ma, Chen Lin, Derek Lim, Adriana Romero-Soriano, Puneet K Dokania, Mark Coates, Philip Torr, and Ser-Nam Lim. 2023. Graph inductive biases in transformers without message passing. In International Conference on Machine Learning. PMLR, 23321–23337.
- [17] Shuyi Miao, Wangjie Qiu, Xiaofan Tu, Yunze Li, Yongxin Wen, and Zhiming Zheng. 2026. Tracing Your Account: A Gradient-Aware Dynamic Window Graph Framework for Ethereum under Privacy-Preserving Services. IEEE Transactions on Information Forensics and Security (2026).
- [18] Shuyi Miao, Wangjie Qiu, Hongwei Zheng, Qinnan Zhang, Xiaofan Tu, Xunan Liu, Yang Liu, Jin Dong, and Zhiming Zheng. 2025. Know Your Account: Double Graph Inference-based Account De-anonymization on Ethereum. In 2025 IEEE 41st International Conference on Data Engineering (ICDE).
- [19] Dang Nguyen, Tu Dinh Nguyen, Wei Luo, and Svetha Venkatesh. 2018. Trans2vec: Learning transaction embedding via items and frequent itemsets. In Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 361–372.
- [20] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. 2014. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 701–710.
- [21] Farimah Poursafaei, Reihaneh Rabbany, and Zeljko Zilic. 2021. SigTran: Signature vectors for detecting illicit activities in blockchain transaction networks. In Pacific-Asia Conference on Knowledge Discovery and Data Mining. Springer, 27–39.
- [22] Nils Reimers and Iryna Gurevych. 2019. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. arXiv preprint arXiv:1908.10084 (2019).
- [23] Jie Shen, Jiajun Zhou, Yunyi Xie, Shanqing Yu, and Qi Xuan. 2021. Identity inference on blockchain using graph neural network. In International Conference on Blockchain and Trustworthy Systems. Springer, 3–17.
- [24] Jianheng Tang, Jiajin Li, Ziqi Gao, and Jia Li. 2022. Rethinking graph neural networks for anomaly detection. In International Conference on Machine Learning. PMLR, 21076–21089.
- [25] Gemma Team, Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, et al. 2024. Gemma: Open models based on Gemini research and technology. arXiv preprint arXiv:2403.08295 (2024).
- [26] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. Advances in Neural Information Processing Systems 30 (2017).
- [27] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. 2017. Graph attention networks. arXiv preprint arXiv:1710.10903 (2017).
- [28] Jinhuan Wang, Pengtao Chen, Xinyao Xu, Jiajing Wu, Meng Shen, Qi Xuan, and Xiaoniu Yang. 2025. TSGN: Transaction subgraph networks assisting phishing detection in Ethereum. IEEE Transactions on Dependable and Secure Computing (2025).
- [29] Ronald J Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8, 3 (1992), 229–256.
- [30]
- [31] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2018. How powerful are graph neural networks? CoRR abs/1810.00826 (2018). arXiv:1810.00826.
- [32] Chengdong Yang, Hongrui Liu, Daixin Wang, Zhiqiang Zhang, Cheng Yang, and Chuan Shi. 2025. FLAG: Fraud detection with LLM-enhanced graph neural network. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2. 5150–5160.
- [33] Haibin Zheng, Minying Ma, Haonan Ma, Jinyin Chen, Haiyang Xiong, and Zhijun Yang. 2023. TEGDetector: A phishing detector that knows evolving transaction behaviors. IEEE Transactions on Computational Social Systems 11, 3 (2023), 3988–4000.
- [34] Jiajun Zhou, Chenkai Hu, Jianlei Chi, Jiajing Wu, Meng Shen, and Qi Xuan. 2022. Behavior-aware account de-anonymization on Ethereum interaction graph. IEEE Transactions on Information Forensics and Security 17 (2022), 3433–3448.
- [35] Wei Zhuo, Zemin Liu, Bryan Hooi, Bingsheng He, Guang Tan, Rizal Fathony, and Jia Chen. 2024. Partitioning message passing for graph fraud detection. arXiv preprint arXiv:2412.00020 (2024).
Figure 7: CoT prompt template of the forensics analyst agent.
Figure 8: CoT prompt template of the discriminative summary analyst agent and residual summary analyst agent.
discussion (0)