pith.

arxiv: 2602.13480 · v2 · pith:AW744XBY · submitted 2026-02-13 · 💻 cs.CR · cs.LG

MELT: A Behavioral Trace Dataset for High-Risk Memecoin Launch Detection

Pith reviewed 2026-05-22 10:36 UTC · model grok-4.3

classification 💻 cs.CR cs.LG
keywords memecoin · Solana · behavioral traces · launch detection · insider coordination · dataset · machine learning · risk mitigation

The pith

A dataset of parsed behavioral traces from Solana memecoin launches enables detection of high-risk cases by exposing coordinated insider activity that standard methods miss.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MELT as the first structured dataset for analyzing high-risk memecoin launches, parsing over 200 million transactions from 41,000 launches into typed records that separate swaps, wash trades, transfers, and mints. It adds bundle-trace data that connects accounts controlled by the same entity and shows these coordinated groups hold 36.5 percent of token supply on average, a tactic that conceals true concentration from buyers. The dataset supplies 122 behavioral features plus risk-level annotations that support supervised learning at large scale. When model predictions from these features guide a basic selection strategy, they produce a significant drop in investment losses. This matters because launchpads now dominate memecoin issuance and existing rug-pull detectors fail to address the underlying coordination patterns.
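To make "typed behavioral records" concrete, here is a minimal Python sketch of turning decoded token events into typed records. The event schema (`RawEvent`), the helper `type_events`, and the wash-trade rule (the same account trading in both directions within a few slots) are hypothetical illustrations; the page does not describe MELT's actual parsing rules.

```python
from dataclasses import dataclass
from enum import Enum


class RecordType(Enum):
    SWAP = "swap"
    WASH_TRADE = "wash_trade"
    TRANSFER = "transfer"
    MINT = "mint"


@dataclass(frozen=True)
class RawEvent:
    """A decoded token event (hypothetical, simplified schema)."""
    slot: int      # Solana slot in which the event landed
    kind: str      # "buy", "sell", "transfer", or "mint"
    account: str   # account whose token balance changed
    amount: float  # number of tokens moved


@dataclass(frozen=True)
class TypedRecord:
    event: RawEvent
    rtype: RecordType


def type_events(events: list[RawEvent], wash_window: int = 10) -> list[TypedRecord]:
    """Label each event with a record type.

    Hypothetical wash-trade rule: a buy or sell is a WASH_TRADE if the same
    account also trades in the opposite direction within `wash_window` slots.
    """
    # Index all buys and sells by the account that made them.
    trades: dict[str, list[RawEvent]] = {}
    for e in events:
        if e.kind in ("buy", "sell"):
            trades.setdefault(e.account, []).append(e)

    records = []
    for e in events:
        if e.kind == "mint":
            rtype = RecordType.MINT
        elif e.kind == "transfer":
            rtype = RecordType.TRANSFER
        elif e.kind in ("buy", "sell"):
            opposite = "sell" if e.kind == "buy" else "buy"
            round_trip = any(
                o.kind == opposite and abs(o.slot - e.slot) <= wash_window
                for o in trades[e.account]
            )
            rtype = RecordType.WASH_TRADE if round_trip else RecordType.SWAP
        else:
            raise ValueError(f"unknown event kind: {e.kind!r}")
        records.append(TypedRecord(e, rtype))
    return records
```

For example, an account that buys in slot 1 and sells in slot 5 has both trades tagged `WASH_TRADE` under the default window, while a lone buy stays a plain `SWAP`. A real parser would also need to handle multi-hop swaps and program-derived accounts, which this sketch ignores.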

Core claim

MELT supplies the first behavioral trace dataset for high-risk memecoin launch detection on Solana. It converts raw blockchain transactions into typed records and into bundle traces that link accounts under common control, revealing that coordinated accounts hold 36.5 percent of supply on average. It also provides 122 features with risk annotations that let machine learning models flag high-risk launches and cut losses when their predictions drive a selection strategy.

What carries the argument

Bundle-trace data that links accounts controlled by the same entity and reveals concealed ownership concentration across the parsed transaction records.
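The bundle-trace idea can be sketched as merging linked accounts with a union-find structure and then measuring how much supply sits in multi-account bundles. This is a minimal sketch under stated assumptions: the `links` pairs (for example, accounts funded from the same wallet) are hypothetical inputs, and `coordinated_share` and `average_coordinated_share` are illustrative names, not MELT's code or its clustering heuristics.

```python
from collections import Counter
from statistics import mean


class UnionFind:
    """Disjoint-set structure used to merge accounts into bundles."""

    def __init__(self) -> None:
        self.parent: dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def coordinated_share(holdings: dict[str, float],
                      links: list[tuple[str, str]]) -> float:
    """Fraction of supply held by accounts in bundles of two or more accounts.

    `links` are pairs assumed to share a controller; how such pairs are
    detected (e.g. a common funding wallet) is an assumption of this sketch.
    """
    uf = UnionFind()
    for a, b in links:
        uf.union(a, b)
    # Bundle sizes count every linked account, even ones holding nothing now.
    accounts = set(holdings) | {x for pair in links for x in pair}
    bundle_size = Counter(uf.find(acct) for acct in accounts)
    total = sum(holdings.values())
    coordinated = sum(amount for acct, amount in holdings.items()
                      if bundle_size[uf.find(acct)] > 1)
    return coordinated / total if total else 0.0


def average_coordinated_share(
        launches: list[tuple[dict[str, float], list[tuple[str, str]]]]) -> float:
    """Mean of per-launch shares, the kind of average behind a '36.5%' figure."""
    return mean(coordinated_share(h, l) for h, l in launches)
```

If a developer and two sniper wallets are linked and together hold 350 of 1,000 tokens, `coordinated_share` returns 0.35 even though each wallet individually looks like a modest holder, which is exactly the concealment effect described above.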

If this is right

  • Supervised models trained on the 122 features can operate at population scale to classify new launches by risk level.
  • Incorporating model outputs into a simple buy-selection rule produces a concrete reduction in realized investment losses.
  • Bundle traces expose that coordinated accounts systematically mask supply concentration from ordinary buyers.
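The page does not specify the selection strategy, so the following is only a minimal sketch of what "a simple buy-selection rule" could look like, assuming the model outputs a risk probability per launch and the rule skips launches scored at or above a threshold. `Launch`, `select`, `average_loss`, and the 0.5 threshold are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Launch:
    token: str
    risk_score: float       # model's predicted probability that the launch is high-risk
    realized_return: float  # outcome of a unit stake, e.g. -0.9 means 90% was lost


def average_loss(launches: list[Launch]) -> float:
    """Mean loss per unit staked (a net gain shows up as a negative loss)."""
    if not launches:
        return 0.0
    return -mean(launch.realized_return for launch in launches)


def select(launches: list[Launch], threshold: float = 0.5) -> list[Launch]:
    """Hypothetical rule: buy only launches the model scores below `threshold`."""
    return [launch for launch in launches if launch.risk_score < threshold]
```

Comparing `average_loss(all_launches)` with `average_loss(select(all_launches))` gives the kind of before/after loss contrast the paper reports; a fair comparison would also report how many launches the filter discards and use held-out launches only.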

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same trace-parsing approach could be adapted to flag coordination patterns in other Solana-based token activities.
  • Real-time versions of these features might allow launchpad platforms to surface risk signals to users before trading opens.
  • The dataset structure supports longitudinal studies of how launch mechanics affect the prevalence of bundled accounts.

Load-bearing premise

The parsed behavioral records and risk annotations accurately capture real insider coordination and launch risk without substantial errors from heterogeneous blockchain data or labeling mistakes.

What would settle it

Run the trained models on a fresh set of memecoin launches after the dataset cutoff and check whether launches flagged as high-risk produce measurably higher loss rates or rug events than those flagged as low-risk.
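The out-of-sample check described above amounts to a temporal holdout: keep only launches after the cutoff, then compare outcome rates between the two predicted groups. A minimal sketch, assuming each launch has a time index (here a hypothetical `launch_slot`), a launch-time prediction, and a later-observed rug label:

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    launch_slot: int         # when the launch happened
    flagged_high_risk: bool  # model prediction made at launch time
    rugged: bool             # observed afterwards (hypothetical outcome label)


def rug_rates_after_cutoff(outcomes: list[Outcome],
                           cutoff_slot: int) -> tuple[float, float]:
    """Rug rate of flagged-high vs flagged-low launches that start after the cutoff."""
    fresh = [o for o in outcomes if o.launch_slot > cutoff_slot]

    def rate(group: list[Outcome]) -> float:
        # NaN signals an empty group rather than a misleading 0% rate.
        return sum(o.rugged for o in group) / len(group) if group else math.nan

    high = [o for o in fresh if o.flagged_high_risk]
    low = [o for o in fresh if not o.flagged_high_risk]
    return rate(high), rate(low)
```

A clearly higher first rate than second would support the claim; with realistic sample sizes, a two-proportion test or bootstrap interval would be needed before calling the gap significant.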

Figures

Figures reproduced from arXiv: 2602.13480 by Ling Liu, Selim Furkan Tekin, Sihao Hu, Yichang Xu.

Figure 1: Early buyers accumulate substantial token holdings
Figure 2: After the memecoin is created (Stage 1), the de…
Figure 3: Examples of a low-risk memecoin and two manipulated memecoins. For memecoin (b) and (c), manipulators dynami…
Figure 4: Meanwhile, the average buy volume of high-risk meme…
Figure 5: Feature importance score calculated by the RF model. Blue marks the contextual information features, orange marks…
Original abstract

Launchpads have become the dominant mechanism for issuing memecoins, exposing investors to a new class of high-risk launches that existing rug-pull detection methods cannot capture. We argue that detecting these threats requires structured behavioral traces that underlie raw heterogeneous blockchain data, i.e., how insiders accumulate, coordinate, and unwind positions. To enable such analysis, we introduce MELT (MEmecoin Launch Trace), the first behavioral trace dataset for analyzing and detecting high-risk memecoin launches on Solana. MELT covers 41k+ memecoin launches with 200M+ transactions parsed into typed behavioral records that distinguish swaps, wash trades, transfers, and mints. Beyond per-account behaviors, MELT contributes bundle-trace data that links accounts controlled by the same entity, revealing that, on average, 36.5% of token supply is held by coordinated accounts, a concealment strategy that disguises the true ownership concentration from unsuspecting buyers. On top of these traces, MELT provides 122 behavioral features and risk-level annotations, enabling supervised learning at a population scale. We benchmark representative ML models on the high-risk launch detection task. Integrating their predictions into a simple memecoin selection strategy reduces investment loss significantly, demonstrating that behavioral traces can be translated into risk mitigation. Our dataset and code are available at https://github.com/git-disl/MELT.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces MELT, the first behavioral trace dataset for high-risk memecoin launch detection on Solana, covering 41k+ launches with 200M+ transactions parsed into typed records (swaps, wash trades, transfers, mints). It supplies bundle-trace data linking coordinated accounts (revealing on average 36.5% of supply held by such entities), 122 behavioral features, and risk-level annotations to support supervised learning. Representative ML models are benchmarked on high-risk detection, and their predictions are shown to reduce investment loss when integrated into a simple selection strategy.

Significance. If the annotations prove reliable, the work supplies a large-scale, publicly released resource (with code at the linked GitHub) that enables population-level study of insider coordination and behavioral risk signals in memecoins, a domain where existing rug-pull detectors fall short. The coordinated-account statistic and the empirical loss-reduction result are concrete strengths that could inform both research and practical mitigation strategies.

major comments (2)
  1. [§4 (Annotation and Labeling)] The risk-level annotations central to the supervised benchmarks and loss-reduction demonstration are described without reporting inter-annotator agreement, external validation against known rug-pull cases, or confirmation that labels exclude post-launch information (e.g., subsequent large sells or rug outcomes). This leaves open the possibility that the reported performance gains rest on circular or noisy labels rather than launch-time behavioral signals.
  2. [§5 (ML Benchmarks and Strategy Evaluation)] The claim that integrating model predictions reduces investment loss significantly lacks an ablation that removes potential post-hoc label information or compares against baselines using only pre-launch features. Without such controls, it is unclear whether the loss reduction follows from the 122 behavioral features or from artifacts in the heterogeneous Solana transaction parsing.
minor comments (2)
  1. [Abstract] The abstract states that the strategy 'reduces investment loss significantly' but supplies no quantitative figures, confidence intervals, or baseline comparisons; adding these would strengthen the claim.
  2. [§3 (Behavioral Traces)] Clarify the exact heuristics used to construct bundle traces and link accounts controlled by the same entity; any assumptions about wallet clustering should be stated explicitly.
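The leakage concern in the referee's comments can be made testable by computing every feature from a time-bounded view of the transactions. The sketch below assumes a `Tx` schema with timestamps in seconds and uses the 24-hour window mentioned in the simulated rebuttal as its default; `window_features` and its three features are hypothetical stand-ins for MELT's 122 features. Setting `window_s=0` gives the launch-time-only baseline the referee asks for.

```python
from dataclasses import dataclass

DAY_S = 24 * 3600  # default observation window, taken from the rebuttal's 24 hours


@dataclass(frozen=True)
class Tx:
    timestamp: float  # seconds since epoch
    account: str
    kind: str         # "buy", "sell", ...
    amount: float


def visible_transactions(txs: list[Tx], launch_ts: float,
                         window_s: float = DAY_S) -> list[Tx]:
    """Keep only transactions observable by `launch_ts + window_s`."""
    return [t for t in txs if t.timestamp <= launch_ts + window_s]


def window_features(txs: list[Tx], launch_ts: float,
                    window_s: float = DAY_S) -> dict[str, float]:
    """Hypothetical features computed strictly from the observation window."""
    visible = visible_transactions(txs, launch_ts, window_s)
    buys = [t for t in visible if t.kind == "buy"]
    sells = [t for t in visible if t.kind == "sell"]
    return {
        "n_buyers": len({t.account for t in buys}),
        "buy_volume": sum(t.amount for t in buys),
        "sell_volume": sum(t.amount for t in sells),
    }
```

A simple leakage test follows from this design: appending a large post-window sell (such as a rug) must leave `window_features` unchanged. If any feature changes, it is reading information from after the cutoff.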

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their thorough review and valuable suggestions that help clarify the strengths and limitations of our work. We address each major comment below and have revised the manuscript to improve methodological transparency where possible.

Point-by-point responses
  1. Referee: [§4 (Annotation and Labeling)] The risk-level annotations central to the supervised benchmarks and loss-reduction demonstration are described without reporting inter-annotator agreement, external validation against known rug-pull cases, or confirmation that labels exclude post-launch information (e.g., subsequent large sells or rug outcomes). This leaves open the possibility that the reported performance gains rest on circular or noisy labels rather than launch-time behavioral signals.

    Authors: We agree that greater detail on the annotation process is warranted. The risk labels were assigned by two domain experts who examined only behavioral traces observable at launch and within the first 24 hours, using criteria focused on coordination signals, wash-trade patterns, and initial supply distribution. We have revised §4 to expand the annotation guidelines, report inter-annotator agreement computed on a sampled subset, and explicitly confirm that no information from events after the 24-hour window (including later sells or rug outcomes) was used. A direct external validation against a complete registry of rug-pulls is not feasible given the absence of an authoritative ground-truth database for this emerging domain; however, we have added qualitative alignment checks against several publicly reported high-risk launches. revision: partial

  2. Referee: [§5 (ML Benchmarks and Strategy Evaluation)] The claim that integrating model predictions reduces investment loss significantly lacks an ablation that removes potential post-hoc label information or compares against baselines using only pre-launch features. Without such controls, it is unclear whether the loss reduction follows from the 122 behavioral features or from artifacts in the heterogeneous Solana transaction parsing.

    Authors: We appreciate the call for stronger controls. In the revised §5 we have added an ablation that retrains all models using only features derivable from pre-launch and launch-time transactions, excluding any post-launch data. The loss-reduction result remains statistically significant under this restriction. We have also introduced a comparison against a baseline that uses only conventional pre-launch metrics (initial liquidity, holder count, and developer activity). Our full feature set outperforms this baseline, indicating that the reported gains are attributable to the behavioral traces rather than parsing artifacts. The transaction parsing pipeline is further documented to address potential heterogeneity concerns. revision: yes
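The rebuttal promises inter-annotator agreement for two annotators on a sampled subset but does not name the statistic. Cohen's kappa is a common choice for that setting; the sketch below assumes it and computes agreement corrected for chance, with hypothetical "high"/"low" risk labels.

```python
from collections import Counter


def cohens_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labelled at random with their own rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both annotators used one identical label for every item
    return (observed - expected) / (1 - expected)
```

Agreeing on 5 of 6 items looks like 83% raw agreement, but if chance alone would produce 50% agreement, kappa is about 0.67. Reporting the chance-corrected value is what makes the label-quality claim checkable.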

Circularity Check

0 steps flagged

No significant circularity: dataset release with empirical benchmarks

Full rationale

The paper introduces a parsed behavioral trace dataset (MELT) covering 41k+ launches and 200M+ transactions, supplies 122 features plus risk-level annotations, and reports standard ML benchmarks plus a downstream selection strategy that reduces loss. No derivation chain, equations, or first-principles results are presented that reduce by construction to fitted inputs, self-citations, or renamed patterns. The central claims rest on external empirical evaluation of the released resource rather than any self-referential loop; the work is therefore self-contained against its stated benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on the premise that raw heterogeneous blockchain transactions can be reliably parsed into typed behavioral records and that coordinated-account bundles can be inferred from transaction patterns without independent ground-truth verification.

axioms (2)
  • domain assumption Raw blockchain transaction logs contain sufficient structure to be parsed into distinct behavioral categories (swaps, wash trades, transfers, mints) without significant information loss.
    Invoked when the paper states that 200M+ transactions are parsed into typed behavioral records.
  • domain assumption Risk-level annotations provided with the dataset constitute reliable ground truth for supervised learning.
    Required for the ML benchmarking and loss-reduction claim to be meaningful.

pith-pipeline@v0.9.0 · 5784 in / 1349 out tokens · 38770 ms · 2026-05-22T10:36:34.587649+00:00 · methodology

