MELT: A Behavioral Trace Dataset for High-Risk Memecoin Launch Detection
Pith reviewed 2026-05-22 10:36 UTC · model grok-4.3
The pith
A dataset of parsed behavioral traces from Solana memecoin launches enables detection of high-risk cases by exposing coordinated insider activity that standard methods miss.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MELT is presented as the first behavioral trace dataset for high-risk memecoin launch detection on Solana. It converts raw blockchain transactions into typed records and bundle traces that link accounts under common control, which reveal that coordinated accounts hold 36.5 percent of supply on average. It also provides 122 features with risk annotations, so that machine learning models can flag threats and reduce losses when their predictions drive a selection strategy.
What carries the argument
Bundle-trace data that links accounts controlled by the same entity and reveals concealed ownership concentration across the parsed transaction records.
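As a rough illustration of how linked accounts expose concealed concentration, the sketch below groups accounts with a union-find structure and sums the supply held by groups of more than one account. The `balances` and `links` inputs and both function names are hypothetical; this page does not state which heuristics MELT uses to link accounts.

```python
from collections import defaultdict


def find(parent, x):
    # Walk to the root, compressing the path along the way.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def coordinated_share(balances, links):
    """Fraction of held supply owned by accounts in multi-account groups.

    balances: dict of account -> token amount (hypothetical input).
    links: iterable of (account_a, account_b) pairs assumed to share a controller.
    """
    parent = {account: account for account in balances}
    for a, b in links:
        # Accounts that appear only in links are treated as holding zero tokens.
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        root_a, root_b = find(parent, a), find(parent, b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(list)
    for account in parent:
        groups[find(parent, account)].append(account)

    total = sum(balances.values())
    if total == 0:
        return 0.0
    coordinated = sum(
        balances.get(account, 0.0)
        for members in groups.values()
        if len(members) > 1
        for account in members
    )
    return coordinated / total
```

A per-launch value computed this way, averaged over launches, is the kind of quantity the 36.5 percent figure describes, though the paper's exact definition may differ.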
If this is right
- Supervised models trained on the 122 features can operate at population scale to classify new launches by risk level.
- Incorporating model outputs into a simple buy-selection rule produces a concrete reduction in realized investment losses.
- Bundle traces expose that coordinated accounts systematically mask supply concentration from ordinary buyers.
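The buy-selection rule mentioned in the list above can be pictured as a threshold filter on model risk scores. This is a minimal sketch only: the threshold, the function names, and the loss measure (average of negative returns, with gains counted as zero) are assumptions for illustration, not the paper's strategy.

```python
def select_launches(risk_scores, threshold=0.5):
    """Keep launches whose predicted high-risk probability is below the threshold.

    risk_scores: dict of token -> model score in [0, 1] (hypothetical input).
    """
    return [token for token, score in risk_scores.items() if score < threshold]


def mean_loss(returns, tokens):
    """Average loss over tokens: negative returns count as losses, gains as zero."""
    if not tokens:
        return 0.0
    return sum(max(0.0, -returns[token]) for token in tokens) / len(tokens)
```

Comparing `mean_loss` over all launches with `mean_loss` over `select_launches(...)` gives the kind of before-and-after loss comparison the bullet refers to.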
Where Pith is reading between the lines
- The same trace-parsing approach could be adapted to flag coordination patterns in other Solana-based token activities.
- Real-time versions of these features might allow launchpad platforms to surface risk signals to users before trading opens.
- The dataset structure supports longitudinal studies of how launch mechanics affect the prevalence of bundled accounts.
Load-bearing premise
The parsed behavioral records and risk annotations accurately capture real insider coordination and launch risk without substantial errors from heterogeneous blockchain data or labeling mistakes.
What would settle it
Run the trained models on a fresh set of memecoin launches after the dataset cutoff and check whether launches flagged as high-risk produce measurably higher loss rates or rug events than those flagged as low-risk.
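The check described above amounts to comparing outcome rates between the two flagged groups on launches after the cutoff. A minimal sketch, assuming each record carries a launch date, a model flag, and an observed rug outcome; the field names and the function are hypothetical.

```python
from datetime import date


def rug_rates_after_cutoff(records, cutoff):
    """Return (high_risk_rate, low_risk_rate) over launches strictly after cutoff.

    records: iterable of dicts with keys 'launched' (date), 'flag_high' (bool),
    and 'rugged' (bool). These field names are illustrative assumptions.
    A rate is None when its group is empty.
    """
    counts = {True: [0, 0], False: [0, 0]}  # flag -> [rugged count, total count]
    for record in records:
        if record["launched"] <= cutoff:
            continue  # skip launches on or before the dataset cutoff
        bucket = counts[record["flag_high"]]
        bucket[0] += int(record["rugged"])
        bucket[1] += 1

    def rate(bucket):
        return bucket[0] / bucket[1] if bucket[1] else None

    return rate(counts[True]), rate(counts[False])
```

A clearly higher rate in the high-risk group would support the premise; a statistical test on the difference would still be needed before calling it settled.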
Original abstract
Launchpads have become the dominant mechanism for issuing memecoins, exposing investors to a new class of high-risk launches that existing rug-pull detection methods cannot capture. We argue that detecting these threats requires structured behavioral traces that underlie raw heterogeneous blockchain data, i.e., how insiders accumulate, coordinate, and unwind positions. To enable such analysis, we introduce MELT (MEmecoin Launch Trace), the first behavioral trace dataset for analyzing and detecting high-risk memecoin launches on Solana. MELT covers 41k+ memecoin launches with 200M+ transactions parsed into typed behavioral records that distinguish swaps, wash trades, transfers, and mints. Beyond per-account behaviors, MELT contributes bundle-trace data that links accounts controlled by the same entity, revealing that, on average, 36.5% of token supply is held by coordinated accounts, a concealment strategy that disguises the true ownership concentration from unsuspecting buyers. On top of these traces, MELT provides 122 behavioral features and risk-level annotations, enabling supervised learning at a population scale. We benchmark representative ML models on the high-risk launch detection task. Integrating their predictions into a simple memecoin selection strategy reduces investment loss significantly, demonstrating that behavioral traces can be translated into risk mitigation. Our dataset and code are available at https://github.com/git-disl/MELT.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces MELT, the first behavioral trace dataset for high-risk memecoin launch detection on Solana, covering 41k+ launches with 200M+ transactions parsed into typed records (swaps, wash trades, transfers, mints). It supplies bundle-trace data linking coordinated accounts (revealing on average 36.5% of supply held by such entities), 122 behavioral features, and risk-level annotations to support supervised learning. Representative ML models are benchmarked on high-risk detection, and their predictions are shown to reduce investment loss when integrated into a simple selection strategy.
Significance. If the annotations prove reliable, the work supplies a large-scale, publicly released resource (with code at the linked GitHub) that enables population-level study of insider coordination and behavioral risk signals in memecoins, a domain where existing rug-pull detectors fall short. The coordinated-account statistic and the empirical loss-reduction result are concrete strengths that could inform both research and practical mitigation strategies.
major comments (2)
- [§4 (Annotation and Labeling)] The risk-level annotations central to the supervised benchmarks and loss-reduction demonstration are described without reporting inter-annotator agreement, external validation against known rug-pull cases, or confirmation that labels exclude post-launch information (e.g., subsequent large sells or rug outcomes). This leaves open the possibility that the reported performance gains rest on circular or noisy labels rather than launch-time behavioral signals.
- [§5 (ML Benchmarks and Strategy Evaluation)] The claim that integrating model predictions reduces investment loss significantly lacks an ablation that removes potential post-hoc label information or compares against baselines using only pre-launch features. Without such controls, it is unclear whether the loss reduction follows from the 122 behavioral features or from artifacts in the heterogeneous Solana transaction parsing.
minor comments (2)
- [Abstract] The abstract states that the strategy 'reduces investment loss significantly' but supplies no quantitative figures, confidence intervals, or baseline comparisons; adding these would strengthen the claim.
- [§3 (Behavioral Traces)] Clarify the exact heuristics used to construct bundle traces and link accounts controlled by the same entity; any assumptions about wallet clustering should be stated explicitly.
Simulated Author's Rebuttal
We are grateful to the referee for their thorough review and valuable suggestions that help clarify the strengths and limitations of our work. We address each major comment below and have revised the manuscript to improve methodological transparency where possible.
Point-by-point responses
- Referee: [§4 (Annotation and Labeling)] The risk-level annotations central to the supervised benchmarks and loss-reduction demonstration are described without reporting inter-annotator agreement, external validation against known rug-pull cases, or confirmation that labels exclude post-launch information (e.g., subsequent large sells or rug outcomes). This leaves open the possibility that the reported performance gains rest on circular or noisy labels rather than launch-time behavioral signals.
  Authors: We agree that greater detail on the annotation process is warranted. The risk labels were assigned by two domain experts who examined only behavioral traces observable at launch and within the first 24 hours, using criteria focused on coordination signals, wash-trade patterns, and initial supply distribution. We have revised §4 to expand the annotation guidelines, report inter-annotator agreement computed on a sampled subset, and explicitly confirm that no information from events after the 24-hour window (including later sells or rug outcomes) was used. A direct external validation against a complete registry of rug-pulls is not feasible given the absence of an authoritative ground-truth database for this emerging domain; however, we have added qualitative alignment checks against several publicly reported high-risk launches.
  Revision: partial
- Referee: [§5 (ML Benchmarks and Strategy Evaluation)] The claim that integrating model predictions reduces investment loss significantly lacks an ablation that removes potential post-hoc label information or compares against baselines using only pre-launch features. Without such controls, it is unclear whether the loss reduction follows from the 122 behavioral features or from artifacts in the heterogeneous Solana transaction parsing.
  Authors: We appreciate the call for stronger controls. In the revised §5 we have added an ablation that retrains all models using only features derivable from pre-launch and launch-time transactions, excluding any post-launch data. The loss-reduction result remains statistically significant under this restriction. We have also introduced a comparison against a baseline that uses only conventional pre-launch metrics (initial liquidity, holder count, and developer activity). Our full feature set outperforms this baseline, indicating that the reported gains are attributable to the behavioral traces rather than parsing artifacts. The transaction parsing pipeline is further documented to address potential heterogeneity concerns.
  Revision: yes
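Agreement between two annotators, as raised in the first response, is commonly summarized with Cohen's kappa. The sketch below computes it from two label lists; the function name and label values are invented for illustration, and the rebuttal does not say which agreement statistic the authors report.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: share of items where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)
```

Kappa corrects raw agreement for chance, which matters when one risk level dominates the sample.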
Circularity Check
No significant circularity: dataset release with empirical benchmarks
The paper introduces a parsed behavioral trace dataset (MELT) covering 41k+ launches and 200M+ transactions, supplies 122 features plus risk-level annotations, and reports standard ML benchmarks plus a downstream selection strategy that reduces loss. No derivation chain, equations, or first-principles results are presented that reduce by construction to fitted inputs, self-citations, or renamed patterns. The central claims rest on external empirical evaluation of the released resource rather than any self-referential loop; the work is therefore self-contained against its stated benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Raw blockchain transaction logs contain sufficient structure to be parsed into distinct behavioral categories (swaps, wash trades, transfers, mints) without significant information loss.
- Domain assumption: Risk-level annotations provided with the dataset constitute reliable ground truth for supervised learning.