PumpSense: Real-Time Detection and Target Extraction of Crypto Pump-and-Dumps on Telegram
Pith reviewed 2026-05-12 03:45 UTC · model grok-4.3
The pith
Analyzing individual Telegram messages detects cryptocurrency pump-and-dump schemes in real time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors assemble a corpus of more than 280,000 Telegram posts drawn from 39 pump-organizing groups, with every post manually reviewed to mark the 2,246 pump announcements along with the cryptocurrency and exchange each targets. They frame two tasks: classifying whether a given message window contains a pump announcement and extracting the manipulated coin and trading venue from it. Machine-learning classifiers applied to the text achieve real-time detection on commodity hardware, while large language models prove more reliable than rule-based string matching at resolving which coin and exchange are intended despite ticker overlaps and shorthand. The approach operates on the coordination,
What carries the argument
A manually labeled corpus of Telegram posts paired with classifiers that process single-message windows for pump detection and large language models for coin and exchange extraction.
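To make the notion of a message window concrete, here is a minimal Python sketch of how a stream of posts could be turned into one window per incoming message. The function name `message_windows` and the `size` parameter are hypothetical; the paper's exact window definition is not given here, so this is only an assumption about its general shape.

```python
from collections import deque
from typing import Iterable, Iterator


def message_windows(messages: Iterable[str], size: int = 1) -> Iterator[str]:
    """Yield one text window per incoming message.

    Each window joins the newest message with up to `size - 1` predecessors,
    so a classifier can score the stream as soon as a message arrives.
    `size` is a hypothetical parameter, not a value taken from the paper.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    recent = deque(maxlen=size)  # keeps only the last `size` messages
    for text in messages:
        recent.append(text)
        yield "\n".join(recent)


# Made-up example stream, for illustration only.
stream = ["Pump in 5 minutes", "Exchange: Binance", "Coin is $XYZ"]
windows = list(message_windows(stream, size=2))
```

With `size=1` each window is a single message; larger values trade a little context for the same per-message trigger.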
If this is right
- Detection becomes possible at the moment the coordination message is sent rather than after trading volume spikes.
- Target extraction succeeds even when coin tickers are ambiguous or abbreviated in natural text.
- Responses to schemes can shift from reacting to market moves to intervening on the planning messages themselves.
- Future systems can be compared against a shared benchmark for both announcement spotting and attribute extraction.
Where Pith is reading between the lines
- Platform moderators could run lightweight versions of the detectors to flag or remove suspicious posts in real time.
- The labeled corpus could be extended to test detection across languages or on other messaging apps used for coordination.
- Pairing the message-based signals with live order-book data might allow earlier confirmation of whether a scheme is succeeding.
- Regulators might explore similar text-analysis pipelines to monitor other forms of social-media-driven market manipulation.
Load-bearing premise
The manual labels identifying pump announcements are accurate and representative, and the trained models will continue to work on new groups and messages without major drops in performance.
What would settle it
A pump announcement posted in a previously unseen Telegram group that the detection model labels as ordinary chat or that the extraction model assigns to the wrong coin or exchange.
Original abstract
Cryptocurrency pump-and-dump schemes coordinated via Telegram threaten market integrity. However, existing research addressing this specific threat has not yet produced solutions that combine reliable results with fast response. This is in part due to the absence of publicly available, message-level labeled data, as well as design choices. In this paper, we address both issues. In particular, we introduce a corpus of over 280,000 Telegram posts from 39 pump-organizing groups, all manually reviewed to identify 2,246 pump announcements and their targeted cryptocurrency and exchange. Leveraging this dataset, we define two tasks: real-time pump-announcement detection and target cryptocurrency/exchange extraction. For detection, we compare two machine-learning models: a lightweight tree-based LightGBM classifier (F1=0.79, latency=9.4 µs/sample) and a transformer-based BGE-M3 (F1=0.83, latency=50 ms/sample). With our proposed approach, we show that message analysis can achieve near-instant pump detection at the level of individual Telegram message windows. Unlike prior work that relies purely on market data and typically detects pumps tens of seconds after abnormal trading activity is observed, our method operates directly on the coordination messages themselves and can be evaluated in microseconds per window on commodity hardware. To our knowledge, we also establish the first benchmark for manipulated coin and exchange extraction. We demonstrate that traditional rule-based extraction methods, widely relied upon in prior literature, are ineffective due to ticker ambiguity. In contrast, LLMs achieve the highest accuracy with a score of 0.91.
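The ticker-ambiguity problem mentioned in the abstract can be illustrated with a small Python sketch of naive rule-based matching. The ticker set, the regular expression, and the example message are all made up for illustration; this is not the rule-based baseline evaluated in the paper.

```python
import re
from typing import List

# Illustrative ticker list (an assumption); real exchanges list far more symbols.
KNOWN_TICKERS = {"ONE", "HOT", "GO", "BTC", "XYZ"}


def rule_based_candidates(message: str) -> List[str]:
    """Return every all-caps token (2 to 5 letters) that matches a known ticker."""
    tokens = re.findall(r"\b[A-Z]{2,5}\b", message)
    return [t for t in tokens if t in KNOWN_TICKERS]


# Made-up announcement: only XYZ is the intended target.
msg = "ONE minute left! GO GO GO, the coin is XYZ paired with BTC"
candidates = rule_based_candidates(msg)
```

Here string matching returns four distinct symbols for a message with one intended target, because ordinary words ("ONE", "GO") and the quote currency ("BTC") collide with ticker symbols. Resolving which one is meant requires context, which is where the paper reports LLMs doing better.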
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a corpus of over 280,000 Telegram posts from 39 pump-organizing groups, manually reviewed to yield 2,246 labeled pump announcements with targeted coins and exchanges. It defines two tasks (real-time pump-announcement detection and target extraction) and evaluates LightGBM (F1=0.79, 9.4 µs latency) and BGE-M3 (F1=0.83, 50 ms latency) for detection plus LLMs (0.91 accuracy) for extraction. The central claims are that message-level analysis enables near-instant detection superior to market-data baselines and that the work provides the first benchmark for coin/exchange extraction, with rule-based methods failing due to ticker ambiguity.
Significance. If the labels prove reliable and the models generalize, the work is significant for supplying the first public message-level labeled dataset for Telegram pump-and-dump research, enabling reproducible NLP-based detection at low latency. It directly contrasts message analysis with slower market-data approaches and demonstrates LLM superiority over rule-based extraction, addressing a practical market-integrity problem. The scale of the corpus and explicit model comparisons are strengths.
major comments (2)
- [Section 3] Dataset construction (Section 3): The manual review process that produced the 2,246 pump announcements provides no annotation guidelines, reviewer count, or inter-annotator agreement statistic (e.g., Cohen’s kappa). Pump identification involves subjective elements such as intent and coordination signals; without these details, label noise cannot be quantified and the reported F1 scores (0.79–0.83) and extraction accuracy (0.91) rest on an unverified foundation.
- [Section 5] Evaluation methodology (Section 5): The detection and extraction results omit the train/test split strategy (temporal vs. group-wise across the 39 groups to avoid leakage), cross-validation procedure, error analysis, and any statistical significance tests. These omissions directly affect confidence in the generalization claim to unseen groups and the assertion that the models outperform prior market-data methods.
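For readers unfamiliar with the agreement statistic requested in the first major comment, Cohen's kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each annotator's label frequencies. A minimal Python sketch follows; the two label lists are invented for illustration and are not drawn from the paper's data.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and equally long")
    n = len(a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    counts_a, counts_b = Counter(a), Counter(b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented labels: 1 = pump announcement, 0 = ordinary message.
annotator_1 = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
annotator_2 = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
kappa = cohens_kappa(annotator_1, annotator_2)
```

In this toy case the annotators agree on 9 of 10 items (p_o = 0.9) while chance agreement is 0.62, so kappa is 0.28 / 0.38, roughly 0.74. Raw agreement overstates reliability when one class dominates, which is the situation with 2,246 announcements among more than 280,000 posts.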
minor comments (2)
- [Abstract] Abstract: The statement that the system can be 'evaluated in microseconds per window' conflicts with the reported 50 ms/sample latency for BGE-M3; clarify the end-to-end inference pipeline that would achieve microsecond performance.
- [Section 2] Related work: A brief discussion of any prior extraction benchmarks (even if not message-level) would strengthen the 'first benchmark' claim.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for improving transparency in dataset construction and evaluation methodology. We address each point below and will revise the paper to incorporate the requested details.
-
Referee: [Section 3] Dataset construction (Section 3): The manual review process that produced the 2,246 pump announcements provides no annotation guidelines, reviewer count, or inter-annotator agreement statistic (e.g., Cohen’s kappa). Pump identification involves subjective elements such as intent and coordination signals; without these details, label noise cannot be quantified and the reported F1 scores (0.79–0.83) and extraction accuracy (0.91) rest on an unverified foundation.
Authors: We agree that the current manuscript omits key details on the annotation process, which limits assessment of label reliability. In the revised version, we will expand Section 3 to include the full annotation guidelines (focusing on explicit pump coordination language, target coin mentions, and exchange signals), the number of reviewers (two authors performed independent labeling), and how conflicts were resolved through discussion. We will also report inter-annotator agreement using Cohen’s kappa on a subset of the data. These additions will allow readers to better evaluate potential label noise.
Revision: yes
-
Referee: [Section 5] Evaluation methodology (Section 5): The detection and extraction results omit the train/test split strategy (temporal vs. group-wise across the 39 groups to avoid leakage), cross-validation procedure, error analysis, and any statistical significance tests. These omissions directly affect confidence in the generalization claim to unseen groups and the assertion that the models outperform prior market-data methods.
Authors: We acknowledge that the manuscript does not explicitly describe the evaluation protocol. To prevent leakage from repeated groups, we employed a group-wise split across the 39 Telegram channels (approximately 70/30 train/test allocation with no group overlap). In the revision, we will add: (1) explicit description of this group-wise strategy, (2) results from 5-fold cross-validation, (3) a dedicated error analysis section breaking down false positives/negatives by message characteristics, and (4) statistical significance tests (e.g., paired t-tests or McNemar’s test) comparing BGE-M3 and LightGBM against the market-data baselines. These changes will strengthen the generalization and performance claims.
Revision: yes
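The group-wise split and the McNemar test named in this exchange can be sketched in a few lines of Python. Both functions below are illustrative assumptions, not the authors' code: `group_wise_split` holds out whole groups so that no channel contributes to both train and test, and `mcnemar_exact` computes the two-sided exact (binomial) p-value from the two discordant counts.

```python
import random
from math import comb
from typing import List, Sequence, Tuple


def group_wise_split(
    groups: Sequence[str], test_fraction: float = 0.3, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Split sample indices so that no group appears on both sides."""
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)  # seeded for reproducibility
    n_test = max(1, round(len(unique) * test_fraction))
    test_groups = set(unique[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx


def mcnemar_exact(only_a_correct: int, only_b_correct: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pair counts."""
    n = only_a_correct + only_b_correct
    if n == 0:
        return 1.0  # the two models never disagree
    k = min(only_a_correct, only_b_correct)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up group ids, one per message.
groups = ["g1", "g1", "g2", "g3", "g3", "g3", "g4"]
train_idx, test_idx = group_wise_split(groups)

# Made-up counts: model A alone is right on 8 items, model B alone on 2.
p_value = mcnemar_exact(8, 2)
```

With 8 versus 2 discordant items the exact p-value is 56/1024 doubled, or about 0.109, so a gap of that size on so few disagreements would not be significant at the 5% level. McNemar's test requires both models to be scored on the same test items, which is one reason the split strategy needs to be stated explicitly.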
Circularity Check
No circularity: standard supervised ML pipeline on externally labeled data
The paper collects 280k raw Telegram posts, manually reviews them to produce 2,246 labeled pump announcements (input ground truth), then trains and evaluates LightGBM, BGE-M3, and LLMs on held-out message windows for detection (F1 0.79-0.83) and extraction (accuracy 0.91). No equations, fitted parameters, or self-citations reduce the reported metrics to the inputs by construction. Manual labeling is an independent preprocessing step; model performance is measured against those fixed labels without loops or renaming of known results.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Human reviewers correctly identified all 2,246 pump announcements and their targets without systematic bias
- domain assumption: The 39 pump-organizing groups are representative of the broader Telegram pump ecosystem
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
We use LightGBM, a tree-based machine-learning model, trained on TF–IDF representations of message windows... BGE-M3, a BERT-based Transformer encoder... LLMs achieve the highest accuracy with a score of 0.91.
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
we introduce a corpus of over 280,000 Telegram posts... 2,246 pump announcements... Cohen’s κ=0.96
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.