pith. machine review for the scientific record.

arxiv: 2605.09431 · v1 · submitted 2026-05-10 · 💻 cs.CL

PumpSense: Real-Time Detection and Target Extraction of Crypto Pump-and-Dumps on Telegram

Pith reviewed 2026-05-12 03:45 UTC · model grok-4.3

classification 💻 cs.CL
keywords cryptocurrency · pump and dump · Telegram · real-time detection · machine learning · target extraction · market manipulation · social media analysis

The pith

Analyzing individual Telegram messages detects cryptocurrency pump-and-dump schemes in real time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates a large set of labeled Telegram posts from groups that organize pump schemes and uses it to train models that spot the coordination messages as they appear. By examining the text of posts directly instead of waiting for unusual trading patterns in markets, the method targets detection at the scale of single message windows. It also tests ways to pull out the specific coin and exchange being promoted from those messages, showing that language models handle the ambiguities better than simple pattern matching. A reader would care if this holds because catching the planning step could let platforms or regulators act before the coordinated buying and selling distorts prices. The work sets up the first public benchmarks for both spotting the announcements and identifying their targets.

Core claim

The authors assemble a corpus of more than 280,000 Telegram posts drawn from 39 pump-organizing groups, with every post manually reviewed to mark the 2,246 pump announcements along with the cryptocurrency and exchange each targets. They frame two tasks: classifying whether a given message window contains a pump announcement and extracting the manipulated coin and trading venue from it. Machine-learning classifiers applied to the text achieve real-time detection on commodity hardware, while large language models prove more reliable than rule-based string matching at resolving which coin and exchange are intended despite ticker overlaps and shorthand. The approach operates on the coordination messages themselves rather than on market data.

What carries the argument

A manually labeled corpus of Telegram posts paired with classifiers that process single-message windows for pump detection and large language models for coin and exchange extraction.
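One way to picture the message-window framing is a sliding window over a channel's posts in chronological order. The sketch below is an illustration under stated assumptions, not the paper's preprocessing: the `Message` type, the `make_windows` helper, the window size of 3, and the sample posts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Message:
    """Hypothetical container for one Telegram post."""
    channel: str
    timestamp: float  # seconds since epoch
    text: str


def make_windows(messages: List[Message], size: int = 3) -> Iterator[List[Message]]:
    """Yield one window per message: that message plus up to size - 1 predecessors.

    The default size of 3 is an illustrative assumption, not a value from the paper.
    """
    if size < 1:
        raise ValueError("size must be at least 1")
    ordered = sorted(messages, key=lambda m: m.timestamp)
    for end in range(len(ordered)):
        start = max(0, end - size + 1)
        yield ordered[start:end + 1]


# Made-up posts for demonstration only.
posts = [
    Message("demo_channel", 1.0, "Pump in 5 minutes"),
    Message("demo_channel", 2.0, "Exchange: Binance"),
    Message("demo_channel", 3.0, "Coin is XYZ"),
    Message("demo_channel", 4.0, "Great job everyone"),
]
windows = list(make_windows(posts, size=3))
```

Because each window ends at the newest post, a classifier built this way could score a channel as soon as a message arrives instead of waiting for later context.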

If this is right

  • Detection becomes possible at the moment the coordination message is sent rather than after trading volume spikes.
  • Target extraction succeeds even when coin tickers are ambiguous or abbreviated in natural text.
  • Responses to schemes can shift from reacting to market moves to intervening on the planning messages themselves.
  • Future systems can be compared against a shared benchmark for both announcement spotting and attribute extraction.
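The ticker-ambiguity problem mentioned above can be illustrated with a toy string matcher. This is a minimal sketch assuming a tiny hypothetical ticker list; `TICKERS`, `rule_based_candidates`, and the sample announcement are invented for illustration and are not the rule-based baseline evaluated in the paper.

```python
import re
from typing import List, Set

# Hypothetical ticker list. Real exchange listings are far longer, and many
# symbols collide with ordinary English words.
TICKERS: Set[str] = {"BTC", "ONE", "GAS", "HOT", "XYZ"}


def rule_based_candidates(text: str, tickers: Set[str] = TICKERS) -> List[str]:
    """Return every token that matches a known ticker, in order of appearance."""
    tokens = re.findall(r"[A-Za-z0-9]+", text.upper())
    return [tok for tok in tokens if tok in tickers]


# Made-up announcement: only XYZ is the intended target.
announcement = "Only ONE minute left! The coin is XYZ, this will be HOT"
candidates = rule_based_candidates(announcement)
```

The matcher returns three candidates for a message with one intended target, and nothing in the string match itself says which to keep. Resolving that requires reading the sentence, which is the gap the paper reports LLMs closing.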

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Platform moderators could run lightweight versions of the detectors to flag or remove suspicious posts in real time.
  • The labeled corpus could be extended to test detection across languages or on other messaging apps used for coordination.
  • Pairing the message-based signals with live order-book data might allow earlier confirmation of whether a scheme is succeeding.
  • Regulators might explore similar text-analysis pipelines to monitor other forms of social-media-driven market manipulation.

Load-bearing premise

The manual labels identifying pump announcements are accurate and representative, and the trained models will continue to work on new groups and messages without major drops in performance.

What would settle it

A pump announcement posted in a previously unseen Telegram group that the detection model labels as ordinary chat or that the extraction model assigns to the wrong coin or exchange.

Figures

Figures reproduced from arXiv: 2605.09431 by Ahmed Mahrous, Roberto Di Pietro.

Figure 1. Left: Telegram group announcements regarding a …
Figure 2. Confusion matrices for pump-start detection.
Original abstract

Cryptocurrency pump-and-dump schemes coordinated via Telegram threaten market integrity. However, existing research addressing this specific threat has not yet produced solutions that combine reliable results with fast response. This is in part due to the absence of publicly available, message-level labeled data, as well as design choices. In this paper, we address both issues. In particular, we introduce a corpus of over 280,000 Telegram posts from 39 pump-organizing groups, all manually reviewed to identify 2,246 pump announcements and their targeted cryptocurrency and exchange. Leveraging this dataset, we define two tasks: real-time pump-announcement detection and target cryptocurrency/exchange extraction. For detection, we compare two machine-learning models: a lightweight tree-based LightGBM classifier (F1=0.79, latency=9.4 s/sample) and a transformer-based BGE-M3 (F1=0.83, latency=50 ms/sample). With our proposed approach, we show that message analysis can achieve near-instant pump detection at the level of individual Telegram message windows. Unlike prior work that relies purely on market data and typically detects pumps tens of seconds after abnormal trading activity is observed, our method operates directly on the coordination messages themselves and can be evaluated in microseconds per window on commodity hardware. To our knowledge, we also establish the first benchmark for manipulated coin and exchange extraction. We demonstrate that traditional rule-based extraction methods, widely relied upon in prior literature, are ineffective due to ticker ambiguity. In contrast, LLMs achieve the highest accuracy with a score of 0.91.
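For readers less familiar with the F1 figures quoted in the abstract, the following sketch shows how an F1 score is derived from confusion-matrix counts. The counts are made up for illustration and are not taken from the paper's results.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 is the harmonic mean of precision and recall.

    tp, fp, fn are true-positive, false-positive, and false-negative counts.
    """
    if tp == 0:
        return 0.0  # no correct positive predictions, so F1 is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up counts: 80 announcements caught, 20 false alarms, 20 missed.
score = f1_score(tp=80, fp=20, fn=20)
```

With pump announcements making up well under 1% of posts (2,246 of more than 280,000), F1 is more informative than plain accuracy, since a model that never flags anything would still be right almost all the time.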

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces a corpus of over 280,000 Telegram posts from 39 pump-organizing groups, manually reviewed to yield 2,246 labeled pump announcements with targeted coins and exchanges. It defines two tasks (real-time pump-announcement detection and target extraction) and evaluates LightGBM (F1=0.79, 9.4 s latency) and BGE-M3 (F1=0.83, 50 ms latency) for detection, plus LLMs (0.91 accuracy) for extraction. The central claims are that message-level analysis enables near-instant detection superior to market-data baselines and that the work provides the first benchmark for coin/exchange extraction, with rule-based methods failing due to ticker ambiguity.

Significance. If the labels prove reliable and the models generalize, the work is significant for supplying the first public message-level labeled dataset for Telegram pump-and-dump research, enabling reproducible NLP-based detection at low latency. It directly contrasts message analysis with slower market-data approaches and demonstrates LLM superiority over rule-based extraction, addressing a practical market-integrity problem. The scale of the corpus and explicit model comparisons are strengths.

major comments (2)
  1. [Section 3] Dataset construction (Section 3): The manual review process that produced the 2,246 pump announcements provides no annotation guidelines, reviewer count, or inter-annotator agreement statistic (e.g., Cohen’s kappa). Pump identification involves subjective elements such as intent and coordination signals; without these details, label noise cannot be quantified and the reported F1 scores (0.79–0.83) and extraction accuracy (0.91) rest on an unverified foundation.
  2. [Section 5] Evaluation methodology (Section 5): The detection and extraction results omit the train/test split strategy (temporal vs. group-wise across the 39 groups to avoid leakage), cross-validation procedure, error analysis, and any statistical significance tests. These omissions directly affect confidence in the generalization claim to unseen groups and the assertion that the models outperform prior market-data methods.
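The inter-annotator agreement statistic requested in the first major comment can be computed in a few lines. The sketch below implements the standard Cohen's kappa formula for two annotators; the label vectors are invented for illustration and say nothing about the paper's actual annotations.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    freq_a, freq_b = Counter(a), Counter(b)
    # Chance agreement: probability both annotators pick the same label at random.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Made-up labels: 1 = pump announcement, 0 = ordinary post.
annotator_1 = [1, 1, 0, 0, 0, 0, 1, 0, 0, 0]
annotator_2 = [1, 0, 0, 0, 0, 0, 1, 0, 1, 0]
kappa = cohens_kappa(annotator_1, annotator_2)
```

Here the annotators agree on 8 of 10 posts, yet kappa comes out near 0.52 because much of that agreement is expected by chance when one class dominates. That is why raw agreement alone would not answer the referee's concern.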
minor comments (2)
  1. [Abstract] Abstract: The statement that the system can be 'evaluated in microseconds per window' conflicts with the reported 50 ms/sample latency for BGE-M3; clarify the end-to-end inference pipeline that would achieve microsecond performance.
  2. [Section 2] Related work: A brief discussion of any prior extraction benchmarks (even if not message-level) would strengthen the 'first benchmark' claim.
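The unit mismatch raised in the first minor comment is easier to audit when latency is measured and reported by one routine. The sketch below is a hypothetical timing harness: `per_sample_latency`, `format_latency`, and the keyword-based `toy_predict` stand-in are assumptions for illustration and do not reflect the paper's benchmarking setup.

```python
import time
from typing import Callable, Sequence


def per_sample_latency(predict: Callable[[str], object],
                       samples: Sequence[str],
                       repeats: int = 5) -> float:
    """Return the best-of-repeats mean wall-clock time per sample, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for sample in samples:
            predict(sample)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / len(samples))
    return best


def format_latency(seconds: float) -> str:
    """Render a latency with an explicit unit so s, ms, and us cannot be confused."""
    if seconds >= 1.0:
        return f"{seconds:.1f} s"
    if seconds >= 1e-3:
        return f"{seconds * 1e3:.1f} ms"
    return f"{seconds * 1e6:.1f} us"


def toy_predict(text: str) -> bool:
    """Stand-in model: a keyword check, purely illustrative."""
    return "pump" in text.lower()


latency = per_sample_latency(toy_predict, ["Pump starts now", "hello"] * 100)
```

Calling `format_latency(latency)` prints the measured value with its unit attached. A 50 ms figure is 50,000 us, so a "microseconds per window" claim and a 50 ms per-sample latency cannot describe the same pipeline stage.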

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for improving transparency in dataset construction and evaluation methodology. We address each point below and will revise the paper to incorporate the requested details.

Point-by-point responses
  1. Referee: [Section 3] Dataset construction (Section 3): The manual review process that produced the 2,246 pump announcements provides no annotation guidelines, reviewer count, or inter-annotator agreement statistic (e.g., Cohen’s kappa). Pump identification involves subjective elements such as intent and coordination signals; without these details, label noise cannot be quantified and the reported F1 scores (0.79–0.83) and extraction accuracy (0.91) rest on an unverified foundation.

    Authors: We agree that the current manuscript omits key details on the annotation process, which limits assessment of label reliability. In the revised version, we will expand Section 3 to include the full annotation guidelines (focusing on explicit pump coordination language, target coin mentions, and exchange signals), the number of reviewers (two authors performed independent labeling), and how conflicts were resolved through discussion. We will also report inter-annotator agreement using Cohen’s kappa on a subset of the data. These additions will allow readers to better evaluate potential label noise. revision: yes

  2. Referee: [Section 5] Evaluation methodology (Section 5): The detection and extraction results omit the train/test split strategy (temporal vs. group-wise across the 39 groups to avoid leakage), cross-validation procedure, error analysis, and any statistical significance tests. These omissions directly affect confidence in the generalization claim to unseen groups and the assertion that the models outperform prior market-data methods.

    Authors: We acknowledge that the manuscript does not explicitly describe the evaluation protocol. To prevent leakage from repeated groups, we employed a group-wise split across the 39 Telegram channels (approximately 70/30 train/test allocation with no group overlap). In the revision, we will add: (1) explicit description of this group-wise strategy, (2) results from 5-fold cross-validation, (3) a dedicated error analysis section breaking down false positives/negatives by message characteristics, and (4) statistical significance tests (e.g., paired t-tests or McNemar’s test) comparing BGE-M3 and LightGBM against the market-data baselines. These changes will strengthen the generalization and performance claims. revision: yes
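The group-wise split described in the simulated response can be sketched as follows. This is a minimal illustration, assuming each post carries a group identifier; `group_wise_split`, the seed, and the toy channel names are hypothetical, and the 30% test fraction simply mirrors the approximate allocation quoted in the response.

```python
import random
from typing import List, Sequence, Tuple


def group_wise_split(groups: Sequence[str],
                     test_fraction: float = 0.3,
                     seed: int = 0) -> Tuple[List[int], List[int]]:
    """Split sample indices so that no group appears in both train and test."""
    unique = sorted(set(groups))
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Hold out whole groups, never individual posts.
    n_test = max(1, round(len(unique) * test_fraction))
    test_groups = set(unique[:n_test])
    train_idx = [i for i, g in enumerate(groups) if g not in test_groups]
    test_idx = [i for i, g in enumerate(groups) if g in test_groups]
    return train_idx, test_idx


# Toy data: 10 hypothetical channels with 4 posts each.
sample_groups = [f"channel_{i % 10}" for i in range(40)]
train_idx, test_idx = group_wise_split(sample_groups)
```

Holding out entire channels is what makes the test score speak to unseen groups. A random post-level split would let a channel's recurring announcement template appear on both sides and inflate the result. scikit-learn's `GroupShuffleSplit` provides comparable behavior for readers who prefer a library routine.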

Circularity Check

0 steps flagged

No circularity: standard supervised ML pipeline on externally labeled data

The paper collects 280k raw Telegram posts, manually reviews them to produce 2,246 labeled pump announcements (input ground truth), then trains and evaluates LightGBM, BGE-M3, and LLMs on held-out message windows for detection (F1 0.79-0.83) and extraction (accuracy 0.91). No equations, fitted parameters, or self-citations reduce the reported metrics to the inputs by construction. Manual labeling is an independent preprocessing step; model performance is measured against those fixed labels without loops or renaming of known results.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central results rest on the accuracy of human labeling of pump announcements and standard assumptions of supervised ML generalization from the collected groups to future messages.

axioms (2)
  • domain assumption: Human reviewers correctly identified all 2,246 pump announcements and their targets without systematic bias.
    The dataset creation step relies entirely on manual review as ground truth.
  • domain assumption: The 39 pump-organizing groups are representative of the broader Telegram pump ecosystem.
    Generalization claims depend on this sampling assumption.
