TM-RugPull: A Temporally Sound Multimodal Dataset for Early RugPull Detection
Pith reviewed 2026-05-21 12:37 UTC · model grok-4.3
The pith
TM-RugPull supplies a strictly time-bound multimodal dataset of 1000 projects with manual full-lifespan labels for early rug pull detection.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Existing datasets for rug pull identification contain temporal leakage, post-collapse indicators, narrow modality coverage, and ambiguous labels; TM-RugPull corrects these flaws by acquiring three modalities with strict time bounds and supplying labels derived from manual investigation across the entire lifespan of each of the 1000 projects.
What carries the argument
Strict temporal validation of data collection across on-chain, smart-contract, and OSINT modalities together with manual full-lifespan labeling.
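Enforcing this kind of time bound amounts to filtering every modality's records against a per-project cutoff and then checking that nothing at or after the cutoff survives. The sketch below is a minimal illustration under assumptions not taken from the paper: the `Record` type, the modality strings, and the optional `margin` buffer are hypothetical. For a rug-pull project the cutoff would plausibly be the collapse time; how cutoffs are set for non-collapsed projects is not stated on this page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Record:
    """Hypothetical observation from one modality (not the paper's schema)."""
    modality: str          # e.g. "onchain", "contract", "osint" (assumed names)
    observed_at: datetime  # when the information became publicly available
    payload: dict


def time_bounded_view(records, cutoff_at, margin=timedelta(0)):
    """Keep only records observed strictly before cutoff_at - margin."""
    limit = cutoff_at - margin
    return [r for r in records if r.observed_at < limit]


def find_leaks(records, cutoff_at):
    """Return records timestamped at or after the cutoff; a clean view yields []."""
    return [r for r in records if r.observed_at >= cutoff_at]


# Usage: the pre-collapse on-chain record survives; the post-collapse OSINT post is dropped.
collapse = datetime(2024, 3, 1)
records = [
    Record("onchain", datetime(2024, 2, 27), {"tx_count": 40}),
    Record("osint", datetime(2024, 3, 2), {"post": "devs unreachable"}),
]
view = time_bounded_view(records, collapse)
```

Timestamping by public availability rather than by crawl time is a design choice here: an OSINT post crawled before the cutoff but published after it would otherwise slip through.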
If this is right
- Detection models can be trained and tested using only signals that exist before any rug pull occurs.
- Fusion of on-chain, contract metadata, and OSINT signals becomes feasible for improved early warning.
- The dataset spans DeFi, meme, NFT, and celebrity projects, supporting category-aware detection methods.
- Public release of data and codebase enables direct comparison and extension of detection techniques.
Where Pith is reading between the lines
- The curation approach could be replicated for other blockchain threats such as honeypots or exit scams.
- Real-time systems could ingest the same modality streams to issue live alerts on ongoing projects.
- Longitudinal expansion of the dataset would reveal whether rug pull patterns shift with market cycles.
Load-bearing premise
Manual review of each project's complete lifespan yields accurate labels that correctly flag rug pulls without hindsight bias or subjective error.
What would settle it
The claim would be undercut if a detection model trained solely on pre-collapse data from TM-RugPull, and evaluated on new projects using only information available before collapse, performed no better than random chance.
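One way to make "no better than random chance" testable is a one-sided exact binomial test on held-out accuracy. The sketch below assumes a binary rug-pull/legitimate task and a chance rate of 0.5; that rate is an assumption, since the class balance is not given here, and with imbalanced classes the majority-class rate is the fairer baseline.

```python
from math import comb


def binomial_p_value(correct: int, total: int, chance: float = 0.5) -> float:
    """One-sided exact test: P(X >= correct) for X ~ Binomial(total, chance).

    Exact summation is fine for test sets up to roughly 1,000 items; for larger
    sets, scipy.stats.binomtest(correct, total, chance, alternative="greater")
    avoids float overflow in comb().
    """
    if not 0 <= correct <= total:
        raise ValueError("correct must lie in [0, total]")
    return sum(
        comb(total, k) * chance**k * (1 - chance) ** (total - k)
        for k in range(correct, total + 1)
    )


def beats_chance(correct: int, total: int, chance: float = 0.5, alpha: float = 0.05) -> bool:
    """True if accuracy is significantly above the assumed chance rate at level alpha."""
    return binomial_p_value(correct, total, chance) < alpha
```

For example, 60 correct out of 100 gives p ≈ 0.028, which clears alpha = 0.05, while 55 out of 100 (p ≈ 0.18) does not.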
Original abstract
Rug pulls are a critical attack in the blockchain ecosystem. Yet the absence of sufficiently time-bound and well-structured datasets remains one of the main obstacles to their early detection. Existing datasets do not resolve this challenge because of temporal leakage or reliance on post-collapse indicators, insufficient modality coverage, and confusing or partial labels, especially for DeFi tokens. To address these problems, we present TM-RugPull, a highly curated and strictly time-bound dataset of 1,000 DeFi, meme, NFT, and celebrity token projects. We temporally validate the collection of all three modalities: on-chain behavior, smart contract metadata, and OSINT signals. Project labels are based on manual investigation of each project's entire lifespan and collapse. We also release the dataset publicly, together with the codebase for data acquisition and feature extraction.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents TM-RugPull, a curated multimodal dataset of 1,000 blockchain projects (DeFi, meme, NFT, and celebrity tokens) that includes on-chain behavior, smart contract metadata, and OSINT signals. Labels are assigned via manual investigation of each project's full lifespan and collapse. The central claim is that this dataset is strictly time-bound and solves prior problems of temporal leakage, post-collapse indicators, insufficient modality coverage, and confusing labels in existing rug-pull datasets. The authors release the dataset and associated codebase publicly.
Significance. If the manual labeling process can be shown to produce accurate, reproducible labels without hindsight bias or post-collapse leakage, the dataset would constitute a useful benchmark resource for early rug-pull detection research. Public release of both data and acquisition code is a positive contribution that could facilitate reproducible experiments in blockchain security.
major comments (2)
- Abstract: The assertion that the dataset is 'strictly time-bound' and free of temporal leakage rests entirely on the manual full-lifespan labeling procedure, yet no explicit decision criteria (e.g., liquidity-removal thresholds, developer-wallet heuristics, or pre-collapse cutoff rules), blinding protocol, or inter-annotator agreement statistics are supplied. Without these, it is impossible to verify that labels avoid hindsight bias.
- Abstract: No baseline comparisons, leakage tests, or quantitative validation of the 1,000 labels are reported, leaving the central claim that TM-RugPull solves the temporal and labeling problems of prior datasets unverified from the provided description.
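The inter-annotator agreement statistic requested above is commonly reported as Cohen's kappa when two annotators label the same projects. A standard implementation is sketched below; the page does not say which agreement measure the authors will use, and the label strings in the usage note are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # convention: both used one identical category, kappa is undefined
    return (observed - expected) / (1 - expected)
```

For instance, if annotator A labels four projects rug, rug, legit, legit and annotator B labels them rug, legit, legit, legit, observed agreement is 0.75, chance agreement is 0.5, and kappa is 0.5.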
minor comments (1)
- The abstract refers to 'confusing or partial labels' in existing work but does not cite specific examples or demonstrate how the new labeling scheme resolves those ambiguities.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. The comments highlight important aspects of reproducibility and validation that strengthen the manuscript. We respond to each major comment below and have revised the paper accordingly to address the concerns.
Point-by-point responses
- Referee: Abstract: The assertion that the dataset is 'strictly time-bound' and free of temporal leakage rests entirely on the manual full-lifespan labeling procedure, yet no explicit decision criteria (e.g., liquidity-removal thresholds, developer-wallet heuristics, or pre-collapse cutoff rules), blinding protocol, or inter-annotator agreement statistics are supplied. Without these, it is impossible to verify that labels avoid hindsight bias.
Authors: We agree that explicit documentation of the labeling protocol is required to allow independent verification of temporal soundness and to rule out hindsight bias. The manuscript describes the use of full-lifespan manual investigation but does not enumerate the precise decision rules or agreement metrics. In the revised version we will add a dedicated subsection on labeling methodology that specifies liquidity-removal thresholds, developer-wallet heuristics, pre-collapse data cutoffs, the blinding procedure followed by annotators, and inter-annotator agreement statistics. revision: yes
- Referee: Abstract: No baseline comparisons, leakage tests, or quantitative validation of the 1,000 labels are reported, leaving the central claim that TM-RugPull solves the temporal and labeling problems of prior datasets unverified from the provided description.
Authors: The dataset's primary value lies in its construction: labels are assigned only after exhaustive review of each project's complete history while restricting all input features to information available before collapse. This design directly targets the temporal-leakage and post-collapse-indicator problems identified in earlier collections. To make this claim more verifiable, the revised manuscript will include a new validation section containing (i) explicit leakage tests confirming that no post-collapse signals entered the feature set or label assignment, (ii) quantitative comparison of label distributions and modality coverage against representative prior datasets, and (iii) baseline early-detection experiments that demonstrate utility on the released data. revision: yes
Circularity Check
No circularity: the paper constructs its dataset through an external manual process, and no internal derivation reduces to its inputs.
Full rationale
The manuscript presents TM-RugPull as a curated collection of 1,000 projects with on-chain, smart-contract, and OSINT modalities plus manual full-lifespan labels. No equations, fitted parameters, predictions, or first-principles derivations appear. The central claim of temporal soundness rests on the data-acquisition and labeling procedure itself, which is described as an external manual investigation rather than any self-referential reduction or self-citation chain. This matches the expected non-finding for a dataset-construction paper whose value is in the artifact, not in a derived quantity.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: manual investigation of each project's full lifespan yields accurate and unbiased rug-pull labels.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem: unclear.
A project is labeled as a rug pull if and only if it satisfies all three of the following operational conditions for at least 72 consecutive hours: zero liquidity, zero transactional activity, and undefined price/volume.
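The quoted operational rule can be expressed as a check over a time series of per-project snapshots. The sketch below assumes hourly snapshots with the hypothetical fields shown, treats "undefined" price and volume as `None`, and resets the run whenever consecutive snapshots are more than one hour apart; none of these representation choices come from the paper.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass
class Snapshot:
    """Hypothetical per-project market snapshot (not the paper's schema)."""
    at: datetime
    liquidity: float
    tx_count: int
    price: Optional[float]   # None = price undefined (assumed encoding)
    volume: Optional[float]  # None = volume undefined (assumed encoding)


def is_dead(s: Snapshot) -> bool:
    # All three quoted conditions must hold at this snapshot.
    return s.liquidity == 0 and s.tx_count == 0 and s.price is None and s.volume is None


def meets_rug_pull_rule(
    snaps: Sequence[Snapshot],
    window: timedelta = timedelta(hours=72),
    max_gap: timedelta = timedelta(hours=1),
) -> bool:
    """True if some unbroken run of 'dead' snapshots spans at least `window`."""
    run_start = prev = None
    for s in sorted(snaps, key=lambda snap: snap.at):
        contiguous = prev is not None and s.at - prev <= max_gap
        if is_dead(s):
            # Start a new run after a live snapshot or a data gap.
            if run_start is None or not contiguous:
                run_start = s.at
            if s.at - run_start >= window:
                return True
        else:
            run_start = None
        prev = s.at
    return False
```

Resetting on data gaps is deliberately conservative: missing snapshots are not assumed to satisfy the conditions, so a project with sparse data is less likely to be flagged.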
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.