TMRugPull: A Temporally Sound Multimodal Dataset for Early RugPull Detection

Bert Lagaisse; Fatemeh Shoaei; Mohammad Pishdar; Mojtaba Karami; Mozafar Bag-Mohammadi

arxiv: 2602.21529 · v3 · pith:ZRUSOXO6new · submitted 2026-02-25 · 💻 cs.CR


Fatemeh Shoaei, Mohammad Pishdar, Mozafar Bag-Mohammadi, Mojtaba Karami, Bert Lagaisse

Pith reviewed 2026-05-21 12:37 UTC · model grok-4.3

classification 💻 cs.CR

keywords: rug pull detection, blockchain security, DeFi, multimodal dataset, temporal leakage, smart contracts, OSINT, early detection


The pith

TM-RugPull supplies a strictly time-bound multimodal dataset of 1000 projects with manual full-lifespan labels for early rug pull detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces TM-RugPull as a dataset of 1000 DeFi, meme, NFT, and celebrity token projects to support early rug pull detection. It collects on-chain behavior, smart contract metadata, and OSINT signals under strict temporal constraints so that all information precedes any collapse. Labels result from manual review of each project's complete lifespan rather than partial or post-event observations. Existing datasets allow models to exploit future knowledge or unclear signals, which this construction avoids. The authors release the dataset and associated code for data acquisition and feature extraction.

Core claim

Existing datasets for rug pull identification contain temporal leakage, post-collapse indicators, narrow modality coverage, and ambiguous labels; TM-RugPull corrects these flaws by acquiring three modalities with strict time bounds and supplying labels derived from manual investigation across the entire lifespan of each of the 1000 projects.

What carries the argument

Strict temporal validation of data collection across on-chain, smart-contract, and OSINT modalities together with manual full-lifespan labeling.
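As an illustration of what a strict time bound means in practice, the sketch below keeps only records observed before a project's collapse time, minus an optional safety margin. It is a minimal sketch under stated assumptions, not the authors' acquisition code: the `Event` type, the modality names, the `pre_cutoff` helper, and the `margin` parameter are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class Event:
    """One timestamped observation (hypothetical schema)."""
    modality: str          # e.g. "onchain", "contract", "osint" (assumed names)
    observed_at: datetime  # when the signal became observable
    payload: str


def pre_cutoff(events: List[Event], collapse_at: datetime,
               margin: timedelta = timedelta(0)) -> List[Event]:
    """Keep only events observed strictly before (collapse_at - margin)."""
    cutoff = collapse_at - margin
    return [e for e in events if e.observed_at < cutoff]


collapse = datetime(2024, 3, 10, 12, 0)
events = [
    Event("onchain", datetime(2024, 3, 1, 9, 0), "liquidity added"),
    Event("osint", datetime(2024, 3, 9, 18, 0), "social post"),
    Event("onchain", datetime(2024, 3, 10, 12, 0), "liquidity removed"),
    Event("osint", datetime(2024, 3, 11, 8, 0), "scam report"),
]

# With a 24-hour margin, only the first event survives the filter.
kept = pre_cutoff(events, collapse, margin=timedelta(hours=24))
```

The strict `<` comparison excludes the collapse event itself, so a feature built from `kept` cannot encode the outcome it is meant to predict.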

If this is right

Detection models can be trained and tested using only signals that exist before any rug pull occurs.
Fusion of on-chain, contract metadata, and OSINT signals becomes feasible for improved early warning.
The dataset spans DeFi, meme, NFT, and celebrity projects, supporting category-aware detection methods.
Public release of data and codebase enables direct comparison and extension of detection techniques.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The curation approach could be replicated for other blockchain threats such as honeypots or exit scams.
Real-time systems could ingest the same modality streams to issue live alerts on ongoing projects.
Longitudinal expansion of the dataset would reveal whether rug pull patterns shift with market cycles.

Load-bearing premise

Manual review of each project's complete lifespan yields accurate labels that correctly flag rug pulls without hindsight bias or subjective error.

What would settle it

The claim would be undermined if a detection model trained solely on pre-collapse data from the TM-RugPull dataset, and then evaluated on new projects using only information available before collapse, showed accuracy no better than random chance.

Original abstract

Rug pull is a critical attack in the world of blockchain technology. Despite this, the absence of sufficient time-bound and well-structured datasets is considered one of the significant issues faced while identifying early detection. Existing datasets do not provide the solution to this challenge because of temporal leakage or use of post-collapse indicators, insufficient modality coverage, and confusing or partial labels, especially with regards to DeFi tokens. To solve these problems, we present a highly curated and strictly time-bound dataset called TM-RugPull containing 1,000 projects, which include DeFi, meme, NFT, and celebrity token projects. We achieve temporal validation of the dataset by acquiring all three modalities, namely on-chain behavior, smart contract metadata, and OSINT signals. The project labels are provided based on manual investigation for the entire project's lifespan and its collapse. Also, we make our dataset publicly available together with its codebase for data acquisition and feature extraction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents TM-RugPull, a curated multimodal dataset of 1,000 blockchain projects (DeFi, meme, NFT, and celebrity tokens) that includes on-chain behavior, smart contract metadata, and OSINT signals. Labels are assigned via manual investigation of each project's full lifespan and collapse. The central claim is that this dataset is strictly time-bound and solves prior problems of temporal leakage, post-collapse indicators, insufficient modality coverage, and confusing labels in existing rug-pull datasets. The authors release the dataset and associated codebase publicly.

Significance. If the manual labeling process can be shown to produce accurate, reproducible labels without hindsight bias or post-collapse leakage, the dataset would constitute a useful benchmark resource for early rug-pull detection research. Public release of both data and acquisition code is a positive contribution that could facilitate reproducible experiments in blockchain security.

major comments (2)

Abstract: The assertion that the dataset is 'strictly time-bound' and free of temporal leakage rests entirely on the manual full-lifespan labeling procedure, yet no explicit decision criteria (e.g., liquidity-removal thresholds, developer-wallet heuristics, or pre-collapse cutoff rules), blinding protocol, or inter-annotator agreement statistics are supplied. Without these, it is impossible to verify that labels avoid hindsight bias.
Abstract: No baseline comparisons, leakage tests, or quantitative validation of the 1,000 labels are reported, leaving the central claim that TM-RugPull solves the temporal and labeling problems of prior datasets unverified from the provided description.
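Inter-annotator agreement, which the first comment asks for, is commonly reported as Cohen's kappa. The following is a minimal sketch assuming exactly two annotators and categorical labels; the paper's description does not state how many annotators were involved or which statistic would be used, so the function name and the example labels are purely illustrative.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable],
                 labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("both annotators must label the same non-empty item set")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Illustrative labels only: 1 = rug pull, 0 = not a rug pull.
annotator_a = [1, 1, 0, 0, 1, 0, 1, 0]
annotator_b = [1, 1, 0, 0, 1, 0, 0, 1]
kappa = cohens_kappa(annotator_a, annotator_b)
```

Here the annotators agree on 6 of 8 items (0.75) while chance agreement is 0.5, so kappa is 0.5.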

minor comments (1)

The abstract refers to 'confusing or partial labels' in existing work but does not cite specific examples or demonstrate how the new labeling scheme resolves those ambiguities.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. The comments highlight important aspects of reproducibility and validation that strengthen the manuscript. We respond to each major comment below and have revised the paper accordingly to address the concerns.


Referee: Abstract: The assertion that the dataset is 'strictly time-bound' and free of temporal leakage rests entirely on the manual full-lifespan labeling procedure, yet no explicit decision criteria (e.g., liquidity-removal thresholds, developer-wallet heuristics, or pre-collapse cutoff rules), blinding protocol, or inter-annotator agreement statistics are supplied. Without these, it is impossible to verify that labels avoid hindsight bias.

Authors: We agree that explicit documentation of the labeling protocol is required to allow independent verification of temporal soundness and to rule out hindsight bias. The manuscript describes the use of full-lifespan manual investigation but does not enumerate the precise decision rules or agreement metrics. In the revised version we will add a dedicated subsection on labeling methodology that specifies liquidity-removal thresholds, developer-wallet heuristics, pre-collapse data cutoffs, the blinding procedure followed by annotators, and inter-annotator agreement statistics.
Revision: yes
Referee: Abstract: No baseline comparisons, leakage tests, or quantitative validation of the 1,000 labels are reported, leaving the central claim that TM-RugPull solves the temporal and labeling problems of prior datasets unverified from the provided description.

Authors: The dataset's primary value lies in its construction: labels are assigned only after exhaustive review of each project's complete history while restricting all input features to information available before collapse. This design directly targets the temporal-leakage and post-collapse-indicator problems identified in earlier collections. To make this claim more verifiable, the revised manuscript will include a new validation section containing (i) explicit leakage tests confirming that no post-collapse signals entered the feature set or label assignment, (ii) quantitative comparison of label distributions and modality coverage against representative prior datasets, and (iii) baseline early-detection experiments that demonstrate utility on the released data.
Revision: yes

Circularity Check

0 steps flagged

No circularity: the paper constructs the dataset through an external manual process, and no internal derivation reduces to its inputs.


The manuscript presents TM-RugPull as a curated collection of 1,000 projects with on-chain, smart-contract, and OSINT modalities plus manual full-lifespan labels. No equations, fitted parameters, predictions, or first-principles derivations appear. The central claim of temporal soundness rests on the data-acquisition and labeling procedure itself, which is described as an external manual investigation rather than any self-referential reduction or self-citation chain. This matches the expected non-finding for a dataset-construction paper whose value is in the artifact, not in a derived quantity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The dataset's utility rests on the reliability of manual labeling and the feasibility of collecting three modalities without temporal leakage; no free parameters or invented entities are introduced.

axioms (1)

domain assumption: Manual investigation of each project's full lifespan yields accurate and unbiased rug-pull labels.
The abstract states that labels are 'provided based on manual investigation for the entire project's lifespan and its collapse'.

pith-pipeline@v0.9.0 · 5709 in / 1362 out tokens · 70885 ms · 2026-05-21T12:37:51.390920+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear

A project is labeled as a rug pull if and only if it satisfies all three of the following operational conditions for at least 72 consecutive hours: Zero liquidity, Zero transactional activity, Undefined price/volume.
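The quoted rule can be read as a check for a sufficiently long run of "dead" observations. The sketch below is one possible reading, not the authors' implementation: it assumes contiguous hourly snapshots in time order, models an undefined price/volume as `None`, and uses hypothetical names (`HourlySnapshot`, `is_dead`, `meets_rug_pull_rule`).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class HourlySnapshot:
    """State of a project for one hour (hypothetical schema)."""
    liquidity: float
    tx_count: int
    price: Optional[float]  # None models "undefined price/volume" (assumption)


def is_dead(s: HourlySnapshot) -> bool:
    """All three operational conditions hold for this hour."""
    return s.liquidity == 0 and s.tx_count == 0 and s.price is None


def meets_rug_pull_rule(snapshots: Sequence[HourlySnapshot],
                        min_hours: int = 72) -> bool:
    """True if some run of consecutive dead hours reaches min_hours.

    Assumes snapshots are contiguous, one per hour, in time order.
    """
    run = 0
    for s in snapshots:
        run = run + 1 if is_dead(s) else 0  # any live hour resets the run
        if run >= min_hours:
            return True
    return False


alive = HourlySnapshot(liquidity=1000.0, tx_count=5, price=0.01)
dead = HourlySnapshot(liquidity=0.0, tx_count=0, price=None)

collapsed = meets_rug_pull_rule([alive] * 10 + [dead] * 72)
interrupted = meets_rug_pull_rule([dead] * 71 + [alive] + [dead] * 71)
```

Because a single live hour resets the counter, two 71-hour dead stretches separated by activity do not satisfy the rule, which matches the "consecutive" wording of the passage.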

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.