
arxiv: 2605.16656 · v1 · pith:A5GFGOONnew · submitted 2026-05-15 · 💻 cs.CR · cs.CY

Read This Paper to Get 50 Million:* An Analysis of Mobile Messaging Scams Using Reddit Data

Pith reviewed 2026-05-20 15:58 UTC · model grok-4.3

classification 💻 cs.CR cs.CY
keywords mobile messaging scams · SMS fraud · reply-based scams · click-based scams · scam detection tools · Reddit user reports · phishing trends · cybersecurity measurement

The pith

Reply-based mobile messaging scams grow nearly twice as fast as click-based ones and evade off-the-shelf detectors.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper gathers 175,430 user-reported mobile messaging scams posted on Reddit from June 2020 through December 2025. It separates them into reply-based scams, which ask the recipient to text back, and click-based scams, which direct users to links or calls. Reply-based scams make up half the reports yet expand at a 99.98 percent compound annual growth rate, almost double the 57.29 percent rate for click-based scams. Even though messages within each category share consistent text patterns and phone-number origins, commercial and open-source detectors perform worst on the reply-based group. The results point to the need for detectors that handle this faster-growing category more effectively.

Core claim

Analysis of the Reddit dataset shows that reply-based scams constitute 50 percent of reports and exhibit a compound annual growth rate of 99.98 percent, nearly twice that of click-based scams at 57.29 percent, while current off-the-shelf detection tools achieve their lowest performance on reply-based messages despite measurable similarities in text content and phone-number sources within categories.
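For readers who want to check the growth arithmetic, compound annual growth rate is conventionally defined as (end / start)^(1 / years) - 1. The sketch below is a minimal illustration of that standard definition only. The function names and the report counts are hypothetical, and the paper's exact annualization window is not stated in this summary.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate as a fraction (1.0 means 100 percent)."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


def growth_multiple(rate: float, years: float) -> float:
    """Factor by which volume grows after `years` at a constant annual rate."""
    return (1.0 + rate) ** years


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper: 1,000 reports growing
    # to 8,000 over three years corresponds to a doubling every year.
    print(f"illustrative CAGR: {cagr(1000, 8000, 3):.2%}")

    # Per-year multipliers implied by the two rates reported in the paper.
    print(f"reply-based yearly multiplier: {growth_multiple(0.9998, 1):.4f}")
    print(f"click-based yearly multiplier: {growth_multiple(0.5729, 1):.4f}")
```

Read this way, a 99.98 percent rate means reply-based reports roughly double each year, while a 57.29 percent rate means click-based reports grow by a factor of about 1.57 per year.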

What carries the argument

Large-scale collection and categorization of Reddit user reports into reply-based versus click-based mobile messaging scams, followed by measurement of compound annual growth rates and direct testing of commercial and open-source detector accuracy on shared attributes such as text phrasing and originating phone numbers.

If this is right

  • Reply-based scams require prioritized attention because their faster growth is shifting the overall threat composition.
  • Consistent text and phone-number patterns within each scam category offer usable signals for improved detection rules.
  • Existing commercial and open-source tools leave measurable gaps that allow reply-based campaigns to succeed at higher rates.
  • The measured growth rates imply that scam operations are scaling rapidly and will continue to outpace static detectors without updates.
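To make the point about consistent text patterns concrete, one simple way to quantify how templated two messages are is token-level Jaccard similarity. This is an assumption chosen for illustration, not the similarity measure the paper uses (this summary does not name one), and the two example messages are invented.

```python
import re


def tokens(message: str) -> set[str]:
    """Lowercase a message and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9']+", message.lower()))


def jaccard(message_a: str, message_b: str) -> float:
    """Share of distinct tokens that two messages have in common (0.0 to 1.0)."""
    tokens_a, tokens_b = tokens(message_a), tokens(message_b)
    if not tokens_a and not tokens_b:
        return 1.0  # two empty messages are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


if __name__ == "__main__":
    # Invented examples of a templated message with one field swapped.
    first = "Your package is on hold, reply YES to confirm delivery"
    second = "Your parcel is on hold, reply YES to confirm delivery"
    print(f"similarity: {jaccard(first, second):.2f}")
```

A score near 1.0 between messages in the same category would be one signal that a detection rule keyed on shared phrasing could generalize across a campaign.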

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Security teams could group campaigns by reply versus click behavior to allocate blocking resources more efficiently.
  • Detection systems may gain accuracy by modeling short reply conversations rather than treating each message in isolation.
  • Public awareness campaigns could emphasize caution with any message that requests a direct response, given the steeper rise in that category.

Load-bearing premise

Reddit posts supply a representative and correctly labeled sample of actual mobile messaging scams with little selection bias or confusion between reply-based and click-based types.

What would settle it

A carrier-level or large-scale user survey of real SMS and messaging traffic over the same years that finds materially different growth rates between reply-based and click-based scams or markedly higher detector success rates on reply-based examples.

Figures

Figures reproduced from arXiv: 2605.16656 by Allison Lu, Bernardo B. P. Medeiros, Kevin R. B. Butler, Patrick Traynor.

Figure 1: Numerous scam campaigns originate from scam …
Figure 2: Diagram of our data selection and processing …
Figure 3: Year-by-year message volume by scam category, showing an increasing trend for all scams. The most common scam …
Figure 4: Among persistent click-based categories, Account Pay …
Figure 5: Numerous global shipping/delivery companies are …
Figure 5: Reddit Modmail Scam: These scams are a part of the Prize/Gift scam category, containing the least categorical similarity, but still following similar scripts. This scam leads Reddit users to a dating site using a link, typically ending with the phrase "this is not a scam." The introduction of highly templated scam campaigns across scam categories shows that, even as tactics evolve, scammers continue to …
Figure 6: Fraudulent phone number origin distribution. Most numbers are from the U.S. and Canada, but the origin distribution …
Figure 8: The click-based E-Commerce and Account Paymen …
Figure 9: LLMs exhibit mixed performance when classifying …
Figure 11: Domains are diverse and messages often use link …
Original abstract

Mobile messaging scams--fraudulent messages delivered over SMS and other mobile applications--have become a persistent and evolving security threat, yet the attributes underlying these campaigns remain unclear. This study seeks to address this gap by examining trends in mobile messaging scams and testing the effectiveness of commercial and open-source off-the-shelf detection tools. We characterize mobile messaging scam operations, focusing on how phone numbers, URLs, and text content are used across campaigns. To achieve this objective, we collect and measure a dataset of 175,430 user-reported mobile messaging scams from Reddit between June 2020 and December 2025. While reply-based scams constitute only 50% of our dataset, their compound annual growth rate (99.98%) is nearly twice that of click-based scams (57.29%). Critically, reply-based scams also show the lowest detector performance--despite identifiable similarities in text content and phone number origin within categories--indicating that current off-the-shelf tools are ineffective. These results suggest that further development of detectors is necessary to defend against this rapidly changing ecosystem. By examining a range of message attributes, this work provides new insights into mobile messaging scams, informing the design of more targeted and robust detection methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper collects and analyzes a dataset of 175,430 user-reported mobile messaging scams from Reddit spanning June 2020 to December 2025. It partitions the reports into reply-based and click-based categories, finding that reply-based scams comprise 50% of the data yet exhibit a compound annual growth rate of 99.98% (nearly double the 57.29% for click-based scams). The study further tests commercial and open-source off-the-shelf detectors, reports the lowest performance on reply-based scams despite observable similarities in text content and phone-number origins within categories, and concludes that current tools are ineffective, motivating improved detection methods.

Significance. If the empirical trends and detector comparisons hold after validation, the work supplies a large-scale, directly measured view of mobile scam evolution that could guide targeted detector design. The scale of the Reddit corpus and the explicit CAGR comparison between scam types constitute measurable, falsifiable observations that other researchers could replicate or refute with independent data sources.

major comments (2)
  1. [Data collection and classification] Data collection and classification section: The central claims (50% share, 99.98% vs. 57.29% CAGR, and lowest detector performance for reply-based scams) rest on the accuracy of partitioning the 175,430 reports into reply-based versus click-based categories. The manuscript provides no explicit classification rules, inter-rater reliability statistics, or ground-truth validation against actual message flows or carrier logs; without these, systematic mislabeling or selection effects cannot be excluded and directly weaken the comparative growth-rate and ineffectiveness conclusions.
  2. [Detector evaluation] Detector evaluation subsection: The assertion that reply-based scams exhibit the lowest detector performance is load-bearing for the recommendation to develop new tools, yet the text does not report per-category quantitative metrics (precision, recall, or F1), the exact set of tools and versions tested, or the decision thresholds applied. This omission prevents assessment of whether the performance gap is statistically or practically significant.
minor comments (2)
  1. [Abstract] The abstract states the dataset spans 'June 2020 and December 2025' but does not clarify whether the endpoint is inclusive or how partial-year data for 2025 were annualized for CAGR computation.
  2. [Figures and tables] Figure captions and table headers should explicitly define 'reply-based' and 'click-based' to avoid reader ambiguity when interpreting the growth-rate and detector results.
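The per-category metrics requested in major comment 2 could be tabulated along the following lines. This is a sketch under assumptions: the record layout `(category, is_scam, flagged)` is hypothetical, the toy records are invented, and precision is only meaningful if the evaluation set also contains benign messages, which this summary does not confirm.

```python
from collections import defaultdict


def per_category_metrics(records):
    """Compute precision, recall, and F1 per scam category.

    `records` is an iterable of (category, is_scam, flagged) tuples, where
    `is_scam` is the ground-truth label and `flagged` is the detector verdict.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0, "tn": 0})
    for category, is_scam, flagged in records:
        if is_scam and flagged:
            key = "tp"
        elif is_scam:
            key = "fn"
        elif flagged:
            key = "fp"
        else:
            key = "tn"
        counts[category][key] += 1

    results = {}
    for category, c in counts.items():
        predicted_positive = c["tp"] + c["fp"]
        actual_positive = c["tp"] + c["fn"]
        precision = c["tp"] / predicted_positive if predicted_positive else 0.0
        recall = c["tp"] / actual_positive if actual_positive else 0.0
        denominator = precision + recall
        f1 = 2 * precision * recall / denominator if denominator else 0.0
        results[category] = {"precision": precision, "recall": recall, "f1": f1}
    return results


if __name__ == "__main__":
    # Invented toy records, used only to exercise the function.
    toy = [
        ("reply-based", True, True),
        ("reply-based", True, False),
        ("reply-based", False, False),
        ("click-based", True, True),
        ("click-based", True, True),
        ("click-based", False, True),
    ]
    for category, scores in per_category_metrics(toy).items():
        print(category, scores)
```

Reporting these three numbers per category and per tool, together with tool versions and thresholds, would let a reader judge whether the reply-based gap is practically significant.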

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their thoughtful and constructive review of our manuscript. We address each of the major comments below and describe the changes we will make to strengthen the paper.

Point-by-point responses
  1. Referee: [Data collection and classification] Data collection and classification section: The central claims (50% share, 99.98% vs. 57.29% CAGR, and lowest detector performance for reply-based scams) rest on the accuracy of partitioning the 175,430 reports into reply-based versus click-based categories. The manuscript provides no explicit classification rules, inter-rater reliability statistics, or ground-truth validation against actual message flows or carrier logs; without these, systematic mislabeling or selection effects cannot be excluded and directly weaken the comparative growth-rate and ineffectiveness conclusions.

    Authors: We agree that explicit classification rules should have been included. The reports were classified as reply-based if the scam message requested the recipient to respond via SMS or call a provided phone number, and as click-based if it directed the user to click a link, visit a website, or provide information through a form. We will add a new subsection in the revised manuscript detailing these rules with illustrative examples from the dataset. For inter-rater reliability, the classification was conducted by the lead author following these rules; we will note this as a limitation and, if feasible, have a second author independently classify a random sample of 500 reports to compute Cohen's kappa. Regarding ground-truth validation using carrier logs or actual message flows, this is not possible in our setting due to privacy laws and the absence of access to such proprietary data. We will expand the limitations section to discuss potential selection biases inherent in Reddit-reported data. revision: partial

  2. Referee: [Detector evaluation] Detector evaluation subsection: The assertion that reply-based scams exhibit the lowest detector performance is load-bearing for the recommendation to develop new tools, yet the text does not report per-category quantitative metrics (precision, recall, or F1), the exact set of tools and versions tested, or the decision thresholds applied. This omission prevents assessment of whether the performance gap is statistically or practically significant.

    Authors: We concur that detailed metrics are necessary for a rigorous evaluation. In the revised version, we will add a table presenting precision, recall, and F1 scores broken down by scam category (reply-based and click-based) for each detector. We will also explicitly list the commercial and open-source tools tested along with their versions and describe the decision thresholds or classification criteria used by each tool. This will enable a clear evaluation of the performance differences. revision: yes
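The inter-rater check proposed in the first simulated response (a second author relabeling a random sample and computing Cohen's kappa) follows a standard formula: kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected by chance. The sketch below implements that formula directly. The function name and the four example labels are hypothetical and do not come from the paper's data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)

    # Observed agreement: share of items on which both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each rater's label frequencies, summed
    # over all labels either rater used.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(
        (freq_a[label] / n) * (freq_b[label] / n)
        for label in freq_a.keys() | freq_b.keys()
    )

    if expected == 1.0:
        return 1.0  # both raters used one identical label for every item
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical labels from two raters on four reports.
    rater_1 = ["reply", "reply", "click", "click"]
    rater_2 = ["reply", "click", "click", "click"]
    print(f"kappa: {cohens_kappa(rater_1, rater_2):.2f}")
```

Because kappa discounts agreement that would occur by chance, it is a stricter check on the reply-based versus click-based split than raw percent agreement.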

standing simulated objections not resolved
  • Ground-truth validation against actual message flows or carrier logs cannot be provided, as we do not have access to such data and it would violate user privacy regulations.

Circularity Check

0 steps flagged

No significant circularity in direct empirical measurement of Reddit scam reports

full rationale

The paper collects a dataset of 175,430 user-reported mobile messaging scams from Reddit over a fixed time window and performs direct statistical computations on observable attributes such as temporal distribution for CAGR, text/phone-number similarities within categories, and detector accuracy on the collected messages. No modeling equations, fitted parameters presented as predictions, self-referential definitions, or load-bearing self-citations appear in the derivation chain. All central claims (50% reply-based share, 99.98% vs 57.29% CAGR, lowest detector performance) are straightforward summaries of the raw collected data without reduction to inputs by construction. The analysis remains self-contained against external benchmarks because it relies on measurable message attributes rather than any theoretical loop.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The analysis is an empirical measurement study whose conclusions depend on the assumption that Reddit user reports are a reliable proxy for scam prevalence and characteristics; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption Reddit user reports accurately reflect real-world mobile messaging scam activities and can be reliably partitioned into reply-based and click-based categories
    The growth-rate calculations and detector-performance claims rest entirely on this data source and classification step.

pith-pipeline@v0.9.0 · 5760 in / 1319 out tokens · 70051 ms · 2026-05-20T15:58:57.421345+00:00 · methodology

