CASE: An Agentic AI Framework for Enhancing Scam Intelligence in Digital Payments
Pith reviewed 2026-05-18 20:31 UTC · model grok-4.3
The pith
A conversational AI agent gathers detailed scam intelligence from users to increase enforcement actions by 21% on digital payment platforms.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CASE is an Agentic AI framework that deploys a conversational agent to interview potential victims and elicit detailed scam intelligence in the form of conversation transcripts. These transcripts are processed by another AI system to extract structured data, which augments existing features and results in a 21% uplift in the volume of scam enforcements on Google Pay India.
What carries the argument
The CASE framework, consisting of a conversational agent that proactively interviews users to collect unstructured scam details and a downstream AI extractor that converts them into structured data for enforcement.
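The two-stage flow described above (interview transcript in, structured record out) can be sketched as follows. This is a minimal illustration only: the paper's extractor is an LLM-based system, whereas this sketch swaps in simple regular-expression rules as a stand-in. The `ScamRecord` fields, the `KNOWN_CHANNELS` list, and the function name `extract_record` are all hypothetical and do not come from the paper.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# One transcript turn: (speaker, text). Speaker is "agent" or "user".
Turn = Tuple[str, str]

# Hypothetical channel keywords; the real system's taxonomy is not public.
KNOWN_CHANNELS = ("whatsapp", "telegram", "sms", "email")


@dataclass
class ScamRecord:
    """Hypothetical structured output; the paper does not publish its schema."""
    channel: Optional[str] = None
    payment_handles: List[str] = field(default_factory=list)
    phone_numbers: List[str] = field(default_factory=list)


def extract_record(transcript: List[Turn]) -> ScamRecord:
    """Rule-based stand-in for the LLM extractor: reads only the user's turns."""
    user_text = " ".join(text for speaker, text in transcript if speaker == "user")
    lowered = user_text.lower()
    # First known channel keyword mentioned by the user, if any.
    channel = next((c for c in KNOWN_CHANNELS if c in lowered), None)
    return ScamRecord(
        channel=channel,
        # Anything shaped like name@provider (this also matches email prefixes).
        payment_handles=re.findall(r"[\w.\-]+@[a-zA-Z]+", user_text),
        # Bare 10-digit numbers, assumed here to be phone numbers.
        phone_numbers=re.findall(r"\b\d{10}\b", user_text),
    )


# Invented example conversation, not taken from the paper.
demo_transcript = [
    ("agent", "How did the person first contact you?"),
    ("user", "They messaged me on WhatsApp from 9876543210 and asked me to pay fastloan@okbank."),
]
record = extract_record(demo_transcript)
```

Keeping the extractor behind a single function that returns a typed record means the rule-based stand-in could be replaced by a model call without changing whatever consumes the structured data downstream.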
If this is right
- Scam prevention can incorporate intelligence gathered directly from user conversations about external scam methodologies.
- Automated enforcement mechanisms gain from structured data extracted from natural language interviews.
- The framework provides a scalable way to manage user scam feedback without manual intervention.
- Similar systems could be built for other sensitive domains to collect intelligence safely.
Where Pith is reading between the lines
- Integrating this with real-time transaction monitoring could allow interrupting scams mid-process.
- Over time, the collected data might reveal evolving scam patterns for predictive modeling.
- User trust in the AI interviewer will be crucial for the quality of elicited information.
Load-bearing premise
That the conversational agent can safely elicit accurate and useful scam intelligence from potential victims without causing distress or receiving unreliable responses that fail to improve enforcement.
What would settle it
Running the CASE system on a payment platform and observing no increase in scam enforcement volume or finding that the collected intelligence is mostly inaccurate or unusable.
Original abstract
The proliferation of digital payment platforms has transformed commerce, offering unmatched convenience and accessibility globally. However, this growth has also attracted malicious actors, leading to a corresponding increase in sophisticated social engineering scams. These scams are often initiated and orchestrated on multiple surfaces outside the payment platform, making user and transaction-based signals insufficient for a complete understanding of the scam's methodology and underlying patterns, without which it is very difficult to prevent it in a timely manner. This paper presents CASE (Conversational Agent for Scam Elucidation), a novel Agentic AI framework that addresses this problem by collecting and managing user scam feedback in a safe and scalable manner. A conversational agent is uniquely designed to proactively interview potential victims to elicit intelligence in the form of a detailed conversation. The conversation transcripts are then consumed by another AI system that extracts information and converts it into structured data for downstream usage in automated and manual enforcement mechanisms. Using Google's Gemini family of LLMs, we implemented this framework on Google Pay (GPay) India. By augmenting our existing features with this new intelligence, we have observed a 21% uplift in the volume of scam enforcements. The architecture and its robust evaluation framework are highly generalizable, offering a blueprint for building similar AI-driven systems to collect and manage scam intelligence in other sensitive domains.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents CASE (Conversational Agent for Scam Elucidation), an agentic AI framework that deploys a conversational agent to proactively interview potential scam victims on digital payment platforms such as GPay India. The resulting transcripts are processed by a second AI system to extract structured scam intelligence, which is then integrated into existing enforcement mechanisms; the authors report that this augmentation produced a 21% uplift in the volume of scam enforcements.
Significance. A rigorously validated version of this framework could meaningfully advance scam prevention by capturing intelligence from surfaces outside the payment platform itself. The architecture is presented as generalizable to other sensitive domains and leverages production LLMs, which are strengths if the empirical attribution holds. At present, however, the lack of evaluation details prevents a strong assessment of significance.
major comments (1)
- Abstract: the headline claim of a '21% uplift in the volume of scam enforcements' is presented without any description of the measurement methodology, pre/post time windows, baseline definition, control cohorts, or statistical tests. This is load-bearing for the central contribution and leaves the attribution to CASE unsupported.
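To make the referee's request concrete, the sketch below shows one way an uplift figure could be reported alongside a baseline definition, a control cohort, and a statistical test. It is an assumption-laden illustration, not the paper's method: all counts are invented, the cohort design is hypothetical, and the paper reports an uplift in enforcement volume rather than the per-case rate used here. The test is a standard two-sided two-proportion z-test.

```python
import math


def relative_uplift(control_rate: float, treated_rate: float) -> float:
    """Relative change of the treated rate over the control (baseline) rate."""
    return (treated_rate - control_rate) / control_rate


def two_proportion_z_test(control_hits: int, control_n: int,
                          treated_hits: int, treated_n: int):
    """Two-sided z-test for a difference between two proportions."""
    p_control = control_hits / control_n
    p_treated = treated_hits / treated_n
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (control_hits + treated_hits) / (control_n + treated_n)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / treated_n))
    z = (p_treated - p_control) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented counts for illustration only; none of these come from the paper.
control_hits, control_n = 1000, 100_000   # enforcements / cases, existing features only
treated_hits, treated_n = 1210, 100_000   # enforcements / cases, with added intelligence

uplift = relative_uplift(control_hits / control_n, treated_hits / treated_n)
z, p_value = two_proportion_z_test(control_hits, control_n, treated_hits, treated_n)
print(f"uplift={uplift:.1%}, z={z:.2f}, p={p_value:.2g}")
```

A concurrent control cohort, as assumed here, separates the effect of the new intelligence from seasonal or platform-wide changes in scam volume, which a simple pre/post comparison cannot do.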
Simulated Author's Rebuttal
We thank the referee for their constructive feedback. We address the single major comment below and will incorporate revisions to improve the clarity of our central empirical claim.
- Referee: Abstract: the headline claim of a '21% uplift in the volume of scam enforcements' is presented without any description of the measurement methodology, pre/post time windows, baseline definition, control cohorts, or statistical tests. This is load-bearing for the central contribution and leaves the attribution to CASE unsupported.
Authors: We agree that the abstract, in its current form, does not supply sufficient methodological context for the reported 21% uplift and that this information is necessary to allow readers to assess the claim. In the revised manuscript we will expand the abstract with a concise description of the evaluation approach, including the pre- and post-deployment observation windows, the definition of the baseline enforcement volume, and the nature of the comparison performed. The detailed evaluation framework, including any limitations arising from the production deployment setting, is already elaborated in the body of the paper; the abstract revision will simply surface the key elements of that framework at the front of the document.
Revision: yes
Circularity Check
No circularity: empirical observation from deployed system with no derivations or self-referential fits
The paper introduces the CASE framework as a conversational agent for eliciting scam intelligence and reports a direct observational result: augmenting existing features produced a 21% uplift in enforcement volume on GPay India. No equations, parameter fittings, predictions, or uniqueness theorems appear in the provided text. The central claim is framed as an outcome measured after deployment rather than a quantity derived from or defined in terms of itself. Self-citations, if present, are not load-bearing for any derivation chain, and the architecture is described as generalizable without smuggling ansatzes or renaming known results. This is a standard self-contained systems paper whose result stands or falls on external evidence quality, not internal circular reduction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Current large language models such as Google's Gemini can reliably conduct interviews and extract structured information from conversations about scams.
Forward citations
Cited by 1 Pith paper
- ORACLE: Anticipating Scams from Partial Trajectories in Streaming App Usage
  ORACLE is a new agentic framework using adaptive context consolidation and teacher-student distillation to detect emerging scam patterns from incomplete, long-horizon app usage streams across 12 scam types.