arxiv: 2508.19932 · v2 · submitted 2025-08-27 · 💻 cs.AI

CASE: An Agentic AI Framework for Enhancing Scam Intelligence in Digital Payments

Pith reviewed 2026-05-18 20:31 UTC · model grok-4.3

classification 💻 cs.AI
keywords conversational AI · scam detection · digital payments · agentic AI · fraud intelligence · LLM framework · enforcement uplift

The pith

A conversational AI agent gathers detailed scam intelligence from users to increase enforcement actions by 21% on digital payment platforms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CASE, a new agentic AI framework designed to collect scam feedback through proactive conversations with potential victims. Traditional methods using only user reports or transaction data fall short because many scams originate outside the payment platform. The system uses one AI to conduct interviews and another to structure the information for enforcement teams. When implemented on Google Pay in India with Gemini models, this added intelligence led to 21% more scam enforcements. The architecture is presented as generalizable to other domains needing sensitive information collection.

Core claim

CASE is an Agentic AI framework that deploys a conversational agent to interview potential victims and elicit detailed scam intelligence in the form of conversation transcripts. These transcripts are processed by another AI system to extract structured data, which augments existing features and results in a 21% uplift in the volume of scam enforcements on Google Pay India.

What carries the argument

The CASE framework has two parts: a conversational agent that proactively interviews users to collect unstructured scam details, and a downstream AI extractor that converts those details into structured data for enforcement.
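The hand-off between the two stages can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `ScamReport` fields, the `CHANNELS` keyword list, and the rule-based `extract_report` function are all hypothetical stand-ins for the LLM extractor, and the transcript is invented for the example.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ScamReport:
    """Hypothetical structured record; field names are illustrative only."""
    contact_channel: Optional[str] = None
    phone_numbers: List[str] = field(default_factory=list)
    mentions_payment_request: bool = False


# Hypothetical keyword list; a production extractor would be an LLM, not rules.
CHANNELS = ("whatsapp", "telegram", "sms", "email", "phone call")
PAYMENT_WORDS = ("pay", "transfer", "send money")


def extract_report(transcript: List[Tuple[str, str]]) -> ScamReport:
    """Stand-in for the extractor stage: keyword rules over user turns only."""
    user_text = " ".join(
        text for speaker, text in transcript if speaker == "user"
    ).lower()

    report = ScamReport()
    # Record the first known channel the user mentions.
    for channel in CHANNELS:
        if channel in user_text:
            report.contact_channel = channel
            break
    # Collect standalone 10-digit numbers as candidate phone numbers.
    report.phone_numbers = re.findall(r"\b\d{10}\b", user_text)
    report.mentions_payment_request = any(w in user_text for w in PAYMENT_WORDS)
    return report


# Invented transcript in the (speaker, text) shape assumed above.
transcript = [
    ("agent", "How did the person first contact you?"),
    ("user", "They messaged me on WhatsApp from 9876543210 and asked me to transfer a fee."),
]
report = extract_report(transcript)
print(report)
```

The point of the sketch is the interface: the interviewer produces free text, and the extractor's only job is to emit a fixed schema that existing enforcement features can consume.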

If this is right

  • Scam prevention can incorporate intelligence gathered directly from user conversations about external scam methodologies.
  • Automated enforcement mechanisms gain from structured data extracted from natural language interviews.
  • The framework provides a scalable way to manage user scam feedback without manual intervention.
  • Similar systems could be built for other sensitive domains to collect intelligence safely.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Integrating this with real-time transaction monitoring could allow interrupting scams mid-process.
  • Over time, the collected data might reveal evolving scam patterns for predictive modeling.
  • User trust in the AI interviewer will be crucial for the quality of elicited information.

Load-bearing premise

That the conversational agent can safely elicit accurate and useful scam intelligence from potential victims without causing distress or receiving unreliable responses that fail to improve enforcement.

What would settle it

Running the CASE system on a payment platform and observing no increase in scam enforcement volume or finding that the collected intelligence is mostly inaccurate or unusable.

Figures

Figures reproduced from arXiv: 2508.19932 by Aviral Suri, Bill Cheung, Diksha Bansal, Jose Estevez, Lorenzo Gatto, Nitish Jaipuria, Ramanan Balakrishnan, Shankey Poddar, Zijun Kan.

Figure 1. The process is divided into two primary phases: a …
Figure 1. Overall System Flow
Figure 2. Conversation Agent Flow
Figure 3. Conversation Transcript to Information Extractor Structured Output Examples
Figure 4. Evaluation Process Flow Comparison Pre and Post Production
Figure 5. Distribution of user sessions based on the number of LLM queries
Original abstract

The proliferation of digital payment platforms has transformed commerce, offering unmatched convenience and accessibility globally. However, this growth has also attracted malicious actors, leading to a corresponding increase in sophisticated social engineering scams. These scams are often initiated and orchestrated on multiple surfaces outside the payment platform, making user and transaction-based signals insufficient for a complete understanding of the scam's methodology and underlying patterns, without which it is very difficult to prevent it in a timely manner. This paper presents CASE (Conversational Agent for Scam Elucidation), a novel Agentic AI framework that addresses this problem by collecting and managing user scam feedback in a safe and scalable manner. A conversational agent is uniquely designed to proactively interview potential victims to elicit intelligence in the form of a detailed conversation. The conversation transcripts are then consumed by another AI system that extracts information and converts it into structured data for downstream usage in automated and manual enforcement mechanisms. Using Google's Gemini family of LLMs, we implemented this framework on Google Pay (GPay) India. By augmenting our existing features with this new intelligence, we have observed a 21% uplift in the volume of scam enforcements. The architecture and its robust evaluation framework are highly generalizable, offering a blueprint for building similar AI-driven systems to collect and manage scam intelligence in other sensitive domains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper presents CASE (Conversational Agent for Scam Elucidation), an agentic AI framework that deploys a conversational agent to proactively interview potential scam victims on digital payment platforms such as GPay India. The resulting transcripts are processed by a second AI system to extract structured scam intelligence, which is then integrated into existing enforcement mechanisms; the authors report that this augmentation produced a 21% uplift in the volume of scam enforcements.

Significance. A rigorously validated version of this framework could meaningfully advance scam prevention by capturing intelligence from surfaces outside the payment platform itself. The architecture is presented as generalizable to other sensitive domains and leverages production LLMs, which are strengths if the empirical attribution holds. At present, however, the lack of evaluation details prevents a strong assessment of significance.

major comments (1)
  1. Abstract: the headline claim of a '21% uplift in the volume of scam enforcements' is presented without any description of the measurement methodology, pre/post time windows, baseline definition, control cohorts, or statistical tests. This is load-bearing for the central contribution and leaves the attribution to CASE unsupported.
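To make the referee's request concrete, the sketch below shows one way a relative uplift and its statistical support could be reported. It assumes a design with a control cohort and a treated cohort of known size, which the paper does not describe; every count is invented for illustration and none comes from the paper. The function names `relative_uplift` and `two_proportion_z` are hypothetical.

```python
import math


def relative_uplift(baseline_rate: float, treated_rate: float) -> float:
    """Relative change of the treated rate over the baseline rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (treated_rate - baseline_rate) / baseline_rate


def two_proportion_z(x1: int, n1: int, x2: int, n2: int):
    """Two-sided z-test for equal proportions, using the pooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p2 - p1) / se
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented counts: enforcements out of flagged accounts in each cohort.
control_enforced, control_total = 1000, 100_000
treated_enforced, treated_total = 1210, 100_000

uplift = relative_uplift(
    control_enforced / control_total,
    treated_enforced / treated_total,
)
z, p_value = two_proportion_z(
    control_enforced, control_total, treated_enforced, treated_total
)
print(f"relative uplift: {uplift:.1%}, z = {z:.2f}, p = {p_value:.2g}")
```

A pre/post comparison without a concurrent control would need a different analysis, since seasonal or policy changes could move enforcement volume on their own; that is precisely the attribution gap the comment raises.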

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback. We address the single major comment below and will incorporate revisions to improve the clarity of our central empirical claim.

Point-by-point responses
  1. Referee: Abstract: the headline claim of a '21% uplift in the volume of scam enforcements' is presented without any description of the measurement methodology, pre/post time windows, baseline definition, control cohorts, or statistical tests. This is load-bearing for the central contribution and leaves the attribution to CASE unsupported.

    Authors: We agree that the abstract, in its current form, does not supply sufficient methodological context for the reported 21% uplift and that this information is necessary to allow readers to assess the claim. In the revised manuscript we will expand the abstract with a concise description of the evaluation approach, including the pre- and post-deployment observation windows, the definition of the baseline enforcement volume, and the nature of the comparison performed. The detailed evaluation framework, including any limitations arising from the production deployment setting, is already elaborated in the body of the paper; the abstract revision will simply surface the key elements of that framework at the front of the document.

    revision: yes

Circularity Check

0 steps flagged

No circularity: empirical observation from deployed system with no derivations or self-referential fits

The paper introduces the CASE framework as a conversational agent for eliciting scam intelligence and reports a direct observational result: augmenting existing features produced a 21% uplift in enforcement volume on GPay India. No equations, parameter fittings, predictions, or uniqueness theorems appear in the provided text. The central claim is framed as an outcome measured after deployment rather than a quantity derived from or defined in terms of itself. Self-citations, if present, are not load-bearing for any derivation chain, and the architecture is described as generalizable without smuggling ansatzes or renaming known results. This is a standard self-contained systems paper whose result stands or falls on external evidence quality, not internal circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper relies on standard assumptions about AI capabilities rather than introducing new free parameters or entities.

axioms (1)
  • domain assumption Current large language models such as Google's Gemini can reliably conduct interviews and extract structured information from conversations about scams.
    The framework depends on the performance of LLMs for the core tasks of interviewing and data extraction.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. ORACLE: Anticipating Scams from Partial Trajectories in Streaming App Usage

    cs.LG 2026-05 unverdicted novelty 6.0

    ORACLE is a new agentic framework using adaptive context consolidation and teacher-student distillation to detect emerging scam patterns from incomplete, long-horizon app usage streams across 12 scam types.
