Telephony Voice Agent for Banking Services

Harshadkumar B. Prajapati; Nitya Dhagat; Vipul K. Dabhi; Zankhana J. Barad

arxiv: 2606.28779 · v1 · pith:ROCCFSP4new · submitted 2026-06-27 · 💻 cs.HC


Pith reviewed 2026-06-30 08:55 UTC · model grok-4.3

classification 💻 cs.HC

keywords: voice agent · banking services · Dialogflow CX · telephony · conversational AI · PIN authentication · live agent handoff · cloud storage


The pith

A Dialogflow CX voice agent supports banking tasks over phone and hands off to humans for complex queries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes a telephony voice system for banking that lets users check balances, view transaction history, activate cards, and use PINs to authenticate sensitive actions. Built with Google Conversational Agent and Dialogflow CX, it routes out-of-scope questions to live agents while keeping data in the cloud. Tests under long calls, high concurrency, and noisy conditions showed the setup stayed responsive and resilient. This matters for users who need phone access without apps or websites.

Core claim

The system supports essential banking functions: balance inquiries, transaction history retrieval, card activation, PIN-based authentication for sensitive tasks, and handoff to live agents for complex or out-of-scope queries. Tests were performed with long-duration calls, high concurrency, and noisy environments; the authors report that the system proved scalable, responsive, and resilient. All data is stored in the cloud for efficiency and security in real-time voice interactions.

What carries the argument

Dialogflow CX conversational agent that manages voice flows and routes complex queries to live agents.
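
To make the PIN gate and handoff routing concrete, here is a minimal sketch of a webhook handler of the kind a Dialogflow CX agent can call. It is not the authors' implementation. The request and response field names (`fulfillmentInfo.tag`, `sessionInfo.parameters`, `fulfillmentResponse.messages`) follow the CX webhook JSON format as commonly documented; the tag names, the parameter names `account_id` and `pin`, the `handoff` flag, the split between sensitive and open tasks, and the in-memory PIN table are all assumptions made for illustration.

```python
import hashlib
import hmac

# Hypothetical demo data: account id -> SHA-256 hex digest of the PIN.
# A production system would use a salted, slow hash behind a secured service.
PIN_DIGESTS = {"ACC-1001": hashlib.sha256(b"4321").hexdigest()}

# Assumed webhook tags and sensitivity split; the paper does not list them.
SENSITIVE_TAGS = {"activate_card", "transaction_history"}
OPEN_TAGS = {"balance_inquiry"}


def pin_is_valid(account_id: str, pin: str) -> bool:
    """Compare the supplied PIN with the stored digest in constant time."""
    expected = PIN_DIGESTS.get(account_id)
    if expected is None:
        return False
    supplied = hashlib.sha256(pin.encode("utf-8")).hexdigest()
    return hmac.compare_digest(expected, supplied)


def reply(text: str, parameters: dict) -> dict:
    """Build a response in the Dialogflow CX webhook JSON shape."""
    return {
        "fulfillmentResponse": {"messages": [{"text": {"text": [text]}}]},
        "sessionInfo": {"parameters": parameters},
    }


def handle_webhook(request: dict) -> dict:
    tag = request.get("fulfillmentInfo", {}).get("tag", "")
    params = request.get("sessionInfo", {}).get("parameters", {})

    # Anything outside the supported tasks is flagged for a human agent.
    if tag not in SENSITIVE_TAGS | OPEN_TAGS:
        return reply("Let me connect you to an agent.", {"handoff": True})

    # Sensitive tasks proceed only after the PIN check passes.
    if tag in SENSITIVE_TAGS:
        account_id = str(params.get("account_id", ""))
        pin = str(params.get("pin", ""))
        if not pin_is_valid(account_id, pin):
            return reply("That PIN did not match.",
                         {"authenticated": False, "handoff": False})
        return reply("Thank you, you are verified.",
                     {"authenticated": True, "handoff": False})

    return reply("One moment while I look that up.", {"handoff": False})
```

In this sketch the agent's flow would branch on the returned `handoff` and `authenticated` session parameters; how the real system transfers the call to a live agent is not specified in the text above.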

If this is right

Routine banking tasks become available to users who lack smartphone apps or reliable internet.
PIN authentication protects sensitive operations before they proceed.
Out-of-scope or complex requests transfer automatically to human agents without call disruption.
Performance holds during extended calls or peak usage in noisy settings.
Cloud storage enables real-time secure handling of voice data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same architecture could extend to phone-based services in insurance or utility companies.
Adding support for more languages or accents might increase the share of queries handled without handoff.
Measuring real customer completion times against traditional phone menus would quantify any efficiency gain.
Pairing the agent with voice biometrics could strengthen security for higher-value transactions.

Load-bearing premise

Tests with high-duration calls, high concurrency, and noisy environments are sufficient to establish real-world scalability and resilience without quantitative metrics or baselines.

What would settle it

A deployment study that records task completion rates, error counts, call drop rates, and user satisfaction scores during actual customer calls would show whether the claimed scalability and resilience hold.
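
As a rough illustration of what such a study would tabulate, the sketch below aggregates per-call records into the rates named above. The `CallRecord` fields and the 1 to 5 satisfaction scale are assumptions chosen for the example; the paper defines no logging schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CallRecord:
    """One customer call, as a hypothetical deployment log might store it."""
    completed: bool              # task finished by the voice agent alone
    dropped: bool                # call ended unexpectedly
    handed_off: bool             # call was transferred to a human agent
    errors: int                  # recognition or fulfilment errors in the call
    satisfaction: Optional[int]  # assumed 1-5 survey score, None if unrated


def summarise_calls(calls: list) -> dict:
    """Aggregate call records into the headline deployment metrics."""
    total = len(calls)
    if total == 0:
        raise ValueError("need at least one call record")
    rated = [c.satisfaction for c in calls if c.satisfaction is not None]
    return {
        "task_completion_rate": sum(c.completed for c in calls) / total,
        "call_drop_rate": sum(c.dropped for c in calls) / total,
        "handoff_rate": sum(c.handed_off for c in calls) / total,
        "errors_per_call": sum(c.errors for c in calls) / total,
        "mean_satisfaction": sum(rated) / len(rated) if rated else None,
    }
```

Reporting these figures alongside the same metrics for an existing phone menu would give the baseline comparison the claim currently lacks.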

Figures

Figures reproduced from arXiv: 2606.28779 by Harshadkumar B. Prajapati, Nitya Dhagat, Vipul K. Dabhi, Zankhana J. Barad.

**Figure 2.** Twilio Studio Flow that defines how conversational agents respond to different user inputs and manage interactions. Playbooks allow for automating decision-making processes, creating dynamic conversation flows, and managing various tasks effectively. Task playbooks are the worker playbooks, in layman’s terms, whereas Routine playbooks are the master playbooks. To elaborate: 1) Task playbook: They are opti…

**Figure 3.** Conversational Voice Agent.

IV. SYSTEM IMPLEMENTATION. The system was implemented as a modular and scalable cloud-native solution, integrating Twilio for real-time telephony and Dialogflow CX for voice-based conversational flows. Figures 2 and 3 illustrate the detailed structure of the telephony and conversational agent workflows. A. Call Flow Design with Twilio Studio …

Original abstract

This paper proposes a voice-powered AI-based banking system based on Google Conversational Agent, Dialogflow CX, which provides safe and convenient banking by phone. The system supports essential banking functions such as balance inquiries, transaction history retrieval, card activations, PIN-based authentication of sensitive tasks, smooth live agent handoff for complex and out-of-scope queries, and ensures seamless handover to human agents when required. These tests were performed with high-duration calls, high concurrency, and noisy environments; the system proved to be scalable, responsive, and resilient. All the data used is safely stored in the cloud environment for efficiency and security in real-time voice interactions. A voice-based banking solution that is efficient and easy to use can be provided through this.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This is a plain system description of a Dialogflow CX voice agent for routine banking tasks, with no new methods and no data behind the scalability claims.


The paper walks through building a phone-based banking agent on Google Conversational Agent and Dialogflow CX. It covers balance checks, transaction history, card activation, PIN authentication for sensitive steps, and handoff to a live agent for anything out of scope. That is the whole contribution.

Nothing here is technically new. The tools are off-the-shelf, the architecture is the standard one for Dialogflow CX, and the features are what any competent implementer would add for a banking use case. No new algorithm, no new framework, no fresh insight into voice interfaces or security.

The practical description is clear enough. Someone who needs to wire up similar flows could copy the high-level structure and the handoff logic without much trouble. The cloud storage note is also straightforward.

The soft spot is the evaluation section. The abstract states that tests under long calls, high concurrency, and noisy conditions proved the system scalable, responsive, and resilient. No success rates, latency numbers, error counts, concurrency limits, or baseline comparisons appear. Without those figures the claim cannot be checked, so the main empirical statement stays unsupported.

This is useful to a developer or product team who wants a quick template for a voice banking prototype. It is not useful to researchers who need either a new technique or verifiable results. The work does not rise to the level that would justify sending it out for peer review in a research venue.

Referee Report

1 major / 0 minor

Summary. The manuscript describes a telephony voice agent for banking services implemented with Google Conversational Agent and Dialogflow CX. It supports balance inquiries, transaction history retrieval, card activations, PIN-based authentication for sensitive tasks, and seamless handoff to live agents for complex queries. The authors state that tests conducted under high-duration calls, high concurrency, and noisy environments demonstrated that the system is scalable, responsive, and resilient, with all data stored securely in the cloud.

Significance. If supported by data, the work would illustrate a practical voice interface for routine banking operations with secure authentication and graceful escalation. As presented, however, the contribution is primarily a system description whose central empirical claim lacks any quantitative backing, reducing its significance for an HCI or systems venue.

major comments (1)

[Abstract] The assertion that tests with high-duration calls, high concurrency, and noisy environments 'proved' the system to be scalable, responsive, and resilient is unsupported; no success rates, latency figures, error rates, concurrency limits, failure modes, or baseline comparisons are reported anywhere in the manuscript.
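
The latency and error-rate figures requested here could be collected with a small concurrency harness. The sketch below is illustrative only: `stub_agent_turn` is a stand-in that sleeps for a few milliseconds and fails at a simulated 5% rate, so its numbers say nothing about the real system. A real measurement would replace the stub with calls against the deployed agent.

```python
import asyncio
import math
import random
import time


def percentile(values: list, pct: float) -> float:
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


async def stub_agent_turn(rng: random.Random) -> bool:
    """Stand-in for one agent turn; replace with a real call to the system."""
    await asyncio.sleep(rng.uniform(0.001, 0.005))
    return rng.random() >= 0.05  # simulated failure rate, purely illustrative


async def run_load(concurrent_calls: int, turns_per_call: int, seed: int = 0) -> dict:
    """Run simulated calls concurrently and summarise latency and errors."""
    rng = random.Random(seed)
    latencies = []
    failures = 0

    async def one_call() -> None:
        nonlocal failures
        for _ in range(turns_per_call):
            started = time.perf_counter()
            ok = await stub_agent_turn(rng)
            latencies.append(time.perf_counter() - started)
            if not ok:
                failures += 1

    await asyncio.gather(*(one_call() for _ in range(concurrent_calls)))
    return {
        "turns": len(latencies),
        "error_rate": failures / len(latencies),
        "p50_seconds": percentile(latencies, 50),
        "p95_seconds": percentile(latencies, 95),
    }
```

Running `asyncio.run(run_load(concurrent_calls=50, turns_per_call=10))` at several concurrency levels and reporting the resulting percentiles and error rates would give a reader something to check the scalability claim against.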

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and for highlighting the need for stronger empirical grounding. We agree that the current manuscript overstates the results of the described tests and will revise the abstract (and any related text) to remove unsupported claims.


Referee: [Abstract] The assertion that tests with high-duration calls, high concurrency, and noisy environments 'proved' the system to be scalable, responsive, and resilient is unsupported; no success rates, latency figures, error rates, concurrency limits, failure modes, or baseline comparisons are reported anywhere in the manuscript.

Authors: We acknowledge that the manuscript contains no quantitative metrics supporting the claim that the tests 'proved' scalability, responsiveness, or resilience. The statement in the abstract is therefore unsupported. In the revised manuscript we will delete the clause 'the system proved to be scalable, responsive, and resilient' and replace it with a neutral description of the test conditions (high-duration calls, high concurrency, noisy environments) without asserting that these conditions demonstrated the listed properties. We will also ensure the body text does not repeat the unsupported assertion.

Revision: yes

Circularity Check

0 steps flagged

No circularity: system description with no derivations or equations


The paper is a descriptive account of a telephony voice agent built on Google Dialogflow CX for banking tasks. It lists supported functions and asserts that unspecified tests under high-duration, high-concurrency, and noisy conditions 'proved' scalability, responsiveness, and resilience. No equations, parameters, fitted models, uniqueness theorems, or derivation steps appear anywhere in the provided text. The enumerated circularity patterns (self-definitional claims, fitted inputs renamed as predictions, self-citation load-bearing, ansatz smuggling, renaming of known results) require a mathematical or logical chain that reduces to its own inputs; none exists here. The absence of quantitative metrics is a separate evidentiary weakness, not a circularity issue. The derivation chain is empty, so the paper is self-contained against the circularity criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work contains no mathematical derivations, theoretical models, or novel entities; it is a description of a commercial tool integration.



Reference graph

Works this paper leans on

23 extracted references · 3 canonical work pages · 1 internal anchor

[1] G. F. Technologies, “How conversational AI can drive banking relationships,” https://www.paymentsjournal.com/how-conversational-ai-can-drive-banking-relationships/, 2024.

[2] Aisera, “Conversational AI in banking,” https://aisera.com/blog/conversational-ai-banking/, 2024.

[3] Clerk.Chat, “Conversational AI in financial services,” https://clerk.chat/blog/conversational-ai-in-financial-services/, 2024.

[4] K2View, “Conversational AI in banking,” https://www.k2view.com/blog/conversational-ai-in-banking/, 2025.

[5] Finextra, “Challenges and opportunities for conversational AI in banking today,” https://www.finextra.com/blogposting/27046/challenges-and-opportunities-for-conversational-ai-in-banking-today, 2025.

[6] Consumer Financial Protection Bureau, “Chatbots in consumer finance,” https://www.consumerfinance.gov/data-research/research-reports/chatbots-in-consumer-finance/chatbots-in-consumer-finance/, 2024.

[7] “Debiasing strategies for conversational AI: Improving privacy and security decision-making,” https://www.researchgate.net/publication/373800214_Debiasing_Strategies_for_Conversational_AI_Improving_Privacy_and_Security_Decision-Making, 2023.

[8] N. Zierau, C. Hildebrand, A. Bergner, F. Busquet, A. Schmitt, and J. Marco Leimeister, “Voice bots on the frontline: Voice-based interfaces enhance flow-like consumer experiences & boost service outcomes,” Journal of the Academy of Marketing Science, vol. 51, no. 4, pp. 823–842, 2023.

[9] D. G. Birch and K. Rutter, “Where are the customers’ bots? The AI paradigm shift in retail banking,” Journal of Digital Banking, vol. 8, no. 2, pp. 132–140, 2023.

[10] P. K, P. R. D, S. Samundeswari, and M. J, “Real time system for handling customer queries using Twilio, Assembly AI and NLP,” in 2022 1st International Conference on Computational Science and Technology (ICCST), 2022, pp. 111–115.

[11] M. S. Lai, E. G. Ooi, I. X. Goh, K. L. Teoh, T. T. Nee Pragasam, S. W. Lim, J. S. Ru Teh, L. J. Tang, and S. C. Tan, “Real time avatar base speech to speech conversational AI tutor on AI PC,” in 2025 IEEE 15th Symposium on Computer Applications & Industrial Electronics (ISCAIE), 2025, pp. 108–113.

[12] J. K. Mursi, H. Nach, A. Odera, B. Mwende, D. Dhol, and F. Mwikali, “Finlingo: A conversational AI for enhancing financial literacy education in Africa,” in 2024 IEEE International Conference on Technology Management, Operations and Decisions (ICTMOD), 2024, pp. 1–7.

[13] M. S. Bhatia and S. Khetarpaul, “AI-enhanced bilingual banking assistant,” Scientific Reports, vol. 15, no. 1, p. 37526, 2025.

[14] S. C. Oruganti, “Virtual bank assistance: An AI based voice bot for better banking,” International Journal of Research, vol. 9, no. 1, pp. 177–183, 2020.

[15] Twilio, https://www.twilio.com/en-us

[16] Google Speech-to-Text Docs, https://cloud.google.com/speech-to-text/docs

[17] A. Radford, J. W. Kim, T. Xu, G. Brockman, C. McLeavey, and I. Sutskever, “Robust speech recognition via large-scale weak supervision,” 2022. [Online]. Available: https://arxiv.org/abs/2212.04356

[18] Vosk models, https://alphacephei.com/vosk/models

[19] Google newer models, https://cloud.google.com/blog/products/ai-machine-learning/google-cloud-updates-speech-api-models-for-improved-accuracy

[20] Google Playbook parameters, https://docs.cloud.google.com/dialogflow/cx/docs/concept/playbook/parameter

[21] Google Text-to-Speech Docs, https://cloud.google.com/text-to-speech/docs

[22] N. Zeghidour, A. Luebs, A. Omran, J. Skoglund, and M. Tagliasacchi, “SoundStream: An end-to-end neural audio codec,” 2021. [Online]. Available: https://arxiv.org/abs/2107.03312

[23] Z. Borsos, R. Marinier, D. Vincent, E. Kharitonov, O. Pietquin, M. Sharifi, D. Roblek, O. Teboul, D. Grangier, M. Tagliasacchi, and N. Zeghidour, “AudioLM: A language modeling approach to audio generation,” 2023. [Online]. Available: https://arxiv.org/abs/2209.03143
