arxiv: 2603.05743 · v3 · submitted 2026-03-05 · 💻 cs.CL

Designing Explainable Conversational Agentic Systems for Guaraní Speakers

Samantha Adorno, Akshata Kishore Moharir, Ratna Kandala

Pith reviewed 2026-05-15 15:43 UTC · model grok-4.3

keywords: Guaraní, conversational AI, oral languages, indigenous data sovereignty, multi-agent systems, HCI, diglossia, explainable AI

The pith

AI must treat spoken conversation as a first-class design requirement for languages like Guaraní instead of adapting them to text-centric systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This position paper examines how AI and HCI systems, designed primarily around text, fail to serve primarily oral languages and indigenous communities. Using Guaraní, spoken in Paraguay, as a case study, it argues that language support is insufficient without aligning with lived oral practices. The authors propose an oral-first multi-agent architecture that decouples natural language understanding from agents handling conversation state and community-led governance. This framework aims to respect indigenous data sovereignty and address diglossia by emphasizing turn-taking, repair, and shared context in interactions. If correct, this shift would ensure digital ecosystems empower rather than overlook diverse linguistic practices.

Core claim

The paper claims that an oral-first multi-agent architecture, achieved by decoupling Guaraní natural language understanding from dedicated agents for conversation state and community-led governance, provides a technical framework that respects indigenous data sovereignty and handles diglossia, moving beyond recognition to focus on turn-taking, repair, and shared context as the primary locus of interaction.

What carries the argument

Oral-first multi-agent architecture that separates Guaraní natural language understanding from conversation state management and community governance agents.
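
The paper gives no implementation of the governance agent, so the following is only a minimal sketch of one shape it could take. It assumes a default-deny allow-list that the community maintains separately from the NLU and conversation-state agents. Every name here (`DataUse`, `GovernancePolicy`, `is_permitted`) and the list of data-use categories are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class DataUse(Enum):
    """Hypothetical categories of data use a community might rule on."""
    LIVE_CONVERSATION = "live_conversation"
    LOCAL_STORAGE = "local_storage"
    MODEL_TRAINING = "model_training"
    EXTERNAL_SHARING = "external_sharing"


@dataclass
class GovernancePolicy:
    """Allow-list assumed to be maintained by the community, not by the other agents."""
    permitted: Set[DataUse] = field(
        default_factory=lambda: {DataUse.LIVE_CONVERSATION}
    )

    def is_permitted(self, use: DataUse) -> bool:
        # Default-deny: anything not explicitly allowed is refused.
        return use in self.permitted


policy = GovernancePolicy()
print(policy.is_permitted(DataUse.LIVE_CONVERSATION))  # True
print(policy.is_permitted(DataUse.MODEL_TRAINING))     # False
```

Keeping the policy in its own object is what would let a community change the rules without touching the language-understanding code, which is the point of the separation as the paper describes it.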

Load-bearing premise

Decoupling natural language understanding from the state and governance agents is assumed to respect data sovereignty and manage diglossia; the paper provides no implementation details or validation for this assumption.

What would settle it

The claim would be settled by building the proposed multi-agent architecture and testing it with Guaraní speakers: either the system keeps conversational data under community control while handling spoken turn-taking, or it does not.
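
One automatable part of such a test could be an audit of what crosses the agent boundary. The sketch below assumes that inter-agent messages are logged as dictionaries and that raw speech data would appear under the field names `raw_audio` or `transcript`. Those names, the `BoundaryMessage` type, and the agent labels are illustrative assumptions, not anything the paper specifies. A check like this covers only the data path; whether the community actually controls the system is a question it cannot answer.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class BoundaryMessage:
    """One logged message passed between agents (hypothetical format)."""
    sender: str
    receiver: str
    payload: Dict[str, Any]


# Assumed field names that would indicate raw speech data leaving the NLU agent.
FORBIDDEN_FIELDS = {"raw_audio", "transcript"}


def sovereignty_violations(log: List[BoundaryMessage]) -> List[BoundaryMessage]:
    """Return every message sent by the NLU agent that carries a forbidden field."""
    return [
        msg for msg in log
        if msg.sender == "nlu" and FORBIDDEN_FIELDS.intersection(msg.payload)
    ]


log = [
    BoundaryMessage("nlu", "state", {"intent": "greeting", "confidence": 0.9}),
    BoundaryMessage("nlu", "governance", {"raw_audio": b"\x00\x01"}),
]
print(len(sovereignty_violations(log)))  # 1
```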

Although artificial intelligence (AI) and Human-Computer Interaction (HCI) systems are often presented as universal solutions, their design remains predominantly text-first, underserving primarily oral languages and indigenous communities. This position paper uses Guaraní, an official and widely spoken language of Paraguay, as a case study to argue that language support in AI remains insufficient unless it aligns with lived oral practices. We propose an alternative to the standard "text-to-speech" pipeline, proposing instead an oral-first multi-agent architecture. By decoupling Guaraní natural language understanding from dedicated agents for conversation state and community-led governance, we demonstrate a technical framework that respects indigenous data sovereignty and diglossia. Our work moves beyond mere recognition to focus on turn-taking, repair, and shared context as the primary locus of interaction. We conclude that for AI to be truly culturally grounded, it must shift from adapting oral languages to text-centric systems to treating spoken conversation as a first-class design requirement, ensuring digital ecosystems empower rather than overlook diverse linguistic practices.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This is a conceptual position paper on an oral-first multi-agent setup for Guaraní that flags real gaps in current AI design but stops short of showing how the proposed decoupling would actually work.

The main thing here is a call to treat spoken Guaraní interaction as the starting point rather than bolting speech onto text pipelines. The authors point out that standard systems ignore diglossia and community control, and they sketch a multi-agent split where one piece handles Guaraní understanding, another tracks conversation state, and a third brings in community governance. That framing is new enough for the specific language and setting, and it correctly ties the technical choice to sovereignty concerns that often get lip service elsewhere.

The emphasis on turn-taking and repair as core rather than afterthoughts is a useful reminder for anyone building conversational tools for oral languages. What the paper does well is name the problem clearly and link it to lived practice in Paraguay without overclaiming universality.

The soft spot is that the architecture stays at the level of assertion. The abstract and proposal describe decoupling the agents but give no protocols for routing spoken input, no way to avoid transcription, no sketch of how the governance agent would actually constrain data use, and no example of handling repair or shared context in practice. Without those pieces the central claim, that this setup respects sovereignty and manages diglossia better than text-first systems, rests on hope rather than demonstration.

Readers working on HCI for indigenous languages or on multi-agent conversational systems will find the argument worth discussing. It is not a finished technical contribution, but it surfaces constraints that empirical work in this area should address. I would send it to peer review so the authors can get concrete feedback on what a minimal working version would need to show.

Referee Report

2 major / 1 minor

Summary. The paper claims that AI/HCI systems remain predominantly text-first and thus underserve oral languages such as Guaraní; it proposes an oral-first multi-agent architecture that decouples Guaraní natural-language understanding from separate agents handling conversation state and community-led governance, thereby respecting indigenous data sovereignty, managing diglossia, and treating spoken turn-taking, repair, and shared context as the primary interaction locus rather than text intermediaries.

Significance. If the proposed decoupling can be shown to function without text intermediaries while preserving sovereignty, the work would usefully highlight a design principle for culturally grounded conversational systems and could influence future HCI and NLP research on indigenous languages. The manuscript itself supplies only a high-level conceptual sketch with no implementation, validation, or falsifiable predictions.

major comments (2)

[Abstract] The statement that the architecture 'demonstrates a technical framework that respects indigenous data sovereignty and diglossia' is unsupported; the manuscript asserts decoupling of NLU from conversation-state and governance agents but supplies neither interaction protocols, data-flow specifications, nor any representation of spoken input that avoids transcription.
[Description of the oral-first multi-agent architecture] The central claim that spoken conversation is treated as first-class rests on the unelaborated decoupling; no mechanism is given for routing spoken turns, performing repair, or maintaining shared context across agents while satisfying the sovereignty and diglossia conditions.

minor comments (1)

The LaTeX escape sequence Guaran\'i appears unrendered in the title and abstract of the manuscript; it should be rendered as Guaraní, with the proper diacritic, in the published version.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. As a position paper, our manuscript advances a conceptual argument rather than presenting an implemented system; we address each major comment below and indicate where revisions will clarify scope without overstating the contribution.

Referee: [Abstract] The statement that the architecture 'demonstrates a technical framework that respects indigenous data sovereignty and diglossia' is unsupported; the manuscript asserts decoupling of NLU from conversation-state and governance agents but supplies neither interaction protocols, data-flow specifications, nor any representation of spoken input that avoids transcription.

Authors: We agree that 'demonstrates' is too strong for a position paper that offers only a high-level conceptual sketch. The decoupling is proposed precisely to keep Guaraní NLU under community control and separate from state and governance agents, thereby addressing sovereignty and diglossia by limiting data exposure and avoiding mandatory transcription to Spanish or English. No interaction protocols or spoken-input representations are supplied because the paper focuses on design principles rather than engineering specifications. We will revise the abstract to replace 'demonstrates' with 'proposes' and add a clarifying sentence on the conceptual scope.

revision: partial

Referee: [Description of the oral-first multi-agent architecture] The central claim that spoken conversation is treated as first-class rests on the unelaborated decoupling; no mechanism is given for routing spoken turns, performing repair, or maintaining shared context across agents while satisfying the sovereignty and diglossia conditions.

Authors: The central claim rests on the architectural separation itself: spoken turns are routed first to the Guaraní NLU agent, which processes oral features directly; only abstracted intents or dialogue acts are forwarded to the conversation-state and governance agents. This separation is intended to satisfy sovereignty (raw audio or community-specific data stays within the NLU agent) and diglossia (no forced reduction to text in a dominant language). Repair and shared context would be handled inside the conversation-state agent using oral turn-taking norms. We acknowledge that concrete mechanisms and data-flow diagrams are not provided, consistent with the position-paper format. We will expand the architecture description with a high-level flow outline to make the separation more explicit.

revision: yes
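
The flow described in this simulated response can be illustrated with a short Python sketch. It is not the authors' design: the class names, the confidence-threshold trigger for repair, and the placeholder logic inside `understand` are all assumptions made for illustration, since neither the paper nor the rebuttal specifies a mechanism.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class DialogueAct:
    """Abstracted output of the NLU agent; carries no audio or transcript."""
    intent: str
    confidence: float


class NLUAgent:
    """Stand-in for the Guaraní NLU agent. Raw audio is not returned or stored."""

    def understand(self, audio: bytes) -> DialogueAct:
        # Placeholder logic only: a real agent would run a speech model here.
        if not audio:
            return DialogueAct(intent="unknown", confidence=0.0)
        return DialogueAct(intent="greeting", confidence=0.9)


@dataclass
class ConversationStateAgent:
    """Tracks shared context and decides when to initiate repair."""
    repair_threshold: float = 0.5  # assumed value, not from the paper
    history: List[DialogueAct] = field(default_factory=list)

    def next_move(self, act: DialogueAct) -> str:
        if act.confidence < self.repair_threshold:
            # Low confidence: ask the speaker to repeat instead of guessing.
            return "request_repeat"
        self.history.append(act)
        return f"respond_to:{act.intent}"


nlu = NLUAgent()
state = ConversationStateAgent()
print(state.next_move(nlu.understand(b"\x01\x02")))  # respond_to:greeting
print(state.next_move(nlu.understand(b"")))          # request_repeat
```

The only object that reaches `ConversationStateAgent` is a `DialogueAct`, so in this sketch the state agent has no way to read the audio. A threshold on confidence is one of the simplest possible repair triggers; the oral turn-taking norms the response refers to would need something richer.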

Circularity Check

0 steps flagged

No circularity: conceptual position paper with no derivations or self-referential reductions

The manuscript is a position paper advancing a conceptual proposal for an oral-first multi-agent architecture. It contains no equations, fitted parameters, predictions, or derivation chains. The central claim—that decoupling Guaraní NLU from conversation-state and governance agents respects sovereignty and handles diglossia—is presented as a design recommendation rather than a result derived from prior inputs. No self-citations function as load-bearing premises, and no step reduces to its own definition or fitted data by construction. The argument rests on identified gaps in existing text-centric systems and is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim depends on the domain assumption about current AI limitations and the ad hoc proposal of the new architecture, with no free parameters or independent evidence for the invented framework.

axioms (2)

domain assumption: AI and HCI systems are predominantly text-first and underserve oral languages and indigenous communities.
This is the foundational premise stated in the opening of the abstract.
ad hoc to paper: An oral-first multi-agent architecture can align with lived oral practices and respect data sovereignty.
Proposed as the solution without supporting evidence or prior validation in the abstract.

invented entities (1)

oral-first multi-agent architecture with decoupled NLU, conversation state, and community governance agents · no independent evidence
purpose: To enable conversational AI that prioritizes turn-taking, repair, and shared context for Guaraní speakers
This specific architecture is introduced in the paper as an alternative to standard text-to-speech pipelines.

pith-pipeline@v0.9.0 · 5481 in / 1500 out tokens · 82596 ms · 2026-05-15T15:43:36.616216+00:00 · methodology
