Designing Explainable Conversational Agentic Systems for Guaran\'i Speakers
Pith reviewed 2026-05-15 15:43 UTC · model grok-4.3
The pith
AI must treat spoken conversation as a first-class design requirement for languages like Guaraní instead of adapting them to text-centric systems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that an oral-first multi-agent architecture, achieved by decoupling Guaraní natural language understanding from dedicated agents for conversation state and community-led governance, provides a technical framework that respects indigenous data sovereignty and handles diglossia, moving beyond recognition to focus on turn-taking, repair, and shared context as the primary locus of interaction.
What carries the argument
Oral-first multi-agent architecture that separates Guaraní natural language understanding from conversation state management and community governance agents.
Load-bearing premise
That decoupling natural language understanding from state and governance agents will, in practice, respect data sovereignty and manage diglossia. The paper offers no implementation details or validation to support this.
What would settle it
The claim would be settled by a concrete implementation of the proposed multi-agent architecture, tested with Guaraní speakers: either it maintains community control over conversational data while handling spoken turn-taking, or it fails to do so.
Original abstract
Although artificial intelligence (AI) and Human-Computer Interaction (HCI) systems are often presented as universal solutions, their design remains predominantly text-first, underserving primarily oral languages and indigenous communities. This position paper uses Guaran\'i, an official and widely spoken language of Paraguay, as a case study to argue that language support in AI remains insufficient unless it aligns with lived oral practices. We propose an alternative to the standard "text-to-speech" pipeline, proposing instead an oral-first multi-agent architecture. By decoupling Guaran\'i natural language understanding from dedicated agents for conversation state and community-led governance, we demonstrate a technical framework that respects indigenous data sovereignty and diglossia. Our work moves beyond mere recognition to focus on turn-taking, repair, and shared context as the primary locus of interaction. We conclude that for AI to be truly culturally grounded, it must shift from adapting oral languages to text-centric systems to treating spoken conversation as a first-class design requirement, ensuring digital ecosystems empower rather than overlook diverse linguistic practices.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that AI/HCI systems remain predominantly text-first and thus underserve oral languages such as Guaraní; it proposes an oral-first multi-agent architecture that decouples Guaraní natural-language understanding from separate agents handling conversation state and community-led governance, thereby respecting indigenous data sovereignty, managing diglossia, and treating spoken turn-taking, repair, and shared context as the primary interaction locus rather than text intermediaries.
Significance. If the proposed decoupling can be shown to function without text intermediaries while preserving sovereignty, the work would usefully highlight a design principle for culturally grounded conversational systems and could influence future HCI and NLP research on indigenous languages. The manuscript itself supplies only a high-level conceptual sketch with no implementation, validation, or falsifiable predictions.
major comments (2)
- [Abstract] The statement that the architecture 'demonstrates a technical framework that respects indigenous data sovereignty and diglossia' is unsupported; the manuscript asserts decoupling of NLU from conversation-state and governance agents but supplies neither interaction protocols, data-flow specifications, nor any representation of spoken input that avoids transcription.
- [Description of the oral-first multi-agent architecture] The central claim that spoken conversation is treated as first-class rests on the unelaborated decoupling; no mechanism is given for routing spoken turns, performing repair, or maintaining shared context across agents while satisfying the sovereignty and diglossia conditions.
minor comments (1)
- The LaTeX escape sequence Guaran\'i appears in the title and abstract; it should be rendered with the proper diacritic, Guaraní, in the published version.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. As a position paper, our manuscript advances a conceptual argument rather than presenting an implemented system; we address each major comment below and indicate where revisions will clarify scope without overstating the contribution.
- Referee: [Abstract] The statement that the architecture 'demonstrates a technical framework that respects indigenous data sovereignty and diglossia' is unsupported; the manuscript asserts decoupling of NLU from conversation-state and governance agents but supplies neither interaction protocols, data-flow specifications, nor any representation of spoken input that avoids transcription.
  Authors: We agree that 'demonstrates' is too strong for a position paper that offers only a high-level conceptual sketch. The decoupling is proposed precisely to keep Guaraní NLU under community control and separate from state and governance agents, thereby addressing sovereignty and diglossia by limiting data exposure and avoiding mandatory transcription to Spanish or English. No interaction protocols or spoken-input representations are supplied because the paper focuses on design principles rather than engineering specifications. We will revise the abstract to replace 'demonstrates' with 'proposes' and add a clarifying sentence on the conceptual scope.
  Revision: partial
- Referee: [Description of the oral-first multi-agent architecture] The central claim that spoken conversation is treated as first-class rests on the unelaborated decoupling; no mechanism is given for routing spoken turns, performing repair, or maintaining shared context across agents while satisfying the sovereignty and diglossia conditions.
  Authors: The central claim rests on the architectural separation itself: spoken turns are routed first to the Guaraní NLU agent, which processes oral features directly; only abstracted intents or dialogue acts are forwarded to the conversation-state and governance agents. This separation is intended to satisfy sovereignty (raw audio or community-specific data stays within the NLU agent) and diglossia (no forced reduction to text in a dominant language). Repair and shared context would be handled inside the conversation-state agent using oral turn-taking norms. We acknowledge that concrete mechanisms and data-flow diagrams are not provided, consistent with the position-paper format. We will expand the architecture description with a high-level flow outline to make the separation more explicit.
  Revision: yes
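The flow described in this response can be sketched in a few lines of Python. This is an illustration only, not the authors' design: the paper specifies no protocol, so every class, field, and threshold below (`DialogueAct`, `GuaraniNLUAgent`, `GovernanceAgent`, `ConversationStateAgent`, `repair_threshold`, and the allow-list policy) is a hypothetical name chosen for the sketch, and the recognizer is a stand-in callable rather than a real speech model. It shows one possible reading of the separation: raw audio is visible only to the NLU agent, only an abstracted dialogue act crosses the boundary, and low-confidence turns trigger repair in the state agent.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class DialogueAct:
    """Abstracted turn: the only object that leaves the NLU agent (hypothetical)."""
    speaker: str
    intent: str
    confidence: float  # NLU confidence in the intent, between 0.0 and 1.0


class GuaraniNLUAgent:
    """Receives raw audio and emits only DialogueAct objects."""

    def __init__(self, recognizer: Callable[[bytes], tuple[str, float]]):
        # Stand-in for a real speech-understanding model (assumption).
        self._recognizer = recognizer

    def understand(self, speaker: str, audio: bytes) -> DialogueAct:
        intent, confidence = self._recognizer(audio)
        # The audio buffer is neither stored nor forwarded.
        return DialogueAct(speaker, intent, confidence)


class GovernanceAgent:
    """Applies a community-defined allow list of intents that may be recorded."""

    def __init__(self, recordable_intents: set[str]):
        self._recordable = recordable_intents

    def permits(self, act: DialogueAct) -> bool:
        return act.intent in self._recordable


@dataclass
class ConversationStateAgent:
    """Tracks shared context and decides between repair and response."""
    repair_threshold: float = 0.6  # arbitrary illustrative value
    history: list[DialogueAct] = field(default_factory=list)

    def next_move(self, act: DialogueAct, may_record: bool) -> str:
        if act.confidence < self.repair_threshold:
            # Repair: ask the speaker to repeat instead of guessing.
            return "repair:ask_to_repeat"
        if may_record:
            self.history.append(act)
        return f"respond:{act.intent}"


def handle_turn(nlu, governance, state, speaker: str, audio: bytes) -> str:
    """Route one spoken turn: audio -> NLU -> (governance, state)."""
    act = nlu.understand(speaker, audio)
    return state.next_move(act, governance.permits(act))
```

With a fake recognizer that returns `("greeting", 0.9)` for one input and `("greeting", 0.3)` for another, `handle_turn` responds to the first turn and asks for a repeat on the second. Whether an allow list is an adequate model of community-led governance is exactly the kind of question the referee's comment leaves open.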
Circularity Check
No circularity: conceptual position paper with no derivations or self-referential reductions
The manuscript is a position paper advancing a conceptual proposal for an oral-first multi-agent architecture. It contains no equations, fitted parameters, predictions, or derivation chains. The central claim—that decoupling Guaraní NLU from conversation-state and governance agents respects sovereignty and handles diglossia—is presented as a design recommendation rather than a result derived from prior inputs. No self-citations function as load-bearing premises, and no step reduces to its own definition or fitted data by construction. The argument rests on identified gaps in existing text-centric systems and is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: AI and HCI systems are predominantly text-first and underserve oral languages and indigenous communities
- ad hoc to paper: An oral-first multi-agent architecture can align with lived oral practices and respect data sovereignty
invented entities (1)
- oral-first multi-agent architecture with decoupled NLU, conversation state, and community governance agents (no independent evidence)