pith. machine review for the scientific record.

arxiv: 2603.05743 · v3 · submitted 2026-03-05 · 💻 cs.CL

Recognition: no theorem link

Designing Explainable Conversational Agentic Systems for Guaraní Speakers

Pith reviewed 2026-05-15 15:43 UTC · model grok-4.3

classification 💻 cs.CL
keywords Guaraní · conversational AI · oral languages · indigenous data sovereignty · multi-agent systems · HCI · diglossia · explainable AI

The pith

AI must treat spoken conversation as a first-class design requirement for languages like Guaraní instead of adapting them to text-centric systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This position paper examines how AI and HCI systems, designed primarily around text, fail to serve primarily oral languages and indigenous communities. Using Guaraní, spoken in Paraguay, as a case study, it argues that language support is insufficient without aligning with lived oral practices. The authors propose an oral-first multi-agent architecture that decouples natural language understanding from agents handling conversation state and community-led governance. This framework aims to respect indigenous data sovereignty and address diglossia by emphasizing turn-taking, repair, and shared context in interactions. If correct, this shift would ensure digital ecosystems empower rather than overlook diverse linguistic practices.

Core claim

The paper claims that an oral-first multi-agent architecture, achieved by decoupling Guaraní natural language understanding from dedicated agents for conversation state and community-led governance, provides a technical framework that respects indigenous data sovereignty and handles diglossia, moving beyond recognition to focus on turn-taking, repair, and shared context as the primary locus of interaction.

What carries the argument

Oral-first multi-agent architecture that separates Guaraní natural language understanding from conversation state management and community governance agents.

Load-bearing premise

That decoupling natural language understanding from state and governance agents will effectively respect data sovereignty and manage diglossia. The paper asserts this without implementation details or validation.

What would settle it

A concrete implementation of the proposed multi-agent architecture, tested with Guaraní speakers, would settle the claim: either it maintains community control over conversational data while handling spoken turn-taking, or it does not.

Original abstract

Although artificial intelligence (AI) and Human-Computer Interaction (HCI) systems are often presented as universal solutions, their design remains predominantly text-first, underserving primarily oral languages and indigenous communities. This position paper uses Guaraní, an official and widely spoken language of Paraguay, as a case study to argue that language support in AI remains insufficient unless it aligns with lived oral practices. We propose an alternative to the standard "text-to-speech" pipeline, proposing instead an oral-first multi-agent architecture. By decoupling Guaraní natural language understanding from dedicated agents for conversation state and community-led governance, we demonstrate a technical framework that respects indigenous data sovereignty and diglossia. Our work moves beyond mere recognition to focus on turn-taking, repair, and shared context as the primary locus of interaction. We conclude that for AI to be truly culturally grounded, it must shift from adapting oral languages to text-centric systems to treating spoken conversation as a first-class design requirement, ensuring digital ecosystems empower rather than overlook diverse linguistic practices.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that AI/HCI systems remain predominantly text-first and thus underserve oral languages such as Guaraní; it proposes an oral-first multi-agent architecture that decouples Guaraní natural-language understanding from separate agents handling conversation state and community-led governance, thereby respecting indigenous data sovereignty, managing diglossia, and treating spoken turn-taking, repair, and shared context as the primary interaction locus rather than text intermediaries.

Significance. If the proposed decoupling can be shown to function without text intermediaries while preserving sovereignty, the work would usefully highlight a design principle for culturally grounded conversational systems and could influence future HCI and NLP research on indigenous languages. The manuscript itself supplies only a high-level conceptual sketch with no implementation, validation, or falsifiable predictions.

major comments (2)
  1. [Abstract] The statement that the architecture 'demonstrates a technical framework that respects indigenous data sovereignty and diglossia' is unsupported; the manuscript asserts decoupling of NLU from conversation-state and governance agents but supplies neither interaction protocols, data-flow specifications, nor any representation of spoken input that avoids transcription.
  2. [Description of the oral-first multi-agent architecture] The central claim that spoken conversation is treated as first-class rests on the unelaborated decoupling; no mechanism is given for routing spoken turns, performing repair, or maintaining shared context across agents while satisfying the sovereignty and diglossia conditions.
minor comments (1)
  1. The LaTeX escape sequence Guaran\'i appears in the title and abstract; it should be rendered with the proper diacritic, Guaraní, in the published version.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. As a position paper, our manuscript advances a conceptual argument rather than presenting an implemented system; we address each major comment below and indicate where revisions will clarify scope without overstating the contribution.

Point-by-point responses
  1. Referee: [Abstract] The statement that the architecture 'demonstrates a technical framework that respects indigenous data sovereignty and diglossia' is unsupported; the manuscript asserts decoupling of NLU from conversation-state and governance agents but supplies neither interaction protocols, data-flow specifications, nor any representation of spoken input that avoids transcription.

    Authors: We agree that 'demonstrates' is too strong for a position paper that offers only a high-level conceptual sketch. The decoupling is proposed precisely to keep Guaraní NLU under community control and separate from state and governance agents, thereby addressing sovereignty and diglossia by limiting data exposure and avoiding mandatory transcription to Spanish or English. No interaction protocols or spoken-input representations are supplied because the paper focuses on design principles rather than engineering specifications. We will revise the abstract to replace 'demonstrates' with 'proposes' and add a clarifying sentence on the conceptual scope.

    Revision: partial

  2. Referee: [Description of the oral-first multi-agent architecture] The central claim that spoken conversation is treated as first-class rests on the unelaborated decoupling; no mechanism is given for routing spoken turns, performing repair, or maintaining shared context across agents while satisfying the sovereignty and diglossia conditions.

    Authors: The central claim rests on the architectural separation itself: spoken turns are routed first to the Guaraní NLU agent, which processes oral features directly; only abstracted intents or dialogue acts are forwarded to the conversation-state and governance agents. This separation is intended to satisfy sovereignty (raw audio or community-specific data stays within the NLU agent) and diglossia (no forced reduction to text in a dominant language). Repair and shared context would be handled inside the conversation-state agent using oral turn-taking norms. We acknowledge that concrete mechanisms and data-flow diagrams are not provided, consistent with the position-paper format. We will expand the architecture description with a high-level flow outline to make the separation more explicit.

    Revision: yes
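
As a rough illustration of the flow described in the authors' second response, the Python sketch below routes a spoken turn through an NLU agent and forwards only an abstracted dialogue act to the conversation-state and governance agents. Every name in it (SpokenTurn, DialogueAct, route_turn, the allow-list policy, the 0.5 repair threshold) is a hypothetical stand-in. The paper specifies no protocol, data format, or policy mechanism, and the NLU logic here is a placeholder rather than a speech model.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class SpokenTurn:
    """Raw spoken input. Hypothetical type: the paper defines no input format."""
    speaker: str
    audio: bytes


@dataclass(frozen=True)
class DialogueAct:
    """Abstracted NLU output. It has no audio field, so audio cannot be forwarded."""
    speaker: str
    intent: str
    confidence: float


class GuaraniNLUAgent:
    """Stand-in for the Guaraní NLU agent. Raw audio is only read inside this class."""

    def understand(self, turn: SpokenTurn) -> DialogueAct:
        # Placeholder logic (assumption): a real agent would run a speech model here.
        if not turn.audio:
            return DialogueAct(turn.speaker, "unclear", 0.0)
        return DialogueAct(turn.speaker, "request_info", 0.9)


class GovernanceAgent:
    """Applies a community-defined policy, modelled here as an intent allow-list."""

    def __init__(self, allowed_intents: FrozenSet[str]):
        self.allowed_intents = allowed_intents

    def permits(self, act: DialogueAct) -> bool:
        return act.intent in self.allowed_intents


class ConversationStateAgent:
    """Keeps shared context and decides when a turn needs repair."""

    def __init__(self, repair_threshold: float = 0.5):
        self.repair_threshold = repair_threshold
        self.history: List[DialogueAct] = []

    def needs_repair(self, act: DialogueAct) -> bool:
        return act.confidence < self.repair_threshold

    def record(self, act: DialogueAct) -> None:
        self.history.append(act)


def route_turn(turn: SpokenTurn, nlu: GuaraniNLUAgent,
               governance: GovernanceAgent, state: ConversationStateAgent) -> str:
    """Route one spoken turn; only the DialogueAct crosses the NLU boundary."""
    act = nlu.understand(turn)
    if state.needs_repair(act):
        return "repair"      # ask the speaker to repeat or rephrase
    if not governance.permits(act):
        return "blocked"     # the community policy rejects this act
    state.record(act)
    return "continue"
```

Under these assumptions, route_turn returns "continue", "repair", or "blocked", and the state agent's history holds only DialogueAct records, never audio. Whether a boundary of this kind is sufficient for data sovereignty in practice is the point the referee says the manuscript leaves unshown.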

Circularity Check

0 steps flagged

No circularity: conceptual position paper with no derivations or self-referential reductions

The manuscript is a position paper advancing a conceptual proposal for an oral-first multi-agent architecture. It contains no equations, fitted parameters, predictions, or derivation chains. The central claim—that decoupling Guaraní NLU from conversation-state and governance agents respects sovereignty and handles diglossia—is presented as a design recommendation rather than a result derived from prior inputs. No self-citations function as load-bearing premises, and no step reduces to its own definition or fitted data by construction. The argument rests on identified gaps in existing text-centric systems and is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim depends on the domain assumption about current AI limitations and the ad hoc proposal of the new architecture, with no free parameters or independent evidence for the invented framework.

axioms (2)
  • domain assumption: AI and HCI systems are predominantly text-first and underserve oral languages and indigenous communities
    This is the foundational premise stated in the opening of the abstract.
  • ad hoc to paper: An oral-first multi-agent architecture can align with lived oral practices and respect data sovereignty
    Proposed as the solution without supporting evidence or prior validation in the abstract.
invented entities (1)
  • oral-first multi-agent architecture with decoupled NLU, conversation state, and community governance agents (no independent evidence)
    purpose: To enable conversational AI that prioritizes turn-taking, repair, and shared context for Guaraní speakers
    This specific architecture is introduced in the paper as an alternative to standard text-to-speech pipelines.

pith-pipeline@v0.9.0 · 5481 in / 1500 out tokens · 82596 ms · 2026-05-15T15:43:36.616216+00:00 · methodology

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages

  1. [1]

    Essam Alghamdi, Martin Halvey, and Emma Nicol. 2024. System and User Strategies to Repair Conversational Breakdowns of Spoken Dialogue Systems: A Scoping Review. In Proceedings of the 6th ACM Conference on Conversational User Interfaces (Luxembourg, Luxembourg) (CUI ’24). Association for Computing Machinery, New York, NY, USA, Article 28, 13 pages. doi:10...

  2. [2]

    Rosana Ardila, Megan Branson, Kelly Davis, Michael Kohler, Josh Meyer, Michael Henretty, Reuben Morais, Lindsay Saunders, Francis Tyers, and Gregor Weber. 2020. Common Voice: A Massively-Multilingual Speech Corpus. In Proceedings of the Twelfth Language Resources and Evaluation Conference, Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khal...

  3. [3]

    Jonas Becker. 2024. Multi-Agent Large Language Models for Conversational Task-Solving. arXiv preprint arXiv:2410.22932v1. https://arxiv.org/abs/2410.22932

  4. [4]

    Stephanie Russo Carroll et al. 2020. The CARE Principles for Indigenous Data Governance. Data Science Journal (2020)

  5. [5]

    Luis Chiruzzo, Santiago Góngora, Aldo Alvarez, Gustavo Giménez-Lugo, Marvin Agüero-Torales, and Yliana Rodríguez. 2022. Jojajovai: A Parallel Guarani-Spanish Corpus for MT Benchmarking. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, ...

  6. [6]

    Herbert H. Clark and Susan E. Brennan. 1991. Grounding in Communication. In Perspectives on Socially Shared Cognition. American Psychological Association. https://www.cs.cmu.edu/~illah/CLASSDOCS/Clark91.pdf

  7. [7]

    Bruno Estigarribia. 2015. Jopará and Guaraní in Paraguay (discussion of contact and mixed speech)

  8. [8]

    Bruno Estigarribia. 2020. A Grammar of Paraguayan Guaraní. UCL Press. https://uclpress.co.uk/book/a-grammar-of-paraguayan-guarani/

  9. [9]

    Charles A. Ferguson. 1959. Diglossia. Word 15, 2 (1959), 325–340. doi:10.1080/00437956.1959.11659702

  10. [10]

    Joshua A. Fishman. 1967. Bilingualism with and without Diglossia; Diglossia with and without Bilingualism. Journal of Social Issues 23, 2 (1967), 29–38. doi:10.1111/j.1540-4560.1967.tb00573.x

  11. [11]

    Santiago Góngora, Nicolás Giossa, and Luis Chiruzzo. 2021. Experiments on a Guarani Corpus of News and Social Media. In Proceedings of the First Workshop on Natural Language Processing for Indigenous Languages of the Americas, Manuel Mager, Arturo Oncevay, Annette Rios, Ivan Vladimir Meza Ruiz, Alexis Palmer, Graham Neubig, and Katharina Kann (Eds.). Asso...

  12. [12]

    Ben Hutchinson. 2025. A partnership with The University of Western Australia to improve speech technology for Aboriginal and Torres Strait Islander people’s voices. Google Australia Blog. https://blog.google/intl/en-au/company-news/technology/a-partnership-to-improve-speech-technology-for-first-nations-voices/

  13. [13]

    Instituto Nacional de Estadística (INE), Paraguay. 2024. Día Internacional de la Lengua Materna: Diversidad lingüística en Paraguay. https://www.ine.gov.py/noticias/2298/dia-internacional-de-la-lengua-materna-diversidad-linguistica-en-paraguay

  14. [14]

    Instituto Nacional de Estadística (INE), Paraguay. 2025. 8 de cada 10 personas utiliza internet en Paraguay (EPH 2017–2024)

  15. [15]

    H. Ito. 2012. With Spanish, Guaraní lives: a sociolinguistic analysis of bilingual education in Paraguay. Multilingual Education 2, 1 (2012), 6. doi:10.1186/2191-5059-2-6

  16. [16]

    JournalismAI. 2025. Guarani AI: When building language tech means building community. https://www.journalismai.info/blog/5fcm6ayykhqq7564kbvt9nw92wwmy9

  17. [17]

    Olga Kellert and Nemika Tyagi. 2025. Where and How Do Languages Mix? A Study of Spanish-Guaraní Code-Switching in Paraguay. In Proceedings of the Workshop on Computational Approaches to Linguistic Code-Switching. Association for Computational Linguistics

  18. [18]

    Katherine Mortimer. 2006. Guaraní Académico or Jopará? Educator Perspectives and Ideological Debate in Paraguayan Bilingual Education

  19. [19]

    Organization of American States (OAS). 1992. Paraguay’s Constitution of 1992 with Amendments through 2011. PDF. https://www.oas.org/ext/Portals/33/Files/Member-States/Parag_intro_textfun_eng_1.pdf

  20. [20]

    Sagar Sapkota et al. 2025. Multi-Party Conversational Agents: A Survey. arXiv preprint arXiv:2505.18845v1. https://arxiv.org/abs/2505.18845

  21. [21]

    Secretaría de Políticas Lingüísticas. [n. d.]. Academia de la Lengua Guaraní. https://spl.gov.py/es/academia-de-la-lengua-guarani/

  22. [22]

    Secretaría de Políticas Lingüísticas (Paraguay). 2010. Ley No 4251/2010: Ley de Lenguas (texto bilingüe). PDF. https://spl.gov.py/files/legal/Ley%204251%20-%20bilingue.pdf

  23. [23]

    Jahanzeb Sherwani, Nosheen Ali, Carolyn Penstein Rosé, and Roni Rosenfeld. 2009. Orality-Grounded HCID: Understanding the Oral User. Information Technologies & International Development 5, 4 (2009), 37–49. https://itidjournal.org/index.php/itid/article/download/422/422-1096-2-PB.pdf

  24. [24]

    Gabriel Skantze. 2021. Turn-taking in conversational systems and human-robot interaction: A review. Computer Speech & Language 67 (2021), 101178. doi:10.1016/j.csl.2020.101178

  25. [25]

    Tanya Stivers, N. J. Enfield, Penelope Brown, Christina Englert, Makoto Hayashi, Trine Heinemann, G. Hoymann, Federico Rossano, Jan P. de Ruiter, Kyung-Eun Yoon, and Stephen C. Levinson. 2009. Universals and cultural variation in turn-taking in conversation. Proceedings of the National Academy of Sciences 106, 26 (2009), 10587–10592. doi:10.1073/pnas.0...

  26. [26]

    The University of Western Australia. 2025. First Nations people to benefit from inclusive technology partnership. UWA News. https://www.uwa.edu.au/news/article/2025/february/first-nations-people-to-benefit-from-inclusive-technology-partnership

  27. [27]

    Mark Turin. 2012. Voices of vanishing worlds: Endangered languages, orality, and cognition. Análise Social 205, 47 (2012). https://www.researchgate.net/publication/262778986_Voices_of_vanishing_worlds_Endangered_languages_orality_and_cognition