Retrieval-Grounded Multilingual LLM Assistance for Island Smallholder Farmers

Andrew J. McCracken; Ilias Karachalios; Nikolaos D. Tantaroudas

arxiv: 2606.25647 · v1 · pith:U3K7N5X7new · submitted 2026-06-24 · 💻 cs.CE

Nikolaos D. Tantaroudas, Ilias Karachalios, Andrew J. McCracken

Pith reviewed 2026-06-25 19:51 UTC · model grok-4.3

classification 💻 cs.CE

keywords agricultural AI · multilingual LLM · retrieval augmented generation · smallholder farming · island agriculture · geospatial tools · conversational assistant · managed LLM deployment

The pith

For small, resource-constrained rural deployments, a managed, retrieval-grounded multilingual assistant is more attainable and trustworthy than a self-hosted model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper describes how a conversational AI assistant called Falco eleonorae is designed to deliver reliable agronomic advice to island smallholder farmers whose local knowledge, often expressed in dialect, is poorly represented in general LLMs. The system is a thin proxy that delegates generation to managed upstream models and uses a retrieval tool to pull from a curated bilingual database of local crops, seasonal calendars, traditional practices, and geospatial data. This design supports voice input, field photo description, and low-bandwidth mobile use without requiring the deployment team to host or fine-tune a large model. The core argument is that grounding answers in a read-only local data interface makes the assistant both more practical and more trustworthy for this community than a self-hosted model would be.
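The thin-proxy idea can be illustrated with a minimal sketch. This is not the authors' code: the `Turn` type, the payload keys, and the `fake_upstream` stand-in are all assumptions made for illustration. The point it shows is that the proxy only validates and relays, while generation happens in an injected upstream callable.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical request type; the paper does not publish its schema.
@dataclass(frozen=True)
class Turn:
    text: str
    language: str  # "el" (primary) or "en" (secondary)

# The upstream service is injected, so the proxy itself holds no model.
Upstream = Callable[[dict], dict]

ALLOWED_LANGUAGES = {"el", "en"}

def handle_turn(turn: Turn, upstream: Upstream) -> str:
    """Validate the turn, forward it upstream, and relay the answer."""
    if turn.language not in ALLOWED_LANGUAGES:
        raise ValueError(f"unsupported language: {turn.language}")
    message = turn.text.strip()
    if not message:
        raise ValueError("empty message")
    reply = upstream({"message": message, "language": turn.language})
    return reply["answer"]

def fake_upstream(payload: dict) -> dict:
    """Stand-in for the managed service (assumption, for local testing only)."""
    return {"answer": f"[{payload['language']}] echo: {payload['message']}"}

print(handle_turn(Turn("  Πότε κλαδεύω τις ελιές;  ", "el"), fake_upstream))
```

Injecting the upstream as a callable keeps the proxy testable offline and makes the delegation boundary explicit.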

Core claim

The paper claims that a thin Backend-for-Frontend (BFF) proxy in front of managed GPT-family models, combined with tool-augmented retrieval from a curated, read-only, bilingual data interface, produces a trustworthy multilingual assistant for a defined island area. The interface exposes local crops, seasonal calendars, traditional practices, dialect glossaries, products, cooperatives, and training content, each wrapped in a geospatial Well-Known Text (WKT) envelope. The paper further claims that this managed, grounded approach is more attainable than self-hosting an LLM for small, resource-constrained rural deployments.
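To make the WKT envelope concrete, here is a minimal sketch of a record carrying a bounding-box polygon and a containment check. The record fields and the coordinates are placeholders, not values from the paper, and the parser handles only a simple axis-aligned `POLYGON`; a real deployment would more likely rely on a geospatial library.

```python
import re

def envelope_to_wkt(min_lon: float, min_lat: float,
                    max_lon: float, max_lat: float) -> str:
    """Encode a bounding box as a closed WKT POLYGON ring (lon lat order)."""
    return (f"POLYGON(({min_lon} {min_lat}, {max_lon} {min_lat}, "
            f"{max_lon} {max_lat}, {min_lon} {max_lat}, {min_lon} {min_lat}))")

def wkt_bounds(wkt: str) -> tuple[float, float, float, float]:
    """Return (min_lon, min_lat, max_lon, max_lat) of a simple WKT POLYGON."""
    pairs = re.findall(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)", wkt)
    if not wkt.strip().upper().startswith("POLYGON") or not pairs:
        raise ValueError("expected a WKT POLYGON")
    lons = [float(lon) for lon, _ in pairs]
    lats = [float(lat) for _, lat in pairs]
    return min(lons), min(lats), max(lons), max(lats)

def contains(wkt: str, lon: float, lat: float) -> bool:
    """True if the point lies inside the envelope's bounding box."""
    min_lon, min_lat, max_lon, max_lat = wkt_bounds(wkt)
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

# Placeholder area of interest; the paper does not state coordinates.
AREA = envelope_to_wkt(24.0, 38.0, 25.0, 39.0)

# Hypothetical record shape: bilingual names plus the envelope.
record = {"kind": "crop", "name_el": "ελιά", "name_en": "olive", "envelope": AREA}

print(record["envelope"])
print(contains(record["envelope"], 24.5, 38.5))  # point inside the box
print(contains(record["envelope"], 26.0, 38.5))  # point outside the box
```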

What carries the argument

The Model Context Protocol (MCP) tool that queries the curated read-only bilingual data interface and returns results anchored by geospatial Well-Known Text envelopes.
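As a rough illustration of what such a tool invocation looks like on the wire, the sketch below builds an MCP `tools/call` request and reads the text parts of a response. The JSON-RPC 2.0 framing, the `tools/call` method, and the `content` list of typed parts follow the public MCP specification; the tool name `search_local_knowledge`, its arguments, and the canned response are hypothetical and do not come from the paper.

```python
import json
from typing import Any

def build_tool_call(request_id: int, tool: str,
                    arguments: dict[str, Any]) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request for the MCP `tools/call` method."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }

def text_from_result(response: dict[str, Any]) -> str:
    """Join the text parts of a tool result, raising on either error form."""
    if "error" in response:  # protocol-level JSON-RPC error
        raise RuntimeError(response["error"].get("message", "protocol error"))
    result = response["result"]
    if result.get("isError"):  # tool-level failure reported in the result
        raise RuntimeError("tool reported an error")
    return "\n".join(part["text"] for part in result["content"]
                     if part.get("type") == "text")

# Hypothetical tool name and arguments.
request = build_tool_call(
    1, "search_local_knowledge",
    {"query": "seasonal calendar", "language": "el"},
)
print(json.dumps(request, ensure_ascii=False, indent=2))

# Canned response standing in for the read-only data interface.
canned_response = {
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "placeholder record"}],
               "isError": False},
}
print(text_from_result(canned_response))
```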

If this is right

Queries in Greek (primary) and English (secondary) are answered with access to the dialect glossary.
Uploaded field photographs are described by a vision model so only text reaches the agronomic agent.
Voice input is transcribed by a managed EU streaming speech-to-text service before processing.
The system runs as a progressive web application designed for low-bandwidth field conditions.
Security and data-protection controls are inherited from the managed upstream services rather than implemented locally.
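The modality handling listed above amounts to reducing every input to text before it reaches the agent. A minimal sketch of that step follows, assuming injected `transcribe` and `describe` callables that stand in for the managed speech-to-text and vision services; the `RawInput` type and the `[Photo description]` label are illustrative choices, not details from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical container for whatever the client uploads in one turn.
@dataclass
class RawInput:
    typed_text: Optional[str] = None
    audio: Optional[bytes] = None
    photo: Optional[bytes] = None

def to_agent_text(raw: RawInput,
                  transcribe: Callable[[bytes], str],
                  describe: Callable[[bytes], str]) -> str:
    """Collapse voice, typed text, and photo into one text message."""
    parts: list[str] = []
    if raw.audio is not None:
        parts.append(transcribe(raw.audio))  # speech becomes text
    if raw.typed_text:
        parts.append(raw.typed_text)
    if raw.photo is not None:
        # Only the description is forwarded, never the image bytes.
        parts.append("[Photo description] " + describe(raw.photo))
    if not parts:
        raise ValueError("no usable input")
    return "\n".join(parts)

# Stubs standing in for the managed services (assumptions).
def stub_transcribe(audio: bytes) -> str:
    return "spoken question"

def stub_describe(photo: bytes) -> str:
    return "yellowing leaves"

message = to_agent_text(RawInput(typed_text="typed", photo=b"\x00"),
                        stub_transcribe, stub_describe)
print(message)
```

Keeping the agent text-only means the retrieval and generation path is the same regardless of how the farmer asked the question.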

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same thin-proxy pattern could be reused in other remote agricultural regions by swapping in a new local data interface.
Adding further tool calls for real-time weather or market prices would extend the system without changing the core hosting model.
The design choice favors narrow-domain reliability over the breadth that a fully self-hosted general model would attempt.

Load-bearing premise

The curated bilingual data interface holds complete and authoritative local knowledge that the retrieval tool can surface without omissions or errors for the queries farmers actually pose.

What would settle it

A test query about a common local seasonal practice or crop that returns either an omission or a factual error from the MCP tool would show the grounding is incomplete.
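Such a check could be run as a small coverage probe over a set of reference questions. The sketch below is one possible shape, assuming a `retrieve` callable that returns a list of text records; the queries, expected terms, and stub data are deliberately meaningless placeholders, since the paper reports no evaluation set.

```python
from typing import Callable

Retriever = Callable[[str], list[str]]

def probe(retrieve: Retriever,
          cases: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return, per query, the expected terms missing from the retrieved text."""
    failures: dict[str, set[str]] = {}
    for query, expected in cases.items():
        text = " ".join(retrieve(query)).lower()
        missing = {term for term in expected if term.lower() not in text}
        if missing:
            failures[query] = missing
    return failures

# Placeholder data standing in for the curated interface (not real agronomy).
PLACEHOLDER_DB = {
    "crop-a sowing": ["Crop-A is sown in month-x."],
    "crop-b pruning": [],  # simulated omission
}

def stub_retrieve(query: str) -> list[str]:
    return PLACEHOLDER_DB.get(query, [])

cases = {
    "crop-a sowing": {"month-x"},
    "crop-b pruning": {"month-y"},
}
failures = probe(stub_retrieve, cases)
print(failures)  # the simulated omission is reported for "crop-b pruning"
```

A term-matching probe like this only detects omissions; judging factual errors would still need review by someone with local agronomic knowledge.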

Figures

Figures reproduced from arXiv: 2606.25647 by Andrew J. McCracken, Ilias Karachalios, Nikolaos D. Tantaroudas.

**Figure 2.** Sequence of a single grounded conversational turn. Tool selection and answer generation …

**Figure 3.** The “Falco eleonorae” conversational assistant. The interface is Greek-primary, with …

**Figure 4.** A grounded exchange in the assistant: a Greek-language question about local crops …

Original abstract

Smallholder farming communities in remote, depopulating areas have limited access to agricultural advice, and their locally specific agronomic knowledge, often expressed in regional dialect, is poorly represented in the global corpora on which Large Language Models (LLMs) are trained. A general-purpose chatbot therefore answers fluently but unreliably, ungrounded in authoritative local data farmers can trust. This paper presents a conversational AI assistant, Falco eleonorae, embedded in a bilingual (Greek-primary, English-secondary) e-market platform serving farmers and cooperatives of a defined island area of interest. It is a thin Backend-for-Frontend (BFF) proxy in front of a geospatially-aware agronomic agent rather than a self-hosted model. Answer generation and tool selection are delegated to a managed upstream service on OpenAI GPT-5-family models, while one bounded task, describing an uploaded field photograph, is handled directly by a vision-capable model so only text reaches the agent, and voice input is transcribed by a managed EU streaming speech-to-text service. Grounding comes not from a self-hosted vector database but from tool-augmented retrieval: a Model Context Protocol (MCP) tool queries a curated, read-only, bilingual data interface exposing local crops, a seasonal calendar, traditional practices, a dialect glossary, products, agritourism experiences, cooperatives, and training content, each wrapped in a geospatial Well-Known Text envelope anchoring the agent to the area of interest. We detail its multilingual, voice, and image modalities, its progressive-web-application and accessibility design for low-bandwidth field use, and its security and data-protection posture, and argue that for a small, resource-constrained rural deployment a managed, grounded multilingual assistant is more attainable and trustworthy than a self-hosted model.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

This is a clear but unevaluated description of a practical RAG deployment for agricultural advice on a Greek island.

This paper is a system description of a conversational AI assistant for farmers on one Greek island. It uses a managed LLM backend with retrieval tools pulling from curated local data on crops, seasons, and practices.

The paper does a good job explaining the architecture choices. It delegates most work to OpenAI services while handling image description separately and using EU speech services. The tool-augmented retrieval with geospatial data and the focus on low-bandwidth and accessibility are practical touches. The case for using managed models over self-hosting in resource-constrained settings is laid out with concrete reasons.

Nothing here is technically novel. It applies existing RAG techniques to a new but narrow domain without introducing new methods or results.

The main weakness is the absence of any evaluation. The paper claims the system is attainable and trustworthy but provides no data on performance, user acceptance, or accuracy. The assumption that the curated data covers farmer queries adequately is not tested.

This is for people interested in deploying LLM tools in agriculture or similar rural applications. A reader working on similar systems might find the integration details useful.

I would not bring this to a general reading group. I would not cite it. It should go to peer review for an applied systems journal, as the description is detailed enough to be of interest despite the lack of results.

Referee Report

1 major / 0 minor

Summary. The manuscript presents Falco eleonorae, a conversational AI assistant for island smallholder farmers implemented as a thin Backend-for-Frontend (BFF) proxy. Answer generation is delegated to managed OpenAI GPT-5-family models, with one vision task handled separately and voice input transcribed via an EU streaming service. Grounding is provided via a Model Context Protocol (MCP) tool that queries a curated, read-only, bilingual (Greek-primary) data interface exposing local crops, seasonal calendar, traditional practices, dialect glossary, products, cooperatives and training content, each with geospatial Well-Known Text envelopes. The paper details multilingual/voice/image modalities, progressive-web-application and accessibility design for low-bandwidth use, and security posture, and argues that for resource-constrained rural deployments a managed, grounded multilingual assistant is more attainable and trustworthy than a self-hosted model.

Significance. If the architecture performs as described, the work supplies a concrete, replicable example of combining managed upstream services with domain-curated retrieval for localized agricultural advice in remote, low-resource settings. The explicit attention to low-bandwidth PWA design, accessibility, multilingual dialect support, and data-protection posture constitutes a practical contribution that could inform similar deployments.

major comments (1)

[Abstract] The central claim that 'for a small, resource-constrained rural deployment a managed, grounded multilingual assistant is more attainable and trustworthy than a self-hosted model' is presented without any comparative analysis of resource requirements, latency, cost, hallucination rates, or user-trust metrics. Because the manuscript advances no empirical evaluation or quantitative argument, this assertion remains unsupported and is load-bearing for the paper's stated contribution.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading and for highlighting the unsupported central claim in the abstract. The manuscript is a system-description paper focused on architecture, modalities, and deployment considerations for a low-resource setting; it does not contain empirical comparisons. We will revise the abstract to remove the comparative assertion and present the design rationale as a qualitative argument based on practical constraints.

Referee: [Abstract] The central claim that 'for a small, resource-constrained rural deployment a managed, grounded multilingual assistant is more attainable and trustworthy than a self-hosted model' is presented without any comparative analysis of resource requirements, latency, cost, hallucination rates, or user-trust metrics. Because the manuscript advances no empirical evaluation or quantitative argument, this assertion remains unsupported and is load-bearing for the paper's stated contribution.

Authors: We agree that the claim is unsupported by quantitative evidence. The manuscript provides no resource, latency, cost, hallucination, or trust metrics, nor any head-to-head evaluation against self-hosted models. The contribution lies in the concrete architecture (BFF proxy, MCP tool, PWA design, multilingual and accessibility features) and the security posture for a defined island deployment. We will revise the abstract to state that the managed, grounded approach was chosen for attainability under the stated constraints, without asserting comparative superiority on the listed dimensions. The revised wording will frame the argument as a design rationale rather than an empirical conclusion.

revision: yes

Circularity Check

0 steps flagged

No significant circularity

The manuscript is a purely descriptive system architecture paper presenting a BFF proxy using managed upstream LLM services, MCP tool-augmented retrieval from a curated bilingual data interface, and external speech/vision providers. No mathematical derivations, equations, fitted parameters, predictions, or theorems are advanced. No self-citations appear as load-bearing premises, and the central design argument (managed grounded assistant more attainable than self-hosting for constrained rural use) rests on external service properties and data curation rather than any reduction to the paper's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper is an engineering description of a system architecture with no new mathematical axioms or free parameters; it assumes the effectiveness of managed AI services and the completeness of the local database.

axioms (1)

Domain assumption: The curated local data interface contains accurate, sufficient, and up-to-date information for farmer queries.
Grounding and trustworthiness claims rest on this data being authoritative and comprehensive.

