Querying Structured Data Through Natural Language Using Language Models
abstract
This paper presents an open-source methodology for allowing users to query structured, non-textual datasets through natural language. Unlike Retrieval-Augmented Generation (RAG), which struggles with numerical and highly structured information, our approach trains an LLM to generate executable queries. To support this capability, we introduce a principled pipeline for synthetic training data generation, producing diverse question-answer pairs that capture both user intent and the semantics of the underlying dataset. We fine-tune a compact model, DeepSeek R1 Distill 8B, using QLoRA with 4-bit quantization, making the system suitable for deployment on commodity hardware. We evaluate our approach on a dataset describing accessibility to essential services across Durangaldea, Spain. The fine-tuned model achieves high accuracy across monolingual, multilingual, and unseen-location scenarios, demonstrating both robust generalization and reliable query generation. Our results highlight that small, domain-specific models can achieve high precision for this task without relying on large proprietary LLMs, making this methodology suitable for resource-constrained environments and adaptable to broader multi-dataset systems.
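The abstract's core idea, pairing natural-language questions with executable queries and running the query rather than retrieving text, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name the query language (SQL over an in-memory SQLite table is used as a stand-in), and the table schema, place names, values, and the helper names `build_db`, `synthesize_pairs`, and `run_query` are hypothetical, not taken from the paper.

```python
import sqlite3

# Hypothetical toy rows (place, service, travel minutes); not data from the paper.
ROWS = [
    ("town_a", "pharmacy", 5.0),
    ("town_a", "hospital", 18.0),
    ("town_b", "pharmacy", 12.0),
    ("town_b", "hospital", 25.0),
]

# One question template paired with one query template.
# SQL is an assumed target language; the paper's actual query format is not stated here.
TEMPLATES = [
    (
        "How many minutes does it take to reach the nearest {service} from {place}?",
        "SELECT minutes FROM access WHERE place = '{place}' AND service = '{service}'",
    ),
]


def build_db(rows):
    """Load the toy rows into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE access (place TEXT, service TEXT, minutes REAL)")
    conn.executemany("INSERT INTO access VALUES (?, ?, ?)", rows)
    return conn


def synthesize_pairs(rows, templates):
    """Fill each template with values drawn from the dataset itself,
    yielding (question, query) training pairs."""
    pairs = []
    for place, service, _minutes in rows:
        for question_tmpl, query_tmpl in templates:
            pairs.append(
                {
                    "question": question_tmpl.format(place=place, service=service),
                    # Values come from trusted dataset rows, so plain formatting is
                    # acceptable in this sketch; untrusted input would need escaping.
                    "query": query_tmpl.format(place=place, service=service),
                }
            )
    return pairs


def run_query(conn, query):
    """Execute a generated query and return all result rows."""
    return conn.execute(query).fetchall()


conn = build_db(ROWS)
pairs = synthesize_pairs(ROWS, TEMPLATES)
```

In a setup like this, the fine-tuned model would be trained to map each `question` to its `query`, and at inference time the generated query would be passed to something like `run_query` so that numerical answers come from the dataset rather than from the model's text.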
fields: cs.CE (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
Retrieval-Grounded Multilingual LLM Assistance for Island Smallholder Farmers
Presents a retrieval-grounded multilingual LLM system for island farmers using managed models and local data tools in a PWA for low-bandwidth use.