pith. machine review for the scientific record.

arxiv: 2604.22095 · v1 · submitted 2026-04-23 · 💻 cs.CL

Recognition: unknown

An End-to-End Ukrainian RAG for Local Deployment: Optimized Hybrid Search and Lightweight Generation


Pith reviewed 2026-05-09 21:00 UTC · model grok-4.3

classification 💻 cs.CL
keywords Ukrainian RAG · local deployment · hybrid search · synthetic data · question answering · model compression · resource-constrained hardware

The pith

A two-stage hybrid search and a compressed Ukrainian model enable accurate local RAG on constrained hardware.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds a Retrieval-Augmented Generation system for Ukrainian document question answering that combines optimized retrieval with lightweight generation. It uses a custom pipeline to find relevant pages and a model fine-tuned on synthetic data to produce grounded answers, then compresses the model for local deployment. The system took second place in the UNLP 2026 Shared Task while obeying strict compute and memory limits. A reader would care because the work shows that reliable, verifiable answers from documents do not require cloud-scale resources or large general models; this matters for settings where data must stay local or hardware is limited.
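The pipeline shape described above (retrieve candidate pages, then generate an answer grounded in them) can be sketched in a few lines of Python. Everything below is a hypothetical illustration, not the authors' implementation: the toy lexical scorer stands in for the paper's optimized search, and the stub generator stands in for the fine-tuned Ukrainian model.

```python
import re

def tokens(text: str) -> list[str]:
    """Lowercased word tokens, punctuation stripped."""
    return re.findall(r"\w+", text.lower())

def lexical_score(query: str, page: str) -> int:
    """Toy stage-1 scorer: count query terms present on the page (a stand-in for BM25)."""
    page_terms = set(tokens(page))
    return sum(t in page_terms for t in tokens(query))

def retrieve(query: str, pages: list[str], k: int = 2) -> list[str]:
    """Stage 1: rank all pages by the toy score, keep the top-k as candidate context."""
    return sorted(pages, key=lambda p: lexical_score(query, p), reverse=True)[:k]

def generate(query: str, context: list[str]) -> str:
    """Stub generator: a real system would prompt the fine-tuned, compressed LM here."""
    return f"Q: {query} | context: {' '.join(context)}"

pages = [
    "Kyiv is the capital of Ukraine.",
    "The Carpathians are a mountain range.",
    "Ukraine's capital, Kyiv, sits on the Dnipro river.",
]
answer = generate("capital of Ukraine", retrieve("capital of Ukraine", pages))
```

In the real system the retrieval stage is a two-stage hybrid search and the generator is a compressed Ukrainian model, but the data flow is the same: the query selects pages, and only those pages reach the generator.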

Core claim

Our architecture demonstrates that high-quality, verifiable AI question answering can be achieved locally on resource-constrained hardware without sacrificing accuracy, by pairing a custom two-stage search pipeline that retrieves relevant document pages with a specialized Ukrainian language model fine-tuned on synthetic data and then compressed for lightweight deployment.

What carries the argument

The custom two-stage search pipeline for page retrieval together with a Ukrainian language model fine-tuned on synthetic data and compressed for local use.
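The review names a "two-stage hybrid search" without saying how lexical and dense rankings are combined. One standard choice is reciprocal rank fusion (RRF); the sketch below assumes that choice, which the source does not confirm, and uses made-up document ids.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum over rankings of 1 / (k + rank(d))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["p3", "p1", "p2"]  # e.g. a BM25 ordering over page ids
dense   = ["p1", "p4", "p3"]  # e.g. a sentence-embedding ordering
fused = rrf([lexical, dense])
```

A page ranked well by both signals ("p1" here) rises to the top even if neither ranker alone put it first, which is the usual motivation for fusing lexical and dense retrieval.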

Load-bearing premise

The two-stage hybrid search reliably retrieves relevant pages, and the model fine-tuned on synthetic data produces accurate, grounded answers on real user queries without significant degradation.

What would settle it

Test the deployed system on a fresh collection of real Ukrainian user questions and measure whether answer accuracy and grounding fall below the shared-task level.
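That settling test could be scripted directly: run the system over held-out queries, count an answer as correct when it contains the reference, and as grounded when it appears verbatim in a retrieved page. The harness below is a hypothetical sketch of that protocol with simplistic matching rules, not the shared task's official scorer.

```python
def evaluate(system, dataset):
    """dataset: (query, reference_answer) pairs; system(query) -> (answer, context_pages)."""
    correct = grounded = 0
    for query, reference in dataset:
        answer, context = system(query)
        # Correct if the reference answer string occurs in the generated answer.
        correct += reference.lower() in answer.lower()
        # Grounded if the generated answer is contained in some retrieved page.
        grounded += any(answer.lower() in page.lower() for page in context)
    n = len(dataset)
    return {"accuracy": correct / n, "grounding": grounded / n}

# A trivial stand-in system that quotes its single retrieved page verbatim.
def toy_system(query):
    page = "Kyiv is the capital of Ukraine."
    return page, [page]

report = evaluate(toy_system, [("What is the capital of Ukraine?", "Kyiv")])
```

Running this over a fresh collection of real Ukrainian user questions and comparing the two rates against the shared-task level would be the decisive experiment.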

Figures

Figures reproduced from arXiv: 2604.22095 by Mykola Trokhymovych, Nazarii Nyzhnyk, Yana Oliinyk.

Figure 1. An end-to-end Ukrainian RAG for question …
Figure 2. Prompt template for answer generation with …
read the original abstract

This paper presents a highly efficient Retrieval-Augmented Generation (RAG) system built specifically for Ukrainian document question answering, which achieved 2nd place in the UNLP 2026 Shared Task. Our solution features a custom two-stage search pipeline that retrieves relevant document pages, paired with a specialized Ukrainian language model fine-tuned on synthetic data to generate accurate, grounded answers. Finally, we compress the model for lightweight deployment. Evaluated under strict computational limits, our architecture demonstrates that high-quality, verifiable AI question answering can be achieved locally on resource-constrained hardware without sacrificing accuracy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance; this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper presents an end-to-end RAG system for Ukrainian document question answering that achieved 2nd place in the UNLP 2026 Shared Task. It relies on a custom two-stage hybrid search pipeline to retrieve relevant document pages, a Ukrainian LM fine-tuned on synthetic data for grounded answer generation, and subsequent model compression to enable lightweight local deployment on resource-constrained hardware, claiming that high-quality verifiable QA is possible without accuracy loss.

Significance. If the performance claims are substantiated with rigorous evaluation, the work would provide a practical demonstration of local, privacy-preserving RAG for a low-resource language, with potential value for accessible AI tools and deployment in constrained environments. The combination of hybrid retrieval and synthetic-data fine-tuning could offer reusable insights, but the current manuscript supplies no metrics, baselines, or analysis to support these assertions.

major comments (1)
  1. Abstract: the central performance claim (2nd place in the UNLP 2026 Shared Task plus high accuracy on resource-constrained hardware) is stated without any evaluation details, metrics, baselines, error analysis, dataset descriptions, or results tables. This is load-bearing for the paper's main assertion and prevents verification of whether the two-stage search and fine-tuned model actually deliver the claimed accuracy.
minor comments (1)
  1. The abstract refers to 'optimized hybrid search' and 'lightweight generation' but provides no concrete description of the optimization techniques, compression method, or how the two-stage pipeline is implemented.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comment regarding the abstract below and commit to revisions that strengthen the presentation of our results.

read point-by-point responses
  1. Referee: Abstract: the central performance claim (2nd place in the UNLP 2026 Shared Task plus high accuracy on resource-constrained hardware) is stated without any evaluation details, metrics, baselines, error analysis, dataset descriptions, or results tables. This is load-bearing for the paper's main assertion and prevents verification of whether the two-stage search and fine-tuned model actually deliver the claimed accuracy.

    Authors: We agree that the abstract, as written, does not provide sufficient quantitative details or references to supporting analysis to allow immediate verification of the performance claims. The manuscript's current structure relies on the body text for these elements, but we acknowledge this is insufficient for the abstract's role as a standalone summary. In the revised version, we will expand the abstract to include the specific ranking score from the UNLP 2026 Shared Task, key accuracy metrics on the target hardware, a brief description of the evaluation datasets and baselines, and a high-level reference to the results and analysis sections. We will also ensure the main body contains explicit results tables, baseline comparisons, and error analysis to fully substantiate the claims about the two-stage hybrid search and model compression. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The manuscript is an applied engineering description of a RAG pipeline (two-stage hybrid search, synthetic-data fine-tuning, model compression) that reports empirical results from a shared task and local deployment benchmarks. No equations, derivations, first-principles predictions, or fitted parameters are claimed or present that could reduce to the inputs by construction. Claims rest on external task performance and hardware measurements rather than self-referential definitions or self-citation chains. Self-citations, if present, are not load-bearing for any core result.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract contains no technical derivations, equations, or implementation specifics, so no free parameters, axioms, or invented entities can be identified.

pith-pipeline@v0.9.0 · 5400 in / 1115 out tokens · 30702 ms · 2026-05-09T21:00:34.314124+00:00 · methodology

