pith. machine review for the scientific record.

arxiv: 2604.14896 · v1 · submitted 2026-04-16 · 💻 cs.AI

Recognition: unknown

Toward Agentic RAG for Ukrainian

Authors on Pith no claims yet

Pith reviewed 2026-05-10 11:21 UTC · model grok-4.3

classification 💻 cs.AI
keywords agentic RAG · Ukrainian · retrieval-augmented generation · document understanding · query rephrasing · answer retry · multi-domain tasks

The pith

For Ukrainian agentic RAG, retrieval quality forms the main bottleneck even after adding query rephrasing and retry loops.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines an initial setup for agentic retrieval-augmented generation aimed at Ukrainian multi-domain document-understanding tasks. It layers a simple agent on top of two-stage retrieval: the agent rephrases user queries and runs answer-retry loops using a compact instruction model. The central observation is that these agentic steps modestly raise answer accuracy, yet end-to-end results stay capped by how well the system locates the right documents and pages. This matters for anyone building practical question-answering tools in Ukrainian, because it identifies the step that must improve first. The authors note the practical constraints of running such pipelines offline and point toward pairing better retrieval with richer agent reasoning.

Core claim

Our analysis reveals that retrieval quality is the primary bottleneck: agentic retry mechanisms improve answer accuracy, but the overall score remains constrained by document and page identification. The system combines two-stage retrieval (BGE-M3 with BGE reranking) with a lightweight agentic layer that performs query rephrasing and answer-retry loops on Qwen2.5-3B-Instruct, yet gains from the agentic layer are limited by upstream retrieval failures in the shared-task evaluation.
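The two-stage shape of the retrieval the claim rests on can be sketched as follows. This is a toy illustration, not the paper's implementation: the bag-of-words cosine scorer stands in for BGE-M3 dense retrieval, and the phrase-aware second pass stands in for the BGE cross-encoder reranker.

```python
# Two-stage retrieval sketch: a cheap recall-oriented pass over the whole
# corpus, then a more careful reranker over the top-k survivors.
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def first_stage(query, corpus, k=10):
    """Cheap recall pass over every document (stand-in for BGE-M3)."""
    q = Counter(query.lower().split())
    scored = [(cosine(q, Counter(doc.lower().split())), doc) for doc in corpus]
    return [doc for _, doc in sorted(scored, reverse=True)[:k]]

def rerank(query, candidates):
    """Precision pass over the shortlist (stand-in for the BGE reranker).
    A real cross-encoder scores (query, doc) pairs jointly; here we just
    reward exact phrase containment on top of token overlap."""
    q = set(query.lower().split())
    def score(doc):
        d = doc.lower()
        return (query.lower() in d, len(q & set(d.split())))
    return sorted(candidates, key=score, reverse=True)

top = rerank("capital of ukraine",
             first_stage("capital of ukraine",
                         ["kyiv is the capital of ukraine",
                          "ukraine borders poland",
                          "paris is the capital of france"]))
```

The point of the split is cost: the first stage must be cheap enough to scan the whole corpus, while the reranker can afford per-pair scoring on a short list.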

What carries the argument

The lightweight agentic layer of query rephrasing and answer-retry loops placed on top of two-stage retrieval, which attempts to recover from retrieval errors but cannot fully compensate for them.
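The rephrase-and-retry layer described above can be sketched as a loop around one fixed retrieval-plus-generation call. Everything here is a hedged stand-in: `retrieve_and_answer` replaces the actual BGE pipeline and Qwen2.5-3B-Instruct prompts, and `rephrase` is one toy reformulation strategy.

```python
# Agentic answer-retry loop: rephrase the query and try again while the
# answer confidence stays low. All components are toy stand-ins.

def retrieve_and_answer(query, corpus):
    """Stand-in pipeline: answers only when a page contains the query verbatim."""
    for page in corpus:
        if query.lower() in page.lower():
            return page, 0.9          # (answer, confidence)
    return "", 0.0

def rephrase(query):
    """Toy reformulation: drop the leading token and retry with the rest."""
    words = query.split()
    return " ".join(words[1:]) if len(words) > 1 else query

def answer_with_retries(query, corpus, max_retries=2, threshold=0.5):
    """Retry loop: the agentic step runs only when confidence is below threshold."""
    ans, conf = retrieve_and_answer(query, corpus)
    for _ in range(max_retries):
        if conf >= threshold:
            break
        query = rephrase(query)
        ans, conf = retrieve_and_answer(query, corpus)
    return ans
```

The structure makes the paper's limitation visible: if `retrieve_and_answer` never surfaces the right page, no amount of rephrasing inside the loop can recover the answer.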

If this is right

  • Agentic retry mechanisms can incrementally raise answer accuracy in Ukrainian RAG pipelines.
  • Document and page identification remains the dominant constraint on overall system scores.
  • Offline agentic pipelines encounter practical limitations when applied to this task.
  • Stronger retrieval combined with more advanced agentic reasoning offers a viable next direction.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • For other low-resource languages, retrieval quality may similarly outweigh agentic refinements in RAG performance.
  • General advances in multilingual embedding models could deliver larger gains than changes to the agent architecture alone.
  • Testing the same pipeline on live user queries instead of shared-task data might surface different limiting factors.

Load-bearing premise

The UNLP 2026 Shared Task metrics and offline evaluation setup accurately reflect real-world utility for Ukrainian users and the chosen BGE models plus Qwen2.5-3B are representative baselines.

What would settle it

An experiment that supplies the correct documents and pages to the agentic layer in advance and checks whether answer accuracy then rises substantially beyond the reported scores.
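That oracle-retrieval ablation could be sketched as below. The reader function, the dataset shape, and the stand-in retriever are all hypothetical; the point is only the comparison between accuracy with retrieved pages and accuracy with gold pages supplied directly.

```python
# Oracle-retrieval ablation sketch: run the same answer stage on retrieved
# pages and on gold pages, and compare accuracies. If gold-page accuracy is
# much higher, retrieval is the bottleneck.

def answer_from_pages(query, pages):
    """Toy reader: last word of the first page that contains the query."""
    for page in pages:
        if query.lower() in page.lower():
            return page.split()[-1]
    return ""

def oracle_ablation(dataset, retriever):
    """Return (accuracy with retriever, accuracy with gold pages)."""
    retrieved_hits = gold_hits = 0
    for query, corpus, gold_pages, gold_answer in dataset:
        if answer_from_pages(query, retriever(query, corpus)) == gold_answer:
            retrieved_hits += 1
        if answer_from_pages(query, gold_pages) == gold_answer:
            gold_hits += 1
    n = len(dataset)
    return retrieved_hits / n, gold_hits / n

# Hypothetical one-item dataset and a deliberately bad retriever that
# always returns the first page of the corpus.
dataset = [("capital of ukraine",
            ["lviv is in western ukraine", "kyiv is the capital of ukraine"],
            ["kyiv is the capital of ukraine"],
            "ukraine")]
retrieved_acc, gold_acc = oracle_ablation(dataset, lambda q, c: c[:1])
```

A large gap between `gold_acc` and `retrieved_acc` would support the bottleneck claim; a small gap would instead point at the generation stage.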

read the original abstract

We present an initial investigation into Agentic Retrieval-Augmented Generation (RAG) for Ukrainian, conducted within the UNLP 2026 Shared Task on Multi-Domain Document Understanding. Our system combines two-stage retrieval (BGE-M3 with BGE reranking) with a lightweight agentic layer performing query rephrasing and answer-retry loops on top of Qwen2.5-3B-Instruct. Our analysis reveals that retrieval quality is the primary bottleneck: agentic retry mechanisms improve answer accuracy but the overall score remains constrained by document and page identification. We discuss practical limitations of offline agentic pipelines and outline directions for combining stronger retrieval with more advanced agentic reasoning for Ukrainian.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents an initial investigation into Agentic Retrieval-Augmented Generation (RAG) for Ukrainian within the UNLP 2026 Shared Task on Multi-Domain Document Understanding. The system uses two-stage retrieval (BGE-M3 with BGE reranking) plus a lightweight agentic layer on Qwen2.5-3B-Instruct for query rephrasing and answer-retry loops. The central claim is that retrieval quality—specifically document and page identification—remains the primary bottleneck even after agentic improvements, with practical limitations of offline pipelines discussed and directions for stronger retrieval plus advanced reasoning outlined.

Significance. If the empirical analysis holds, the work usefully highlights retrieval as the dominant constraint for agentic RAG in a low-resource language setting and the limited gains from retry/rephrasing loops. It contributes an early case study on offline agentic pipelines for Ukrainian multi-domain tasks and identifies concrete next steps (stronger retrieval + advanced reasoning). No reproducible code, parameter-free derivations, or falsifiable predictions are provided.

major comments (2)
  1. [Abstract / §3] Abstract and §3 (experimental analysis): the claim that 'retrieval quality is the primary bottleneck' and that 'agentic retry mechanisms improve answer accuracy' is stated without any quantitative scores, ablation tables, error analysis, or per-component metrics. The manuscript must supply these (e.g., accuracy deltas with/without the agentic layer, document/page identification rates) to make the central claim verifiable.
  2. [§2] §2 (system description): the two-stage retrieval (BGE-M3 + reranker) and Qwen2.5-3B-Instruct agent are presented as representative baselines, yet no justification or comparison to other Ukrainian-capable retrievers/generators is given; this weakens the generality of the bottleneck conclusion.
minor comments (2)
  1. [§4] The shared-task metric definitions and offline evaluation protocol should be briefly restated or cited so readers can assess whether they align with the claimed real-world utility for Ukrainian users.
  2. [§2] Notation for the agentic loop (rephrasing vs. retry) is introduced without a diagram or pseudocode; a small figure would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our initial investigation. We address each major comment below and indicate the revisions planned for the next version of the manuscript.

read point-by-point responses
  1. Referee: [Abstract / §3] Abstract and §3 (experimental analysis): the claim that 'retrieval quality is the primary bottleneck' and that 'agentic retry mechanisms improve answer accuracy' is stated without any quantitative scores, ablation tables, error analysis, or per-component metrics. The manuscript must supply these (e.g., accuracy deltas with/without the agentic layer, document/page identification rates) to make the central claim verifiable.

    Authors: We agree that the claims require quantitative support to be verifiable. The current version presents preliminary observations from an initial investigation without detailed metrics. In the revised manuscript we will add accuracy deltas with and without the agentic layer, document and page identification rates, ablation tables, and error analysis to substantiate the central claims. revision: yes

  2. Referee: [§2] §2 (system description): the two-stage retrieval (BGE-M3 + reranker) and Qwen2.5-3B-Instruct agent are presented as representative baselines, yet no justification or comparison to other Ukrainian-capable retrievers/generators is given; this weakens the generality of the bottleneck conclusion.

    Authors: We acknowledge that the absence of explicit justification and comparisons limits the generality of the bottleneck conclusion. Our component choices were driven by the offline pipeline constraints and task requirements of the UNLP 2026 Shared Task. In the revision we will add a discussion justifying these selections as representative baselines while noting the scope limitations for broader comparisons. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper is a purely empirical report on an implemented Agentic RAG pipeline for a shared task. It describes a concrete system (two-stage BGE retrieval plus Qwen2.5-3B agentic retry) and states an observational conclusion about retrieval being the bottleneck. No derivations, equations, fitted parameters renamed as predictions, or self-citations appear in the provided text. The central claim is an experimental finding, not a reduction of any output to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical content; the paper rests on standard NLP assumptions that BGE embeddings work for Ukrainian retrieval and that the shared-task evaluation reflects practical performance.

pith-pipeline@v0.9.0 · 5402 in / 934 out tokens · 29436 ms · 2026-05-10T11:21:18.022842+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

8 extracted references · 6 canonical work pages · 3 internal anchors

  1. [1]

    Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection

    Self-RAG: Learning to retrieve, generate, and critique through self-reflection. arXiv preprint arXiv:2310.11511. Tiberiu Boros, Radu Chivereanu, Stefan Dumitrescu, and Octavian Purcaru

  2. [2]

In Proceedings of the Third Ukrainian Natural Language Processing Workshop (UNLP) @ LREC-COLING 2024, pages 75–82, Torino, Italia

Fine-tuning and retrieval augmented generation for question answering using affordable large language models. In Proceedings of the Third Ukrainian Natural Language Processing Workshop (UNLP) @ LREC-COLING 2024, pages 75–82, Torino, Italia. ELRA and ICCL. Yunfan Gao, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, Jiawei Sun, Meng W...

  3. [3]

Retrieval-augmented generation for large language models: A survey. Preprint, arXiv:2312.10997. Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela

  4. [4]

    Xuying Ning, Dongqi Fu, Tianxin Wei, Mengting Ai, Jiaru Zou, Ting-Wei Li, Hanghang Tong, Yada Zhu, Hendrik Hamann, and Jingrui He

Towards agentic RAG with deep reasoning: A survey of RAG-reasoning systems in LLMs. Preprint, arXiv:2507.09477. Xuying Ning, Dongqi Fu, Tianxin Wei, Mengting Ai, Jiaru Zou, Ting-Wei Li, Hanghang Tong, Yada Zhu, Hendrik Hamann, and Jingrui He

  5. [5]

    Mariana Romanyshyn, Oleksiy Syvokon, and Roman Kyslyi

Mc-search: Evaluating and enhancing multimodal agentic search with structured long reasoning chains. Preprint, arXiv:2603.00873. Mariana Romanyshyn, Oleksiy Syvokon, and Roman Kyslyi

  6. [6]

In Proceedings of the Third Ukrainian Natural Language Processing Workshop (UNLP) @ LREC-COLING 2024, pages 67–74, Torino, Italia

The UNLP 2024 shared task on fine-tuning large language models for Ukrainian. In Proceedings of the Third Ukrainian Natural Language Processing Workshop (UNLP) @ LREC-COLING 2024, pages 67–74, Torino, Italia. ELRA and ICCL. Aditi Singh, Abul Ehtesham, Saket Kumar, and Tala Talaei Khoei

  7. [7]

    Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG

Agentic retrieval-augmented generation: A survey on agentic RAG. Preprint, arXiv:2501.09136. Mykola Trokhymovych and Oleksandr Kosovan

  8. [8]

Infodeepseek: Benchmarking agentic information seeking for retrieval-augmented generation. Preprint, arXiv:2505.15872