pith. machine review for the scientific record.

arXiv: 2605.10296 · v1 · submitted 2026-05-11 · 💻 cs.CL · cs.AI · cs.IR · cs.LG

Recognition: no theorem link

Qwen Goes Brrr: Off-the-Shelf RAG for Ukrainian Multi-Domain Document Understanding

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 05:06 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.IR · cs.LG
keywords RAG · document understanding · Ukrainian QA · retrieval augmented generation · reranking · multiple choice questions · PDF processing · Qwen models

The pith

A retrieval-augmented pipeline with structure-preserving PDF chunking and answer-option-aware reranking reaches 96 percent accuracy on Ukrainian multi-domain document QA.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that a straightforward retrieval-augmented pipeline can effectively answer multiple-choice questions in Ukrainian across various document domains. By chunking PDFs in a way that maintains their original structure and using a reranker that takes into account both the question and all answer options, the system improves how well it finds relevant passages. This leads to higher accuracy in selecting the correct answer from a small number of top passages. The results indicate that these targeted choices in retrieval and ranking are more useful than building elaborate additional processing steps, at least within the limits of a shared task competition.

Core claim

The authors built a RAG pipeline that chunks PDFs while preserving their structure, retrieves passages with a dense embedder, reranks them with a model fine-tuned to consider both the question and the answer choices, and then generates the answer from the top passages with a large language model. On a held-out split this raised Recall@1 from 0.70 to 0.79 and answer accuracy from 0.93 to 0.97, with leaderboard scores of 0.945 (public) and 0.960 (private). The work claims that these two design choices, structure preservation and answer-space-aware relevance scoring, outperform the addition of complex downstream heuristics under competition rules.
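The four stages can be sketched end to end. The paper's actual components (Qwen3-Embedding-8B for retrieval, a fine-tuned Qwen3-Reranker-8B, Qwen3-32B for generation) are replaced here with trivial word-overlap stand-ins, so only the control flow, not the quality, is representative; all names and the toy data are illustrative.

```python
# Illustrative sketch of the pipeline's control flow; scoring functions
# are word-overlap stand-ins for the paper's dense models.
import re
from dataclasses import dataclass

@dataclass
class Chunk:
    doc_id: str
    page: int
    text: str

def words(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question: str, chunks: list[Chunk], k: int = 10) -> list[Chunk]:
    # Stand-in for dense retrieval: rank chunks by word overlap with the question.
    q = words(question)
    return sorted(chunks, key=lambda c: len(q & words(c.text)), reverse=True)[:k]

def rerank(question: str, options: list[str], candidates: list[Chunk],
           top_n: int = 2) -> list[Chunk]:
    # Answer-aware reranking: the query covers the question AND every answer
    # option, so passages that mention an option's text move up the list.
    q = words(question) | set().union(*(words(o) for o in options))
    return sorted(candidates, key=lambda c: len(q & words(c.text)),
                  reverse=True)[:top_n]

def answer(question: str, options: list[str], passages: list[Chunk]) -> str:
    # Stand-in for constrained generation: choose the option whose words
    # occur most often in the top reranked passages.
    context = " ".join(p.text.lower() for p in passages)
    return max(options, key=lambda o: sum(context.count(w) for w in words(o)))

chunks = [
    Chunk("doc1", 1, "Kyiv is the capital of Ukraine. Kyiv is a historic city."),
    Chunk("doc1", 2, "Ukraine has many rivers and mountains."),
    Chunk("doc2", 1, "Lviv is a large city in western Ukraine."),
]
question = "What is the capital of Ukraine?"
options = ["Kyiv", "Lviv", "Odesa"]
top = rerank(question, options, retrieve(question, chunks))
print(answer(question, options, top))  # -> Kyiv
```

Note how `top_n=2` mirrors the paper's choice of generating only from the top two reranked passages.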

What carries the argument

Contextual chunking of PDFs paired with reranking that conditions on both the question and the set of answer options.
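One way to realize "conditions on both the question and the set of answer options" is to fold the options into the reranker's query text. The paper does not publish its template, so this layout is a guess; the function name and label format are hypothetical.

```python
def build_rerank_query(question: str, options: dict[str, str]) -> str:
    # Hypothetical template: the exact input format fed to the fine-tuned
    # reranker is not given in the abstract, so this layout is illustrative.
    lines = [question] + [f"{label}) {text}" for label, text in options.items()]
    return "\n".join(lines)

query = build_rerank_query(
    "What is the capital of Ukraine?",
    {"A": "Kyiv", "B": "Lviv", "C": "Odesa"},
)
print(query)
```

A passage mentioning "Kyiv" can now match the query even when the question itself never names any candidate answer, which is the mechanism behind the Recall@1 gain.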

If this is right

  • Reranking that incorporates answer options improves the quality of retrieved passages for multiple-choice questions.
  • Limiting generation to the top two reranked passages is enough to reach high answer accuracy.
  • Preserving the original layout and order in PDF chunking aids retrieval in multi-domain document collections.
  • Off-the-shelf large language models can serve as the backbone for both retrieval and answer selection in this setting.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same pipeline might work for other low-resource languages that have similar PDF-based document collections.
  • Answer-aware reranking could reduce the need for task-specific fine-tuning in other retrieval-augmented QA applications.
  • If document structure varies greatly across domains, the chunking method may need adaptation for best results.

Load-bearing premise

The test questions and documents in the shared task represent the distribution of real-world Ukrainian multi-domain document understanding problems, and the benefits of the reranking step will appear on entirely new document collections without any additional tuning.

What would settle it

Measuring performance on a fresh collection of Ukrainian PDFs drawn from different domains or with altered question formats; a substantial drop below the reported accuracy would indicate the approach does not generalize as claimed.

read the original abstract

We participated in the Fifth UNLP shared task on multi-domain document understanding, where systems must answer Ukrainian multiple-choice questions from PDF collections and localize the supporting document and page. We propose a retrieval-augmented pipeline built around three ideas: contextual chunking of PDFs, question-aware dense retrieval and reranking conditioned on both the question and answer options, and constrained answer generation from a small set of reranked passages. Our final system uses Qwen3-Embedding-8B for retrieval, a fine-tuned Qwen3-Reranker-8B for passage ranking, and Qwen3-32B for answer selection. On a held-out split, reranking improves Recall@1 from 0.6957 to 0.7935, while using the top-2 reranked passages raises answer accuracy from 0.9348 to 0.9674. Our best leaderboard run reached 0.9452 on the public leaderboard and 0.9598 on the private leaderboard. Our results suggest that, under strict code-competition constraints, preserving document structure and making relevance estimation aware of the answer space are more effective than adding complex downstream heuristics.
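The abstract's Recall@1 figures are mechanical to reproduce once per-question rankings and gold chunk ids are available. A minimal metric helper (variable names and the toy data are illustrative, not from the paper):

```python
def recall_at_k(rankings: list[list[str]], gold: list[str], k: int) -> float:
    # Fraction of questions whose gold supporting chunk id appears in the
    # top-k of its ranked candidate list (Recall@1 when k=1).
    hits = sum(g in r[:k] for r, g in zip(rankings, gold))
    return hits / len(gold)

rankings = [["c1", "c2"], ["c3", "c1"], ["c2", "c5"]]
gold = ["c1", "c1", "c5"]
r1 = recall_at_k(rankings, gold, 1)  # one of three gold chunks ranked first
r2 = recall_at_k(rankings, gold, 2)  # all three recovered within the top 2
```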

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper reports on a system for the Fifth UNLP shared task on Ukrainian multi-domain document understanding, where the task is to answer multiple-choice questions from PDF collections and localize supporting documents and pages. The proposed off-the-shelf RAG pipeline features contextual chunking to preserve document structure, question-aware dense retrieval using Qwen3-Embedding-8B, reranking with a fine-tuned Qwen3-Reranker-8B conditioned on the question and answer options, and constrained generation using Qwen3-32B from the top reranked passages. On a held-out split, the system shows Recall@1 improving from 0.6957 to 0.7935 with reranking and answer accuracy from 0.9348 to 0.9674 with top-2 passages. Leaderboard results are 0.9452 public and 0.9598 private. The authors conclude that under strict constraints, structure preservation and answer-space-aware relevance estimation outperform complex downstream heuristics.

Significance. Assuming the empirical results are robust, this work contributes a practical demonstration that targeted use of large language models for retrieval and reranking, with emphasis on document structure and answer option awareness, can deliver strong performance in a challenging multilingual, multi-domain setting. It provides concrete evidence favoring simpler RAG designs over heuristic-heavy approaches in competition-like environments, which may generalize to other low-resource language document understanding tasks. The specific model choices and metric improvements offer a useful reference point for the community.

major comments (2)
  1. [Abstract] The central claim that 'preserving document structure and making relevance estimation aware of the answer space are more effective than adding complex downstream heuristics' lacks direct comparative evidence. The reported results only show gains from reranking (Recall@1 from 0.6957 to 0.7935) and top-2 usage (accuracy from 0.9348 to 0.9674) within the proposed pipeline; no ablations or baselines that incorporate complex heuristics (such as multi-hop LLM reasoning or ensemble retrieval) are provided to support the superiority inference.
  2. [Evaluation on held-out split] The numeric lifts are presented without error bars, confidence intervals, or statistical tests, and the manuscript provides no details on the construction of the held-out split or its representativeness relative to the leaderboard test distribution. This weakens support for the generalizability claim in the abstract.
minor comments (1)
  1. The abstract would benefit from a short overview sentence listing the three core pipeline components before the results, to improve immediate readability for readers unfamiliar with the shared task.
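The missing uncertainty estimates the report asks for could be supplied with a percentile bootstrap over per-question correctness. A sketch, where the sample size (89 of 92 correct, roughly matching the reported 0.967) and the seed are illustrative:

```python
import random

def bootstrap_ci(correct: list[bool], n_boot: int = 2000,
                 alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    # Percentile bootstrap confidence interval for accuracy: resample the
    # per-question correctness vector with replacement, then take quantiles
    # of the resampled accuracies.
    rng = random.Random(seed)
    n = len(correct)
    accs = sorted(sum(rng.choices(correct, k=n)) / n for _ in range(n_boot))
    return accs[int(alpha / 2 * n_boot)], accs[int((1 - alpha / 2) * n_boot) - 1]

# Illustrative run: a 95% interval around an observed accuracy of ~0.967.
lo, hi = bootstrap_ci([True] * 89 + [False] * 3)
```

At this sample size the interval spans several points of accuracy, which is exactly why single-run leaderboard deltas deserve caution.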

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of our work's significance. We address the two major comments point by point below, with plans for targeted revisions where appropriate.

read point-by-point responses
  1. Referee: [Abstract] The central claim that 'preserving document structure and making relevance estimation aware of the answer space are more effective than adding complex downstream heuristics' lacks direct comparative evidence. The reported results only show gains from reranking (Recall@1 from 0.6957 to 0.7935) and top-2 usage (accuracy from 0.9348 to 0.9674) within the proposed pipeline; no ablations or baselines that incorporate complex heuristics (such as multi-hop LLM reasoning or ensemble retrieval) are provided to support the superiority inference.

    Authors: We agree that the manuscript lacks direct ablations or baselines against complex heuristic approaches such as multi-hop LLM reasoning or ensemble retrieval. The claim in the abstract is an inference drawn from our pipeline's strong performance (0.9598 private leaderboard) in the shared task under strict constraints, where we avoided such methods. Since we lack access to other participants' internal designs, direct comparisons are not possible. We will revise the abstract to qualify the language, stating that our results suggest these design choices are effective in this constrained setting rather than claiming broad superiority. We will also add a clarifying sentence in the discussion section. revision: partial

  2. Referee: [Evaluation on held-out split] The numeric lifts are presented without error bars, confidence intervals, or statistical tests, and the manuscript provides no details on the construction of the held-out split or its representativeness relative to the leaderboard test distribution. This weakens support for the generalizability claim in the abstract.

    Authors: We acknowledge that error bars, confidence intervals, and statistical tests are absent, as all experiments were single-run under shared-task time and compute limits. The held-out split was formed by randomly sampling 20% of the organizers' training data with domain stratification to preserve multi-domain coverage; we will add this explicit description to the evaluation section. We will also note the single-run limitation and its implications for generalizability claims. Recomputing with multiple seeds for error bars is not feasible in the current revision timeline, but the observed lifts align with the final leaderboard results. revision: partial
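The split the rebuttal describes (a random 20% per domain, so every domain stays represented in the held-out set) can be made concrete. This is an editorial reconstruction of the stated procedure, not the authors' code; the "domain" field name and the seed are illustrative.

```python
import random
from collections import defaultdict

def stratified_split(examples: list[dict], frac: float = 0.2, seed: int = 0):
    # Sample `frac` of the examples within each domain so that every domain
    # appears in the held-out set, per the rebuttal's description.
    rng = random.Random(seed)
    by_domain = defaultdict(list)
    for ex in examples:
        by_domain[ex["domain"]].append(ex)
    train, held_out = [], []
    for group in by_domain.values():
        group = group[:]
        rng.shuffle(group)
        n = max(1, round(frac * len(group)))
        held_out.extend(group[:n])
        train.extend(group[n:])
    return train, held_out

examples = [{"domain": d, "qid": i}
            for i, d in enumerate(["law"] * 5 + ["medicine"] * 5)]
train, held_out = stratified_split(examples)
```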

Circularity Check

0 steps flagged

No circularity: empirical results rest on external leaderboard evaluation

full rationale

The manuscript describes an empirical RAG pipeline for a shared-task competition, reporting Recall@1, accuracy, and leaderboard scores obtained from held-out splits and public/private test sets. No equations, fitted parameters renamed as predictions, self-citations used as load-bearing uniqueness theorems, or ansatzes smuggled via prior work appear in the provided text. The central suggestion that structure preservation and answer-aware reranking outperform complex heuristics is an interpretive claim drawn from the observed gains (e.g., reranking lifting Recall@1 from 0.6957 to 0.7935), not a derivation that reduces to its own inputs by construction. Evaluation relies on an external competition benchmark rather than internally generated quantities, satisfying the criteria for a self-contained empirical result.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The contribution is empirical and relies on standard NLP assumptions about dense retrieval and LLM generation; the only explicit free parameter visible in the abstract is the choice of top-2 passages.

free parameters (1)
  • number of reranked passages for generation
    The paper states that using the top-2 reranked passages raises accuracy from 0.9348 to 0.9674, indicating a tuned hyperparameter.
axioms (1)
  • domain assumption: Question-and-answer-option-aware dense retrieval plus reranking will surface the correct supporting passage for multiple-choice QA.
    Implicit in the pipeline design and not further justified in the abstract.

pith-pipeline@v0.9.0 · 5537 in / 1124 out tokens · 67896 ms · 2026-05-12T05:06:53.800606+00:00 · methodology

discussion (0)

