pith. machine review for the scientific record.

arxiv: 2604.25182 · v1 · submitted 2026-04-28 · 💻 cs.CL · cs.IR

Recognition: unknown

CroSearch-R1: Better Leveraging Cross-lingual Knowledge for Retrieval-Augmented Generation

Authors on Pith: no claims yet

Pith reviewed 2026-05-07 16:37 UTC · model grok-4.3

classification: 💻 cs.CL · cs.IR
keywords: cross-lingual knowledge · retrieval-augmented generation · reinforcement learning · multilingual RAG · multi-turn retrieval · multilingual rollout · GRPO

The pith

CroSearch-R1 integrates cross-lingual knowledge into RAG via reinforcement learning to improve results on multilingual collections.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that a new reinforcement learning framework called CroSearch-R1 can overcome the limits of simply pasting knowledge from multiple languages into RAG contexts, where language disparities often cancel out any gains. It does this by adding a multi-turn retrieval step that pulls in and aligns facts from other languages as extra evidence, plus a multilingual rollout step that helps the model transfer reasoning patterns across languages. A reader would care because many real-world document collections mix languages, and a method that reliably uses the best facts from any language could raise answer accuracy without extra manual work. The authors report that the approach succeeds at leveraging cross-lingual complementarity.

Core claim

CroSearch-R1 adopts a multi-turn retrieval strategy with cross-lingual knowledge integration that dynamically aligns knowledge from other languages, as supplementary evidence, into a unified representation space, and introduces a multilingual rollout mechanism to improve reasoning transferability across languages; together these integrate multilingual knowledge into the Group Relative Policy Optimization (GRPO) process for better RAG.

What carries the argument

CroSearch-R1, a search-augmented reinforcement learning framework that folds multilingual knowledge into the Group Relative Policy Optimization (GRPO) process through multi-turn retrieval and multilingual rollout.
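
To make the machinery easier to picture, here is a minimal sketch of a multi-turn retrieval loop with cross-lingual evidence pooling. The paper's interfaces are not reproduced on this page, so the retriever, aligner, and policy below are hypothetical stand-ins, and the loop is an assumption about the overall shape rather than the authors' implementation.

```python
# A minimal sketch, not the authors' implementation. The callables `retrieve`
# (per-language retriever), `align` (translation or multilingual-embedding alignment
# into one representation space), and `policy_step` (the LLM policy deciding to keep
# searching or to answer) are hypothetical interfaces supplied by the caller.

def multi_turn_cross_lingual_rag(question, languages, retrieve, align, policy_step,
                                 max_turns=4):
    """Pool evidence across languages over up to `max_turns` retrieval rounds."""
    evidence, query = [], question
    for _ in range(max_turns):
        for lang in languages:
            hits = retrieve(query, lang=lang, k=3)          # passages in language `lang`
            evidence.extend(align(hits, target_lang="en"))  # unify into one space
        action = policy_step(question, evidence)            # {"type": "search"/"answer", ...}
        if action["type"] == "answer":
            return action["text"]
        query = action["query"]                             # refined follow-up query
    return policy_step(question, evidence, force_answer=True)["text"]
```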

Load-bearing premise

The multi-turn retrieval strategy and multilingual rollout can align knowledge from other languages into a single space without adding new disparities or noise that would erase the gains.

What would settle it

Evaluate the system on a multilingual test set containing deliberate factual conflicts across languages and check whether answer accuracy rises or falls compared with single-language RAG baselines.
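
One hedged way to run that check, assuming a small conflict-annotated test set and two systems passed in as callables; the schema and exact-match scoring below are illustrative choices, not part of the paper.

```python
# Hedged sketch of the check, not a result from the paper: build questions whose
# supporting facts deliberately conflict across languages, then compare answer
# accuracy of a cross-lingual system against a single-language baseline. Both
# systems are passed in as callables; `ConflictExample` is an illustrative schema.

from dataclasses import dataclass, field

@dataclass
class ConflictExample:
    question: str
    gold_answer: str                                        # the correct fact
    conflicting_langs: list = field(default_factory=list)   # collections carrying the wrong fact

def exact_match(pred, gold):
    return pred.strip().lower() == gold.strip().lower()

def accuracy(system, dataset):
    return sum(exact_match(system(ex.question), ex.gold_answer) for ex in dataset) / len(dataset)

def compare_on_conflicts(cross_lingual_rag, single_language_rag, dataset):
    acc_cross = accuracy(cross_lingual_rag, dataset)
    acc_mono = accuracy(single_language_rag, dataset)
    # A genuine gain from cross-lingual complementarity should show up here,
    # on the conflict-bearing questions, and not only on aggregate benchmarks.
    return {"cross_lingual": acc_cross, "single_language": acc_mono}
```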

Figures

Figures reproduced from arXiv: 2604.25182 by Fengran Mo, Jian-Yun Nie, Kaiyu Huang, Rui Qi, Sijin Lu, Yufeng Chen.

Figure 1: Performance variation caused by language collections.

Original abstract

A multilingual collection may contain useful knowledge in other languages to supplement and correct the facts in the original language for Retrieval-Augmented Generation (RAG). However, the vanilla approach that simply concatenates multiple pieces of knowledge from different languages into the context may fail to improve effectiveness due to the potential disparities across languages. To better leverage multilingual knowledge, we propose CroSearch-R1, a search-augmented reinforcement learning framework to integrate multilingual knowledge into the Group Relative Policy Optimization (GRPO) process. In particular, the approach adopts a multi-turn retrieval strategy with cross-lingual knowledge integration to dynamically align the knowledge from other languages as supplementary evidence into a unified representation space. Furthermore, we introduce a multilingual rollout mechanism to optimize reasoning transferability across languages. Experimental results demonstrate that our framework effectively leverages cross-lingual complementarity and improves the effectiveness of RAG with multilingual collections.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, a simulated author's rebuttal, a circularity check, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces CroSearch-R1, a search-augmented reinforcement learning framework for Retrieval-Augmented Generation (RAG) over multilingual collections. It augments Group Relative Policy Optimization (GRPO) with a multi-turn retrieval strategy that dynamically integrates cross-lingual knowledge into a unified representation space and a multilingual rollout mechanism intended to improve reasoning transferability across languages. The central claim is that this approach better exploits cross-lingual complementarity than vanilla concatenation of passages from different languages, which can suffer from disparities.

Significance. If the empirical improvements can be shown to arise specifically from cross-lingual alignment rather than retrieval volume or RL effects, the work would offer a practical method for leveraging multilingual knowledge bases in RAG systems. The integration of dynamic retrieval inside GRPO and the rollout for cross-lingual reasoning transfer represent a reasonable technical direction that could influence future multilingual LLM applications.

major comments (1)
  1. [Experimental Results] The experimental evaluation lacks ablations that isolate the contribution of cross-lingual complementarity. In particular, there is no comparison of the proposed multi-turn cross-lingual retrieval against a same-language multi-turn baseline that uses an identical number of turns and token budget. Without such controls, it remains possible that reported gains track increased context length or the GRPO optimization rather than language-specific alignment, which is load-bearing for the claim that the framework 'effectively leverages cross-lingual complementarity'.
minor comments (2)
  1. [Abstract] The abstract states that 'experimental results demonstrate' improvement but supplies no metrics, datasets, or baseline names, forcing readers to reach the full experimental section before any quantitative assessment is possible.
  2. [Method] The description of the multilingual rollout mechanism would benefit from a short pseudocode listing or diagram to clarify how rollouts are generated and scored across languages (a hedged sketch of one possible form follows).
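
In the spirit of that comment, a hedged guess at what such a listing could look like; this is one possible reading of the mechanism, with hypothetical sampling and reward helpers, not the authors' pseudocode.

```python
# One hedged reading of the multilingual rollout step inside GRPO, not the authors'
# pseudocode: sample rollouts for the same question in several languages, score each,
# and normalise rewards against the whole multilingual group so that reasoning which
# transfers across languages is reinforced. `sample_rollout` and `reward` are
# hypothetical callables supplied by the caller.

import statistics

def multilingual_grpo_advantages(policy, question, languages, sample_rollout, reward,
                                 rollouts_per_lang=4):
    group = []
    for lang in languages:
        for _ in range(rollouts_per_lang):
            traj = sample_rollout(policy, question, lang=lang)   # may include search calls
            group.append((traj, reward(traj, question)))         # e.g. answer correctness

    # Group-relative advantage: centre and scale each reward against the whole
    # multilingual group, so strong evidence in one language raises the baseline for all.
    rewards = [r for _, r in group]
    mu = statistics.mean(rewards)
    sigma = statistics.pstdev(rewards) or 1.0
    return [(traj, (r - mu) / sigma) for traj, r in group]

# The clipped policy-gradient update on token log-probabilities would follow as in
# standard GRPO; it is unchanged by the multilingual grouping and omitted here.
```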

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback and for identifying an important gap in our experimental design. We address the major comment below and will revise the manuscript to incorporate the suggested control.

point-by-point responses
  1. Referee: [Experimental Results] The experimental evaluation lacks ablations that isolate the contribution of cross-lingual complementarity. In particular, there is no comparison of the proposed multi-turn cross-lingual retrieval against a same-language multi-turn baseline that uses an identical number of turns and token budget. Without such controls, it remains possible that reported gains track increased context length or the GRPO optimization rather than language-specific alignment, which is load-bearing for the claim that the framework 'effectively leverages cross-lingual complementarity'.

    Authors: We appreciate this observation. Our existing experiments compare CroSearch-R1 to vanilla multilingual concatenation and to GRPO without the cross-lingual components, and the gains are consistent with the benefits of dynamic alignment and multilingual rollout. However, we agree that a same-language multi-turn retrieval baseline using identical turn count and token budget would more cleanly isolate the contribution of cross-lingual complementarity from simple increases in retrieval volume. We will add this ablation to the revised manuscript (a sketch of such a control follows). revision: yes
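
For illustration only, a sketch of what the promised control might look like; the helper names are hypothetical and the budgets are placeholders, not values from the paper.

```python
# Hedged sketch of the matched-budget control, not released code: the same multi-turn
# loop restricted to the question's own language, with turn count and evidence token
# budget pinned to the cross-lingual configuration. `retrieve`, `count_tokens`, and
# `policy_step` are hypothetical callables supplied by the caller.

def same_language_multi_turn_rag(question, lang, retrieve, count_tokens, policy_step,
                                 max_turns=4, token_budget=4096):
    """Same-language baseline with turn count and token budget matched to the full system."""
    evidence, query = [], question
    for _ in range(max_turns):                               # matched turn count
        for passage in retrieve(query, lang=lang, k=3):
            if count_tokens(evidence) + count_tokens([passage]) > token_budget:
                break                                        # matched evidence budget
            evidence.append(passage)
        action = policy_step(question, evidence)
        if action["type"] == "answer":
            return action["text"]
        query = action["query"]
    return policy_step(question, evidence, force_answer=True)["text"]
```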

Circularity Check

0 steps flagged

No derivation chain present; empirical framework evaluated via experiments

full rationale

The paper presents a proposed framework (CroSearch-R1) with multi-turn retrieval and multilingual rollout inside GRPO, supported by experimental results on multilingual RAG. No equations, derivations, or mathematical claims are advanced in the abstract or described structure. The central claim reduces to reported performance improvements rather than any self-referential definition, fitted parameter renamed as prediction, or load-bearing self-citation chain. External benchmarks and ablations would be needed to assess correctness of the complementarity assumption, but no circular reduction exists in the presented material.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract contains no mathematical derivations, fitted parameters, axioms, or new postulated entities; the contribution is described at the level of a high-level algorithmic framework.

pith-pipeline@v0.9.0 · 5459 in / 1007 out tokens · 54379 ms · 2026-05-07T16:37:11.946917+00:00 · methodology

