arxiv: 2604.12610 · v1 · submitted 2026-04-14 · 💻 cs.CL


Transforming External Knowledge into Triplets for Enhanced Retrieval in RAG of LLMs

Xudong Wang, Chaoning Zhang, Qigan Sun, Zhenzhen Huang, Chang Lu, Sheng Zheng, Zeyu Ma, Caiyan Qin, Yang Yang, Hengtao Shen


Pith reviewed 2026-05-10 15:40 UTC · model grok-4.3

classification 💻 cs.CL

keywords Retrieval-Augmented Generation · Large Language Models · Knowledge Triplets · Structured Retrieval · Prompt-based Adaptation · Context Efficiency


The pith

Tri-RAG converts external knowledge into Condition-Proof-Conclusion triplets to raise retrieval precision while lowering token costs in LLM generation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current RAG systems pull raw text passages and paste them into the context, which often adds irrelevant material, inflates token counts, and breaks logical flow during reasoning. The paper proposes that a lightweight prompt can turn any natural-language knowledge base into fixed triplets of Condition, Proof, and Conclusion without retraining the model. The Condition phrase then serves as the sole retrieval key, so the system matches queries only to the relevant logical unit rather than whole paragraphs. If the transformation preserves the original relations, downstream generation should become both more accurate and cheaper in tokens. Readers would care because this directly attacks the practical limits of context length and hallucination control in real deployments.

Core claim

Tri-RAG automatically transforms external knowledge from natural language into standardized structured triplets consisting of Condition, Proof, and Conclusion, explicitly capturing logical relations among knowledge fragments using lightweight prompt-based adaptation with frozen model parameters. The triplet head Condition is treated as an explicit semantic anchor for retrieval and matching, enabling precise identification of query-relevant knowledge units without directly concatenating lengthy raw texts.

What carries the argument

The Condition-Proof-Conclusion triplet, in which the leading Condition phrase functions as the semantic anchor for query matching.
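To make the anchor idea concrete, here is a minimal sketch of a triplet store in which only the Condition field is scored against the query. The `Triplet` class, the `retrieve` function, and the Jaccard word-overlap similarity are illustrative assumptions, not the paper's implementation; a real system would presumably use a dense embedding model in place of `jaccard`.

```python
from dataclasses import dataclass
import re


@dataclass(frozen=True)
class Triplet:
    condition: str   # retrieval key (the semantic anchor)
    proof: str       # supporting reasoning
    conclusion: str  # what follows from the condition


def tokens(text: str) -> set[str]:
    """Lowercased word set; a stand-in for a real embedding model."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard overlap; a hypothetical similarity, not the paper's."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query: str, store: list[Triplet], k: int = 1) -> list[Triplet]:
    """Rank triplets by similarity between the query and the Condition only."""
    q = tokens(query)
    ranked = sorted(store, key=lambda t: jaccard(q, tokens(t.condition)), reverse=True)
    return ranked[:k]


# Toy knowledge base (invented examples, not data from the paper).
store = [
    Triplet("a triangle has a right angle",
            "by the Pythagorean theorem",
            "the squares of the legs sum to the square of the hypotenuse"),
    Triplet("a number is divisible by 4",
            "4 is a multiple of 2",
            "the number is even"),
]
top = retrieve("What holds if a triangle has a right angle?", store)[0]
print(top.conclusion)
```

Proof and Conclusion never enter the similarity score here; they are carried along only so the matched unit can be handed to the generator intact.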

If this is right

Retrieval precision rises because only the matching Condition is fetched rather than whole passages.
Token consumption drops while semantic alignment between query and evidence improves.
Reasoning chains stay intact because the Proof and Conclusion remain explicitly linked to each Condition.
Generation stability increases in complex tasks as redundant context is removed.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same triplet format could be applied to non-RAG tasks such as multi-document summarization or knowledge-base question answering to enforce explicit logical structure.
A direct test would compare end-to-end accuracy on datasets containing conflicting or ambiguous knowledge fragments to see whether the forced triplet format surfaces inconsistencies that raw-text RAG hides.
If the prompt occasionally produces malformed triplets on out-of-domain text, a lightweight verification step that checks for missing Proof or Conclusion fields could be added without changing the frozen model.
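The verification step suggested in the last point could be as small as the sketch below. It assumes triplets arrive as dictionaries with `condition`, `proof`, and `conclusion` keys; that representation and the function names are hypothetical, chosen only for illustration.

```python
REQUIRED_FIELDS = ("condition", "proof", "conclusion")


def missing_fields(triplet: dict) -> list[str]:
    """Return the names of required fields that are absent or blank."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = triplet.get(name)
        if not isinstance(value, str) or not value.strip():
            missing.append(name)
    return missing


def partition(triplets: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split triplets into (well_formed, malformed) without touching the model."""
    ok, bad = [], []
    for t in triplets:
        (bad if missing_fields(t) else ok).append(t)
    return ok, bad
```

A check like this catches only structural gaps. It says nothing about whether a present Proof is actually faithful to the source passage.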

Load-bearing premise

That lightweight prompt-based adaptation with frozen model parameters can reliably and consistently transform arbitrary natural language knowledge into standardized Condition-Proof-Conclusion triplets that capture logical relations without introducing errors or losing information.

What would settle it

Measure the logical fidelity of the generated triplets against human-annotated ground-truth relations on a multi-hop reasoning benchmark; if the fraction of triplets that drop, invert, or add incorrect logical links exceeds the improvement margin shown by Tri-RAG over baseline RAG, the performance gains disappear.
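The comparison described above reduces to simple arithmetic, sketched here under stated assumptions: annotators assign each triplet one label (`faithful`, `dropped`, `inverted`, or `added`; the label names are hypothetical), and both system scores are fractions on the same 0 to 1 scale. The numbers in the usage lines are invented for illustration and are not results from the paper.

```python
def triplet_error_rate(labels: list[str]) -> float:
    """Fraction of triplets annotated as anything other than 'faithful'."""
    if not labels:
        raise ValueError("need at least one annotated triplet")
    faulty = sum(1 for label in labels if label != "faithful")
    return faulty / len(labels)


def gains_survive(labels: list[str], tri_rag_score: float, baseline_score: float) -> bool:
    """True if the triplet error rate does not exceed Tri-RAG's margin over the baseline."""
    margin = tri_rag_score - baseline_score
    return triplet_error_rate(labels) <= margin


# Invented example: 1 faulty triplet in 10, scores of 0.70 vs 0.55.
labels = ["faithful"] * 9 + ["inverted"]
print(triplet_error_rate(labels))          # 0.1
print(gains_survive(labels, 0.70, 0.55))   # True
```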

Figures

Figures reproduced from arXiv: 2604.12610 by Caiyan Qin, Chang Lu, Chaoning Zhang, Hengtao Shen, Qigan Sun, Sheng Zheng, Xudong Wang, Yang Yang, Zeyu Ma, Zhenzhen Huang.

**Figure 2.** Soft Prompt–Driven Triplet Structuring and Retrieval-Augmented Inference. The upper part shows how soft prompts guide frozen LLMs to extract …

**Figure 3.** Retrieval-key ablation comparing ALL retrieval using the concatenation [Tc; Tp; Tr] and single-field retrieval using TC, TP, or TR. We report Hit@1 together with index-level retrieval latency and memory footprint.

… is evaluated once, and the “±” in Table V denotes the standard deviation across evaluated instances. For each evaluation instance, the judge receives the source passage, the extracted triplet s…
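For readers unfamiliar with the Hit@1 metric reported in the ablation, the sketch below shows the usual definition: the share of queries whose gold item is ranked first. The function name and the identifier-list input format are assumptions for illustration; the paper's evaluation code is not shown on this page.

```python
def hit_at_k(ranked_ids: list[list[str]], gold_ids: list[str], k: int = 1) -> float:
    """Share of queries whose gold item appears among the top-k retrieved ids."""
    if len(ranked_ids) != len(gold_ids):
        raise ValueError("one ranked list is required per gold id")
    if not gold_ids:
        raise ValueError("need at least one query")
    hits = sum(1 for ranked, gold in zip(ranked_ids, gold_ids) if gold in ranked[:k])
    return hits / len(gold_ids)
```

Running the same function over rankings produced from each retrieval key (the concatenation or a single field) would give the kind of side-by-side comparison the caption describes.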

Original abstract

Retrieval-Augmented Generation (RAG) mitigates hallucination in large language models (LLMs) by incorporating external knowledge during generation. However, the effectiveness of RAG depends not only on the design of the retriever and the capacity of the underlying model, but also on how retrieved evidence is structured and aligned with the query. Existing RAG approaches typically retrieve and concatenate unstructured text fragments as context, which often introduces redundant or weakly relevant information. This practice leads to excessive context accumulation, reduced semantic alignment, and fragmented reasoning chains, thereby degrading generation quality while increasing token consumption. To address these challenges, we propose Tri-RAG, a structured triplet-based retrieval framework that improves retrieval efficiency through reasoning-aligned context construction. Tri-RAG automatically transforms external knowledge from natural language into standardized structured triplets consisting of Condition, Proof, and Conclusion, explicitly capturing logical relations among knowledge fragments using lightweight prompt-based adaptation with frozen model parameters. Building on this representation, the triplet head Condition is treated as an explicit semantic anchor for retrieval and matching, enabling precise identification of query-relevant knowledge units without directly concatenating lengthy raw texts. As a result, Tri-RAG achieves a favorable balance between retrieval accuracy and context token efficiency. Experimental results across multiple benchmark datasets demonstrate that Tri-RAG significantly improves retrieval quality and reasoning efficiency, while producing more stable generation behavior and more efficient resource utilization in complex reasoning scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Tri-RAG converts knowledge to Condition-Proof-Conclusion triplets via prompt on frozen models and retrieves on the Condition anchor, but the conversion step lacks any reported validation and the abstract supplies no numbers.


The main thing here is a practical tweak to RAG: turn raw external knowledge into fixed triplets with a lightweight prompt, then match queries only to the Condition field instead of dumping whole passages. This is meant to cut token bloat and tighten semantic alignment without touching model weights. The framing of Condition as the retrieval key is the clearest new angle; most prior RAG work either chunks text or adds rerankers, so this explicit logical decomposition stands apart from the concatenation-based approaches the abstract describes. It also keeps the approach deployable, since the model itself stays frozen. That part is straightforward engineering and worth noting for anyone already running retrieval pipelines.

The rest of the abstract claims better retrieval quality, reasoning efficiency, and stable generation across benchmarks, yet gives no metrics, baselines, dataset sizes, or ablations. The load-bearing premise is the weak point: the entire downstream benefit assumes the prompt produces faithful, complete triplets every time. If it drops premises or hallucinates proof steps on anything but simple facts, the efficiency gains become noise. No error rates, no human triplet checks, and no sensitivity tests appear in the provided text, so the central assumption stays untested.

The paper is aimed at NLP engineers who already fight context limits in production RAG systems. A reader in that group could pull the triplet schema as a quick experiment, but the current write-up does not yet give enough evidence to treat the results as reliable. It deserves a serious referee pass so the actual experiments and implementation details can be examined; the idea is narrow enough that a few targeted revisions could make it usable.

Referee Report

2 major / 2 minor

Summary. The paper proposes Tri-RAG, a structured retrieval framework for RAG in LLMs. External knowledge is transformed into standardized Condition-Proof-Conclusion triplets via lightweight prompt-based adaptation on a frozen LLM. The Condition component serves as an explicit semantic anchor for retrieval and matching, avoiding direct concatenation of raw text fragments. This is claimed to improve retrieval precision, reduce token consumption, stabilize generation, and enhance reasoning efficiency, with experimental results across multiple benchmarks demonstrating significant gains over standard RAG approaches.

Significance. If the triplet extraction reliably preserves logical structure and the empirical claims hold, Tri-RAG would represent a practical engineering advance in RAG by replacing unstructured context with logically anchored units, potentially lowering context length while improving semantic alignment and reasoning stability. The use of frozen-model prompt adaptation keeps the method lightweight and broadly applicable.

major comments (2)

[Abstract] The assertion that 'Experimental results across multiple benchmark datasets demonstrate that Tri-RAG significantly improves retrieval quality and reasoning efficiency' supplies no metrics, baselines, dataset names, ablation results, or error bars. Without these, the central empirical claim cannot be evaluated and the reported balance between accuracy and token efficiency remains unverified.

[Method] Method description (triplet transformation): the load-bearing step of converting arbitrary natural-language knowledge into Condition-Proof-Conclusion triplets via a single prompt on a frozen model is presented without any quantitative fidelity analysis, human-annotated comparison, prompt-sensitivity ablation, or error characterization for complex conditionals and implicit premises. This directly undermines the downstream claims of precise Condition-based retrieval and reduced token use.

minor comments (2)

The manuscript would benefit from an explicit diagram or pseudocode showing the end-to-end flow from knowledge ingestion through triplet extraction, Condition-based retrieval, and generation.
Notation for the triplet fields (Condition, Proof, Conclusion) should be defined once with an example in the main text rather than only in the abstract.
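As a rough illustration of the end-to-end flow the first minor comment asks for, the sketch below strings the stages together: ingestion, triplet extraction, Condition-based retrieval, and context construction. Everything in it is an assumption made for illustration. In particular, `toy_extract` merely splits pre-delimited strings and stands in for the frozen LLM plus prompt, `overlap` stands in for the real retriever, and the final generation call is left out.

```python
from typing import Callable, NamedTuple


class Triplet(NamedTuple):
    condition: str
    proof: str
    conclusion: str


def toy_extract(passage: str) -> list[Triplet]:
    """Stand-in for the frozen LLM plus prompt: splits 'C | P | R' strings."""
    c, p, r = (part.strip() for part in passage.split("|"))
    return [Triplet(c, p, r)]


def build_index(passages: list[str],
                extract: Callable[[str], list[Triplet]]) -> list[Triplet]:
    """Ingestion: run the extractor over every passage and pool the triplets."""
    return [t for passage in passages for t in extract(passage)]


def overlap(query: str, condition: str) -> int:
    """Shared-word count between query and Condition (hypothetical scorer)."""
    return len(set(query.lower().split()) & set(condition.lower().split()))


def build_context(query: str, index: list[Triplet], k: int = 1) -> str:
    """Retrieve on Condition only, then render the top-k triplets as context."""
    ranked = sorted(index, key=lambda t: overlap(query, t.condition), reverse=True)
    lines = [f"If {t.condition}, then {t.conclusion} (because {t.proof})."
             for t in ranked[:k]]
    return "\n".join(lines) + f"\nQuestion: {query}"


# Invented toy passages, already delimited so the stub extractor can parse them.
passages = [
    "water is below 0 C | molecules lock into a lattice | it freezes",
    "iron is exposed to moist air | oxidation occurs | it rusts",
]
index = build_index(passages, toy_extract)
print(build_context("what happens when iron is exposed to moist air", index))
```

The rendered context would then be passed to the generator in place of the raw passages.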

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to improve clarity and rigor.


Referee: [Abstract] The assertion that 'Experimental results across multiple benchmark datasets demonstrate that Tri-RAG significantly improves retrieval quality and reasoning efficiency' supplies no metrics, baselines, dataset names, ablation results, or error bars. Without these, the central empirical claim cannot be evaluated and the reported balance between accuracy and token efficiency remains unverified.

Authors: We agree that the abstract would be more informative with concrete details. The full manuscript reports these elements in Section 4 (Experiments), including specific metrics, baselines such as vanilla RAG, dataset names, and ablation studies. We will revise the abstract to incorporate key quantitative results, such as retrieval accuracy gains and token reductions, along with the main datasets and baselines used.

revision: yes

Referee: [Method] Method description (triplet transformation): the load-bearing step of converting arbitrary natural-language knowledge into Condition-Proof-Conclusion triplets via a single prompt on a frozen model is presented without any quantitative fidelity analysis, human-annotated comparison, prompt-sensitivity ablation, or error characterization for complex conditionals and implicit premises. This directly undermines the downstream claims of precise Condition-based retrieval and reduced token use.

Authors: We acknowledge that the manuscript does not currently include a dedicated quantitative analysis of triplet extraction fidelity. We will add a new subsection with human evaluation results, prompt-sensitivity ablations, and error analysis on complex cases to quantify the reliability of the Condition-Proof-Conclusion transformation and better support the retrieval and efficiency claims.

revision: yes

Circularity Check

0 steps flagged

No circularity; independent engineering proposal without self-referential derivation


The paper introduces Tri-RAG as a practical framework that applies lightweight prompt-based adaptation on a frozen LLM to convert external natural-language knowledge into Condition-Proof-Conclusion triplets for structured retrieval. No equations, fitted parameters, or mathematical derivations appear in the provided text. The central mechanism is presented as a novel engineering choice rather than a result that reduces to prior inputs by construction. No self-citations are load-bearing for uniqueness theorems or ansatzes, and experimental claims rest on benchmark evaluations rather than tautological fits. This is a standard applied contribution whose validity can be assessed externally via replication, satisfying the default expectation of no significant circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities; the approach implicitly assumes prompt engineering can produce reliable logical triplets, but this is not formalized.



Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

DASH-KV: Accelerating Long-Context LLM Inference via Asymmetric KV Cache Hashing
cs.CL 2026-04 unverdicted novelty 6.0

DASH-KV accelerates long-context LLM inference to linear complexity via asymmetric KV cache hashing and mixed-precision retention, matching full attention performance on LongBench.

CAP-CoT: Cycle Adversarial Prompt for Improving Chain of Thoughts in LLM Reasoning
cs.AI 2026-04 unverdicted novelty 5.0

CAP-CoT uses iterative adversarial prompt cycles to improve CoT accuracy, stability, and robustness across six benchmarks and four LLM backbones.
