PersonalAI 2.0: Enhancing knowledge graph traversal/retrieval with planning mechanism for Personalized LLM Agents
Pith reviewed 2026-05-14 19:11 UTC · model grok-4.3
The pith
PAI-2 improves LLM factual accuracy through adaptive graph traversal and planning on knowledge graphs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PAI-2 performs adaptive, iterative information search guided by extracted entities, matched graph vertices, and generated clue-queries within a dynamic multistage pipeline. On Natural Questions, TriviaQA, HotpotQA, 2WikiMultihopQA, MuSiQue, and DiaASQ it outperforms LightRAG, RAPTOR, and HippoRAG 2, delivering a 4 percent average gain by LLM-as-a-Judge. Graph traversal algorithms such as BeamSearch and WaterCircles improve results by 6 percent over standard flatten retrievers, while the search-plan enhancement mechanism supplies an 18 percent boost over the disabled version across the six datasets. PAI-2 also reaches a state-of-the-art 89 percent information-retention score on the MINE-1 benchmark using LLMs from the 7-14B tier.
What carries the argument
The adaptive multistage query processing pipeline that guides iterative graph search through extracted entities, matched vertices, and generated clue-queries.
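The provided text names BeamSearch as one of the traversal algorithms but includes no pseudocode. A minimal beam-search sketch over a knowledge graph, starting from the matched vertices and ranking expansions by a relevance score, could look like the following; the `graph` adjacency format, the `seeds`, and the `score` callable (standing in for clue-query relevance) are all assumptions, not the paper's actual interfaces:

```python
def beam_search_kg(graph, seeds, score, beam_width=3, max_hops=3):
    """Hypothetical beam-search traversal over a knowledge graph.

    graph: dict mapping vertex -> iterable of (relation, neighbor) edges
    seeds: entity vertices matched from the query
    score: callable(path) -> float, e.g. relevance to a generated clue-query
    Returns all paths kept on the beam at any hop.
    """
    beam = [(s,) for s in seeds]          # paths start at matched vertices
    collected = list(beam)
    for _ in range(max_hops):
        candidates = []
        for path in beam:
            for _relation, nxt in graph.get(path[-1], ()):
                if nxt not in path:       # avoid revisiting vertices (cycles)
                    candidates.append(path + (nxt,))
        if not candidates:
            break
        # keep only the top-scoring expansions: the "beam"
        beam = sorted(candidates, key=score, reverse=True)[:beam_width]
        collected.extend(beam)
    return collected
```

In the pipeline the paper describes, the retained paths would then be verbalized as context for the LLM, and the planner would decide whether another search iteration with fresh clue-queries is needed.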
Load-bearing premise
The reported gains from planning and traversal will generalize beyond the six tested datasets and the specific LLMs used, and LLM-as-a-Judge will measure factual correctness without its own biases.
What would settle it
Evaluating PAI-2 on an additional benchmark or with a different LLM family and finding no gain or a decline in factual correctness scores would falsify the central claim.
Figures
Original abstract
We introduce PersonalAI 2.0 (PAI-2), a novel framework, designed to enhance large language model (LLM) based systems through integration of external knowledge graphs (KG). The proposed approach addresses key limitations of existing Graph Retrieval-Augmented Generation (GraphRAG) methods by incorporating a dynamic, multistage query processing pipeline. The central point of PAI-2 design is its ability to perform adaptive, iterative information search, guided by extracted entities, matched graph vertices and generated clue-queries. Conducted evaluation over six benchmarks (Natural Questions, TriviaQA, HotpotQA, 2WikiMultihopQA, MuSiQue and DiaASQ) demonstrates improvement in factual correctness of generating answers compared to analogues methods (LightRAG, RAPTOR, and HippoRAG 2). PAI-2 achieves 4% average gain by LLM-as-a-Judge across four benchmarks, reflecting its effectiveness in reducing hallucination rates and increasing precision. We show that use of graph traversal algorithms (e.g. BeamSearch, WaterCircles) gain superior results compared to standard flatten retriever on average 6%, while enabled search plan enhancement mechanism gain 18% boost compared to disabled one by LLM-as-a-Judge across six datasets. In addition, ablation study reveals that PAI-2 achieves the SOTA result on MINE-1 benchmark, achieving 89% information-retention score, using LLMs from 7-14B tiers. Collectively, these findings underscore the potential of PAI-2 to serve as a foundational model for next-generation personalized AI applications, requiring scalable, context-aware knowledge representation and reasoning capabilities.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces PersonalAI 2.0 (PAI-2), a framework integrating external knowledge graphs into LLM systems via a dynamic multistage query pipeline that performs adaptive iterative search using extracted entities, matched vertices, and generated clue-queries. It claims empirical improvements over LightRAG, RAPTOR, and HippoRAG 2 on six benchmarks (Natural Questions, TriviaQA, HotpotQA, 2WikiMultihopQA, MuSiQue, DiaASQ), including a 4% average gain by LLM-as-a-Judge across four benchmarks, 6% average superiority from graph traversal algorithms (e.g., BeamSearch, WaterCircles) versus flatten retrievers, an 18% boost from the enabled search-plan enhancement mechanism across six datasets, and SOTA 89% information-retention on the MINE-1 benchmark using 7-14B LLMs.
Significance. If the quantitative claims hold under rigorous validation, PAI-2 would offer a concrete advance in GraphRAG by demonstrating the value of planning and traversal mechanisms for reducing hallucinations and improving precision in personalized agents. The ablation results isolating the 18% contribution of the search-plan component and the SOTA result on MINE-1 with modest-sized models constitute reproducible evidence of component-level gains that could inform next-generation context-aware KG systems.
major comments (2)
- [Abstract] Abstract and Evaluation section: The headline performance claims (4% average LLM-as-a-Judge gain on four benchmarks, 6% traversal improvement, 18% search-plan boost) are presented without any description of the experimental protocol, including data splits, judge-model choice, prompt template for the LLM-as-a-Judge, statistical significance tests, error bars, or controls for confounds such as output length or stylistic bias. This absence directly undermines the central assertion that PAI-2 reduces hallucination rates and increases factual precision.
- [Evaluation] Evaluation section: No validation of the LLM-as-a-Judge metric against human judgments, inter-annotator agreement scores, or bias analysis is provided, despite the metric being the sole basis for all reported gains and the claim of superior factual correctness over baselines.
minor comments (1)
- [Abstract] Abstract: 'analogues methods' should read 'analogous methods'.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important gaps in methodological transparency. We will revise the manuscript to incorporate detailed experimental protocols and a validation study for the LLM-as-a-Judge metric, thereby strengthening the presentation of our results.
Point-by-point responses
-
Referee: [Abstract] Abstract and Evaluation section: The headline performance claims (4% average LLM-as-a-Judge gain on four benchmarks, 6% traversal improvement, 18% search-plan boost) are presented without any description of the experimental protocol, including data splits, judge-model choice, prompt template for the LLM-as-a-Judge, statistical significance tests, error bars, or controls for confounds such as output length or stylistic bias. This absence directly undermines the central assertion that PAI-2 reduces hallucination rates and increases factual precision.
Authors: We acknowledge that the abstract and Evaluation section lack explicit descriptions of the experimental protocol. In the revised manuscript, we will expand the Evaluation section to detail the data splits used, the specific judge model and its version, the full prompt template for LLM-as-a-Judge, results from statistical significance tests, error bars on all reported metrics, and controls for confounds including output length and stylistic bias. The abstract will be updated to reference these additions. revision: yes
-
Referee: [Evaluation] Evaluation section: No validation of the LLM-as-a-Judge metric against human judgments, inter-annotator agreement scores, or bias analysis is provided, despite the metric being the sole basis for all reported gains and the claim of superior factual correctness over baselines.
Authors: We agree that direct validation of the LLM-as-a-Judge metric is necessary. We will add a new subsection in the revised Evaluation section reporting a human validation study, including agreement rates between LLM-as-a-Judge scores and human annotations, inter-annotator agreement metrics, and an analysis of potential biases. This will be based on a sampled subset of the benchmark outputs. revision: yes
Circularity Check
No circularity: empirical benchmark gains reported directly from external evaluations
Full rationale
The paper introduces PAI-2 as an engineering framework for KG-enhanced LLM agents and supports its claims exclusively through direct empirical comparisons on six named external benchmarks (Natural Questions, TriviaQA, HotpotQA, etc.). Reported improvements (4% average by LLM-as-a-Judge, 6% from traversal algorithms, 18% from search-plan enhancement) are presented as measured outcomes against baselines such as LightRAG and HippoRAG 2, with an ablation study on MINE-1. No equations, fitted parameters, self-definitional quantities, or predictions derived from internal inputs appear in the provided text; the derivation chain consists of system description followed by independent benchmark scoring rather than any reduction of results to the method's own definitions or prior self-citations.