pith. machine review for the scientific record.

arxiv: 2605.10530 · v1 · submitted 2026-05-11 · 💻 cs.IR

Recognition: no theorem link

Personalized Deep Research: A User-Centric Framework, Dataset, and Hybrid Evaluation for Knowledge Discovery


Pith reviewed 2026-05-12 04:50 UTC · model grok-4.3

classification 💻 cs.IR
keywords personalized deep research · LLM agents · user context modeling · information retrieval · knowledge discovery · hybrid evaluation · retrieval-reasoning loop

The pith

Integrating dynamic user context into the retrieval-reasoning loop improves relevance of LLM-generated research reports.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that LLM-driven deep research agents can move beyond a static, one-size-fits-all approach by folding dynamic user context directly into query planning, retrieval, and synthesis. Rather than treating personalization as an afterthought, the system builds a user profile from interaction history, adjusts the depth and breadth of exploration to match expertise and interests, and applies dual-stage retrieval plus context-aware stopping rules. This matters for users because generic systems often deliver reports that repeat known material for experts or overwhelm novices with unnecessary detail. The authors support the approach with a new dataset of realistic tasks and a hybrid evaluation that mixes lexical scores with LLM judgments of factual accuracy and personalization fit. Experiments against commercial baselines indicate gains in both retrieval utility and report relevance.
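
As a concrete illustration of how such a hybrid score could be assembled, here is a minimal sketch; the LCS-based F1 (standing in for a ROUGE-style lexical metric), the stubbed judge, and the equal weighting are all assumptions for illustration, not the paper's protocol.

```python
# Hypothetical hybrid report score: an LCS-based lexical F1 (a stand-in for
# ROUGE-L) blended with a stubbed LLM-as-a-judge score. All names and the
# 0.5/0.5 weighting are illustrative assumptions.

def lcs_len(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    dp = [0] * (len(b) + 1)          # rolling row of the LCS table
    for tok in a:
        prev = 0                     # dp[i-1][j-1] from the previous row
        for j in range(1, len(b) + 1):
            cur = dp[j]
            dp[j] = prev + 1 if tok == b[j - 1] else max(dp[j], dp[j - 1])
            prev = cur
    return dp[-1]

def lexical_f1(candidate: str, reference: str) -> float:
    c, r = candidate.split(), reference.split()
    if not c or not r:
        return 0.0
    lcs = lcs_len(c, r)
    prec, rec = lcs / len(c), lcs / len(r)
    return 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)

def llm_judge(report: str, profile: str) -> float:
    """Stub: a real system would prompt a model to rate factual accuracy
    and personalization fit on a 0-1 scale."""
    return 0.8  # placeholder value

def hybrid_score(report, reference, profile, w_lex=0.5):
    return w_lex * lexical_f1(report, reference) + (1 - w_lex) * llm_judge(report, profile)

print(hybrid_score("a personalized survey of retrieval agents",
                   "a survey of personalized retrieval agents", "expert"))
```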

Core claim

PDR unifies user profile modeling with iterative query development, dual-stage (private/public) retrieval, and context-aware synthesis inside the core retrieval-reasoning loop. This integration lets the agent align research sub-goals with user intent and optimize evidence collection, producing higher retrieval utility and report relevance than generic baselines on the released PDR Dataset across four user tasks.

What carries the argument

The PDR framework, which folds dynamic user context into the core retrieval-reasoning loop through unified profile modeling, iterative queries, dual-stage retrieval, and context-aware synthesis.
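
A rough sketch of how such a loop could be wired together; every class and function below is a hypothetical stand-in for the stages described, not the authors' implementation.

```python
# Illustrative profile-conditioned retrieval-reasoning loop. All components
# (UserProfile, plan_queries, retrieve, enough_evidence, synthesize) are
# hypothetical stand-ins for the stages the framework describes.
from dataclasses import dataclass

@dataclass
class UserProfile:
    expertise: str            # e.g. "novice" or "expert", inferred from history
    interests: list[str]      # topics extracted from interaction history
    max_rounds: int = 5       # exploration budget scaled to the user

def plan_queries(task: str, profile: UserProfile, evidence: list[str]) -> list[str]:
    # Toy planner: one sub-query per profiled interest not yet covered.
    return [f"{task}: {topic}" for topic in profile.interests
            if not any(topic in doc for doc in evidence)]

def retrieve(queries: list[str], source: str) -> list[str]:
    # Stand-in for the dual-stage retriever (private user data vs. public web).
    return [f"[{source}] result for '{q}'" for q in queries]

def enough_evidence(profile: UserProfile, evidence: list[str]) -> bool:
    # Toy context-aware stopping rule: experts need less breadth than novices.
    return len(evidence) >= (4 if profile.expertise == "expert" else 10)

def synthesize(task: str, profile: UserProfile, evidence: list[str]) -> str:
    return f"{profile.expertise}-level report on {task!r} from {len(evidence)} sources"

def deep_research(task: str, profile: UserProfile) -> str:
    evidence: list[str] = []
    for _ in range(profile.max_rounds):
        queries = plan_queries(task, profile, evidence)   # (i) iterative query development
        if not queries:
            break
        evidence += retrieve(queries, source="private")   # (ii) dual-stage retrieval
        evidence += retrieve(queries, source="public")
        if enough_evidence(profile, evidence):            # (iii) context-aware stopping
            break
    return synthesize(task, profile, evidence)            # (iv) context-aware synthesis

print(deep_research("agentic RAG", UserProfile("expert", ["stopping criteria", "query expansion"])))
```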

If this is right

  • Tailored query development reduces redundant retrieval for users with prior expertise.
  • Context-aware stopping criteria prevent over- or under-collection of evidence.
  • Dual-stage retrieval balances private user data with public sources for better alignment (see the merge sketch after this list).
  • The hybrid evaluation framework enables consistent benchmarking of personalization quality.
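
A minimal sketch of that private/public merge; the mixing knob `alpha` and all names are illustrative assumptions, not parameters reported in the paper.

```python
# Hypothetical dual-stage merge: reserve a profile-dependent share of the
# final result list for hits from the user's private corpus.

def mix_results(private_hits: list[str], public_hits: list[str],
                k: int = 10, alpha: float = 0.3) -> list[str]:
    """Return up to k results, reserving roughly alpha*k slots for private hits."""
    n_private = min(len(private_hits), round(alpha * k))
    merged = private_hits[:n_private] + public_hits[:k - n_private]
    return merged[:k]

private = [f"note-{i}" for i in range(3)]
public = [f"web-{i}" for i in range(20)]
print(mix_results(private, public))  # 3 private notes, then 7 public results
```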

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same context integration could be tested on multi-session interactions to track evolving interests over time.
  • Extending the private retrieval stage to additional personal data sources might further tighten report personalization without extra user input.
  • The approach suggests a path toward research agents that automatically calibrate explanation level without explicit user prompts.

Load-bearing premise

Dynamic user context can be reliably extracted and maintained from limited interaction history without systematic misalignment or privacy issues that degrade the retrieval-reasoning loop.

What would settle it

Running PDR on the released dataset with sparse user history and finding no measurable gain in LLM-judged report relevance or retrieval utility over a non-personalized commercial baseline would falsify the central claim.
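
Mechanically, that comparison is a paired test over per-task relevance judgments; a sketch with placeholder numbers follows (the scipy call is real, the scores are invented).

```python
# Sketch of the falsification test: paired comparison of LLM-judged relevance
# for PDR vs. a non-personalized baseline on the same sparse-history tasks.
# The scores below are illustrative placeholders, not reported results.
from scipy import stats

pdr_scores      = [0.71, 0.64, 0.80, 0.58, 0.75, 0.69]
baseline_scores = [0.70, 0.66, 0.78, 0.59, 0.74, 0.70]

t_stat, p_value = stats.ttest_rel(pdr_scores, baseline_scores)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# A null result here (no measurable gain) would undercut the claim that the
# improvements come from personalization rather than generic planning.
```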

Figures

Figures reproduced from arXiv: 2605.10530 by Huifeng Guo, Pengyue Jia, Wenlin Zhang, Xiangyu Zhao, Xiaopeng Li, Yejing Wang, Yichao Wang, Yingyi Zhang, Yong Liu.

Figure 1. Comparison between Conventional and Personal…
Figure 2. Overview of the Personalized Deep Research (PDR) framework. It consists of four core stages: (i) profile extraction…
Figure 3. Dataset Construction Pipeline for PDR.
Figure 4. Overview of the PDR-Eval Framework for Deep…
original abstract

Deep Research agents driven by LLMs have automated the scholarly discovery pipeline, from planning and query formulation to iterative web exploration. Yet they remain constrained by a static, "one-size-fits-all" retrieval paradigm. Current systems fail to adaptively adjust the depth and breadth of exploration based on the user's existing expertise or latent interests, frequently resulting in reports that are either redundant for experts or overly dense for novices. To address this, we introduce Personalized Deep Research (PDR), a framework that integrates dynamic user context into the core retrieval-reasoning loop. Rather than treating personalization as a post-hoc formatting step, PDR unifies user profile modeling with iterative query development, dual-stage (private/public) retrieval, and context-aware synthesis. This allows the system to autonomously align research sub-goals with user intent and optimize the stopping criteria for evidence collection. To facilitate benchmarking, we release the PDR Dataset, covering four realistic user tasks, and propose a hybrid evaluation framework combining lexical metrics with LLM-based judgments to assess factual accuracy and personalization alignment. Experimental results against commercial baselines demonstrate that PDR significantly improves retrieval utility and report relevance, effectively bridging the gap between generic information retrieval and personalized knowledge acquisition. The resource is available to the public at https://github.com/Applied-Machine-Learning-Lab/SIGIR2026_PDR.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents Personalized Deep Research (PDR), a framework that embeds dynamic user context extraction and modeling directly into the LLM-driven deep research pipeline, encompassing iterative query development, dual-stage retrieval, and context-aware report synthesis. It contributes the PDR Dataset for benchmarking four realistic user tasks and a hybrid evaluation protocol that combines lexical metrics with LLM-as-a-judge assessments of factual accuracy and personalization alignment. The central experimental claim is that PDR outperforms commercial baselines in retrieval utility and report relevance.

Significance. Should the core claims be substantiated, this work would meaningfully advance the field of personalized information retrieval by moving beyond post-hoc adaptation to an integrated user-centric retrieval-reasoning loop. The public release of the dataset and evaluation resources is a notable strength that could facilitate standardized benchmarking in LLM-based research agents.

major comments (2)
  1. [Experimental Evaluation] The reported gains over baselines are not supported by an ablation that isolates the dynamic user profile component (e.g., by comparing to a version without profile-guided query development or stopping criteria). This omission makes it difficult to attribute improvements specifically to personalization rather than the underlying iterative planning enhancements.
  2. [Hybrid Evaluation Framework] While the hybrid evaluation includes LLM-based judgments for personalization alignment, the manuscript does not provide evidence (such as correlation coefficients or a human study) that these judgments reliably capture user-specific relevance, which is load-bearing for the claim of bridging generic IR and personalized knowledge acquisition.
minor comments (2)
  1. [Abstract] The phrase 'significantly improves' should be accompanied by quantitative effect sizes or p-values to allow readers to assess the practical importance of the results.
  2. [Framework Description] The description of how user context is extracted from limited interaction history could benefit from a concrete example or pseudocode to improve clarity.
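
For illustration, the kind of pseudocode this comment asks for might look like the following; every field name and threshold is invented, not drawn from the manuscript.

```python
# Hypothetical profile extraction from a short interaction history.
from collections import Counter

def extract_profile(history: list[str]) -> dict:
    """Infer coarse interests and expertise from past queries/clicks."""
    tokens = [t.lower() for item in history for t in item.split() if len(t) > 3]
    interests = [t for t, _ in Counter(tokens).most_common(5)]
    # Toy heuristic: a longer history suggests deeper familiarity.
    expertise = "expert" if len(history) >= 10 else "novice"
    return {"interests": interests, "expertise": expertise,
            "history_len": len(history)}

print(extract_profile(["transformer attention scaling laws",
                       "kv cache memory for transformer inference"]))
```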

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive feedback on our manuscript. The comments highlight important aspects of experimental rigor and evaluation validity that we will address in the revision. Below we respond point-by-point to the major comments.

point-by-point responses
  1. Referee: [Experimental Evaluation] The reported gains over baselines are not supported by an ablation that isolates the dynamic user profile component (e.g., by comparing to a version without profile-guided query development or stopping criteria). This omission makes it difficult to attribute improvements specifically to personalization rather than the underlying iterative planning enhancements.

    Authors: We agree that the current experiments do not include an ablation isolating the dynamic user profile components from the iterative planning mechanisms. The reported comparisons are against commercial baselines that employ static retrieval without integrated user context or profile-guided stopping criteria. To strengthen attribution, we will add an ablation study in the revised manuscript that disables profile-guided query development and stopping criteria while retaining the iterative planning structure, allowing direct measurement of the personalization contribution. revision: yes

  2. Referee: [Hybrid Evaluation Framework] While the hybrid evaluation includes LLM-based judgments for personalization alignment, the manuscript does not provide evidence (such as correlation coefficients or a human study) that these judgments reliably capture user-specific relevance, which is load-bearing for the claim of bridging generic IR and personalized knowledge acquisition.

    Authors: We concur that demonstrating the reliability of the LLM-as-a-judge for personalization alignment is necessary to support the hybrid evaluation claims. The original manuscript introduces the hybrid protocol but does not report correlation with human judgments or inter-rater agreement. In the revision we will conduct a human evaluation on a representative subset of generated reports, compute correlation coefficients and agreement metrics (e.g., Pearson correlation or Cohen’s kappa) between human assessors and the LLM judge, and include these results to validate the personalization alignment scores. revision: yes
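
The promised analysis maps onto standard library calls; a sketch with placeholder ratings follows (the scipy and scikit-learn functions are real, the numbers are invented).

```python
# Sketch of the human-vs-LLM-judge agreement check the rebuttal promises.
# Ratings are illustrative placeholders on a 1-5 scale.
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

human_ratings = [4, 3, 5, 2, 4, 3, 5, 1]
judge_ratings = [4, 3, 4, 2, 5, 3, 5, 2]

r, p = pearsonr(human_ratings, judge_ratings)
kappa = cohen_kappa_score(human_ratings, judge_ratings)  # ratings as categories
print(f"Pearson r = {r:.2f} (p = {p:.3f}), Cohen's kappa = {kappa:.2f}")
```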

Circularity Check

0 steps flagged

No circularity: framework, dataset, and external-baseline evaluation are independent

full rationale

The paper introduces the PDR framework, releases the PDR Dataset for four tasks, and reports improvements over commercial baselines via hybrid lexical+LLM evaluation. No equations, fitted parameters, or self-citations are used to derive the claimed gains; the retrieval-utility and relevance results are measured against external systems on a released dataset. The central claims therefore do not reduce to their own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the premise that user context can be modeled accurately enough to guide retrieval depth without introducing new failure modes; no free parameters, axioms, or invented physical entities are introduced beyond standard LLM prompting and retrieval components.

pith-pipeline@v0.9.0 · 5563 in / 1068 out tokens · 27963 ms · 2026-05-12T04:50:48.605628+00:00 · methodology

