Personalized Deep Research: A User-Centric Framework, Dataset, and Hybrid Evaluation for Knowledge Discovery
Pith reviewed 2026-05-12 04:50 UTC · model grok-4.3
The pith
Integrating dynamic user context into the retrieval-reasoning loop improves the relevance of LLM-generated research reports.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PDR unifies user profile modeling with iterative query development, dual-stage (private/public) retrieval, and context-aware synthesis inside the core retrieval-reasoning loop. This integration lets the agent align research sub-goals with user intent and optimize evidence collection, producing higher retrieval utility and report relevance than generic baselines on the released PDR Dataset across four user tasks.
What carries the argument
The PDR framework, which folds dynamic user context into the core retrieval-reasoning loop through unified profile modeling, iterative queries, dual-stage retrieval, and context-aware synthesis.
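The loop described above can be sketched in a few lines. This is an illustrative reconstruction only: the paper does not publish this interface, and every function name here (`pdr_loop`, `private_index`, `web_search`) is hypothetical.

```python
# Hypothetical sketch of the PDR retrieval-reasoning loop: profile-guided
# query development, dual-stage (private/public) retrieval, context-aware
# stopping, and context-aware synthesis. Names are illustrative, not the
# paper's actual API.

def pdr_loop(task, profile, private_index, web_search, llm, max_rounds=5):
    """Iterate query development -> dual-stage retrieval -> synthesis."""
    evidence = []
    for _ in range(max_rounds):
        # Profile-guided query development: condition the next query on
        # both the research task and the modeled user profile.
        query = llm(
            f"Task: {task}\nProfile: {profile}\n"
            f"Evidence so far: {len(evidence)} items\nNext search query:"
        )
        # Dual-stage retrieval: private user data first, then public web.
        evidence += private_index(query) + web_search(query)
        # Context-aware stopping: decide, given the profile, whether the
        # collected evidence already suffices for this user.
        if llm(f"Profile: {profile}\nIs the evidence sufficient? yes/no") == "yes":
            break
    # Context-aware synthesis: a report tailored to the profile.
    return llm(f"Write a report for this user.\nProfile: {profile}\nEvidence: {evidence}")
```

The point of the sketch is structural: personalization enters at three distinct steps (query formulation, stopping, synthesis) rather than as a post-hoc formatting pass.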
If this is right
- Tailored query development reduces redundant retrieval for users with prior expertise.
- Context-aware stopping criteria prevent over- or under-collection of evidence.
- Dual-stage retrieval balances private user data with public sources for better alignment.
- The hybrid evaluation framework enables consistent benchmarking of personalization quality.
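The stopping-criterion point above can be made concrete with a simple coverage heuristic. This is entirely illustrative (PDR's actual criterion is model-driven, per the abstract); the idea is that the novelty of newly retrieved material should fall below a threshold that rises with user expertise.

```python
def should_stop(evidence, new_docs, expertise, base_threshold=0.1):
    """Stop collecting evidence when new documents contribute few novel terms.

    Illustrative heuristic only, not the paper's method. `expertise` is a
    value in [0, 1]; higher expertise raises the stopping threshold, since
    an expert needs less breadth before synthesis can begin.
    """
    seen = set()
    for doc in evidence:
        seen.update(doc.lower().split())
    new_terms = set()
    for doc in new_docs:
        new_terms.update(doc.lower().split())
    if not new_terms:
        return True  # nothing new retrieved at all
    novelty = len(new_terms - seen) / len(new_terms)
    return novelty < base_threshold * (1 + expertise)
```

Even this crude rule shows why a fixed evidence budget over- or under-collects: the right stopping point depends on both what has been gathered and who is reading.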
Where Pith is reading between the lines
- The same context integration could be tested on multi-session interactions to track evolving interests over time.
- Extending the private retrieval stage to additional personal data sources might further tighten report personalization without extra user input.
- The approach suggests a path toward research agents that automatically calibrate explanation level without explicit user prompts.
Load-bearing premise
Dynamic user context can be reliably extracted and maintained from limited interaction history without systematic misalignment or privacy issues that degrade the retrieval-reasoning loop.
What would settle it
Running PDR on the released dataset with sparse user history and finding no measurable gain in LLM-judged report relevance or retrieval utility over a non-personalized commercial baseline would falsify the central claim.
Original abstract
Deep Research agents driven by LLMs have automated the scholarly discovery pipeline, from planning and query formulation to iterative web exploration. Yet they remain constrained by a static, "one-size-fits-all" retrieval paradigm. Current systems fail to adaptively adjust the depth and breadth of exploration based on the user's existing expertise or latent interests, frequently resulting in reports that are either redundant for experts or overly dense for novices. To address this, we introduce Personalized Deep Research (PDR), a framework that integrates dynamic user context into the core retrieval-reasoning loop. Rather than treating personalization as a post-hoc formatting step, PDR unifies user profile modeling with iterative query development, dual-stage (private/public) retrieval, and context-aware synthesis. This allows the system to autonomously align research sub-goals with user intent and optimize the stopping criteria for evidence collection. To facilitate benchmarking, we release the PDR Dataset, covering four realistic user tasks, and propose a hybrid evaluation framework combining lexical metrics with LLM-based judgments to assess factual accuracy and personalization alignment. Experimental results against commercial baselines demonstrate that PDR significantly improves retrieval utility and report relevance, effectively bridging the gap between generic information retrieval and personalized knowledge acquisition. The resource is available to the public at https://github.com/Applied-Machine-Learning-Lab/SIGIR2026_PDR.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents Personalized Deep Research (PDR), a framework that embeds dynamic user context extraction and modeling directly into the LLM-driven deep research pipeline, encompassing iterative query development, dual-stage retrieval, and context-aware report synthesis. It contributes the PDR Dataset for benchmarking four realistic user tasks and a hybrid evaluation protocol that combines lexical metrics with LLM-as-a-judge assessments of factual accuracy and personalization alignment. The central experimental claim is that PDR outperforms commercial baselines in retrieval utility and report relevance.
Significance. Should the core claims be substantiated, this work would meaningfully advance the field of personalized information retrieval by moving beyond post-hoc adaptation to an integrated user-centric retrieval-reasoning loop. The public release of the dataset and evaluation resources is a notable strength that could facilitate standardized benchmarking in LLM-based research agents.
major comments (2)
- [Experimental Evaluation] The reported gains over baselines are not supported by an ablation that isolates the dynamic user profile component (e.g., by comparing to a version without profile-guided query development or stopping criteria). This omission makes it difficult to attribute improvements specifically to personalization rather than the underlying iterative planning enhancements.
- [Hybrid Evaluation Framework] While the hybrid evaluation includes LLM-based judgments for personalization alignment, the manuscript does not provide evidence (such as correlation coefficients or a human study) that these judgments reliably capture user-specific relevance, which is load-bearing for the claim of bridging generic IR and personalized knowledge acquisition.
minor comments (2)
- [Abstract] The phrase 'significantly improves' should be accompanied by quantitative effect sizes or p-values to allow readers to assess the practical importance of the results.
- [Framework Description] The description of how user context is extracted from limited interaction history could benefit from a concrete example or pseudocode to improve clarity.
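As an illustration of what such pseudocode might look like (this is not the paper's method; the function and field names are hypothetical), context extraction from a short interaction history could be as simple as a summarization call whose output is kept as a compact, updatable profile:

```python
def extract_profile(interactions, llm):
    """Illustrative sketch of dynamic user-context extraction from a short
    interaction history. Hypothetical, not the paper's actual method."""
    # Keep only recent turns so the profile tracks current interests.
    history = "\n".join(f"- {turn}" for turn in interactions[-20:])
    prompt = (
        "Summarize this user's expertise level, topical interests, and "
        "preferred depth of explanation as three short fields.\n"
        f"History:\n{history}"
    )
    summary = llm(prompt)
    # A compact dict the retrieval loop can condition on at each round.
    return {"summary": summary, "n_interactions": len(interactions)}
```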
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive feedback on our manuscript. The comments highlight important aspects of experimental rigor and evaluation validity that we will address in the revision. Below we respond point-by-point to the major comments.
Point-by-point responses
Referee: [Experimental Evaluation] The reported gains over baselines are not supported by an ablation that isolates the dynamic user profile component (e.g., by comparing to a version without profile-guided query development or stopping criteria). This omission makes it difficult to attribute improvements specifically to personalization rather than the underlying iterative planning enhancements.
Authors: We agree that the current experiments do not include an ablation isolating the dynamic user profile components from the iterative planning mechanisms. The reported comparisons are against commercial baselines that employ static retrieval without integrated user context or profile-guided stopping criteria. To strengthen attribution, we will add an ablation study in the revised manuscript that disables profile-guided query development and stopping criteria while retaining the iterative planning structure, allowing direct measurement of the personalization contribution. revision: yes
Referee: [Hybrid Evaluation Framework] While the hybrid evaluation includes LLM-based judgments for personalization alignment, the manuscript does not provide evidence (such as correlation coefficients or a human study) that these judgments reliably capture user-specific relevance, which is load-bearing for the claim of bridging generic IR and personalized knowledge acquisition.
Authors: We concur that demonstrating the reliability of the LLM-as-a-judge for personalization alignment is necessary to support the hybrid evaluation claims. The original manuscript introduces the hybrid protocol but does not report correlation with human judgments or inter-rater agreement. In the revision we will conduct a human evaluation on a representative subset of generated reports, compute correlation coefficients and agreement metrics (e.g., Pearson correlation or Cohen’s kappa) between human assessors and the LLM judge, and include these results to validate the personalization alignment scores. revision: yes
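The agreement metrics the authors commit to are standard and can be computed with the stdlib alone. A minimal sketch (the rater scores in any real run would come from the promised human study, not from code like this):

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def cohens_kappa(a, b):
    """Cohen's kappa for two raters' categorical labels on the same items."""
    n = len(a)
    labels = set(a) | set(b)
    observed = sum(x == y for x, y in zip(a, b)) / n
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    return (observed - expected) / (1 - expected)
```

Pearson suits continuous relevance scores; Cohen's kappa suits categorical "aligned / not aligned" labels, correcting raw agreement for chance.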
Circularity Check
No circularity: framework, dataset, and external-baseline evaluation are independent
full rationale
The paper introduces the PDR framework, releases the PDR Dataset for four tasks, and reports improvements over commercial baselines via hybrid lexical+LLM evaluation. No equations, fitted parameters, or self-citations are used to derive the claimed gains; the retrieval-utility and relevance results are measured against external systems on a released dataset. The central claims therefore do not reduce to their own inputs by construction.