PaperFlow: Profiling, Recommending, and Adapting Across Daily Paper Streams

Bihui Yu; Cheng Tan; Fuqiang Wang; Jiaohao Fu; Jie Dong; Jingxuan Wei; Siyuan Li; Song Tan; Xinglong Xu; Zheng Guo

arxiv: 2606.07454 · v1 · pith:Z464W5RCnew · submitted 2026-06-05 · 💻 cs.IR · cs.AI

Fuqiang Wang, Song Tan, Zheng Guo, Jiaohao Fu, Xinglong Xu, Bihui Yu, Jie Dong, Zheng Sun, Siyuan Li, Jingxuan Wei, Cheng Tan

Pith reviewed 2026-06-27 20:26 UTC · model grok-4.3

classification 💻 cs.IR cs.AI

keywords: scientific paper recommendation, longitudinal evaluation, user profiling, interest drift, recommendation systems, information retrieval, adaptive ranking, daily streams

The pith

PaperFlow organizes daily scientific paper recommendation into profiling, recommending, and adapting stages that handle interest shifts over time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that scientific paper recommendation should be treated as a longitudinal daily process rather than static ranking over fixed sets. It proposes PaperFlow as a framework with three coupled stages: building inspectable user profiles from cold-start evidence, ranking each day's paper stream under a display budget using multiple signals, and updating the profile from distinct feedback types while modeling drift across days. A shared longitudinal benchmark with 24 simulated users, 50 daily streams, and 1,200 user-day episodes is introduced to enable consistent evaluation. Experiments show PaperFlow outperforming five baselines on oracle ranking, alignment with simulated selections, and blind human judgments.

Core claim

PaperFlow structures scientific paper recommendation as three coupled stages—Profiling from heterogeneous cold-start evidence into a structured scholarly profile, Recommending via multi-signal aggregation on date-specific streams under a fixed budget, and Adapting from semantically distinct feedback while modeling interest drift—and demonstrates stronger oracle-based ranking, behavioral alignment with simulated reading, and blind human-evaluation scores than five baselines on a fixed longitudinal user-day benchmark containing 1,200 episodes and 20,727 papers.

What carries the argument

The PaperFlow framework's three coupled stages (Profiling, Recommending, Adapting) linked through a longitudinal user-day benchmark that fixes users, dates, candidate pools, visible inputs, and hidden relevance labels under a shared temporal boundary.

If this is right

Daily paper streams can be ranked more effectively when user profiles are maintained and updated across days rather than recomputed from scratch.
Interest drift becomes modellable when feedback is treated as semantically distinct signals instead of uniform relevance labels.
Evaluation of recommendation systems gains consistency when benchmarks fix the temporal information boundary and separate visible inputs from hidden labels.
Automatic metrics gain credibility when aligned with blind human expert judgments on the same episodes.
Cold-start scenarios in scientific reading become addressable through structured, inspectable profiles built from heterogeneous evidence.
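
The second consequence above, treating feedback as semantically distinct signals while carrying the profile across days, can be illustrated with a small sketch. The feedback kinds (`save`, `read`, `dismiss`), their weights, and the decay factor are all assumptions chosen for illustration; this page does not list PaperFlow's actual signal types or update rule. The topic names echo the drift case in Figure 12.

```python
from collections import defaultdict

# Hypothetical feedback kinds and weights; not taken from the paper.
FEEDBACK_WEIGHTS = {"save": 1.0, "read": 0.5, "dismiss": -0.7}
DECAY = 0.9  # assumed per-day decay so that stale interests fade


def update_profile(profile, feedback):
    """Return a new topic-weight profile after one day of feedback.

    profile:  dict mapping topic -> weight
    feedback: list of (topic, kind) pairs observed today
    """
    # Carry yesterday's state forward with decay instead of recomputing from scratch.
    updated = defaultdict(float, {t: w * DECAY for t, w in profile.items()})
    for topic, kind in feedback:
        updated[topic] += FEEDBACK_WEIGHTS[kind]
    # Drop topics whose weight has fallen to zero or below.
    return {t: w for t, w in updated.items() if w > 0}


profile = {"gui agents": 1.0}
for _ in range(3):
    profile = update_profile(
        profile,
        [("multimodal reasoning", "save"), ("gui agents", "dismiss")],
    )
```

After three days of this feedback the old topic has been dropped and the new one dominates, which is the qualitative behavior a drift-aware update is meant to show. A uniform relevance label could not distinguish the `save` from the `dismiss`.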

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same staged structure could apply to other evolving recommendation settings such as daily news or research tool suggestions where user state changes over short time scales.
The fixed-benchmark design could serve as a template for testing adaptive systems in domains outside papers, provided the simulated users are replaced by real traces.
If the simulated drift patterns diverge from actual researcher behavior, the adaptation stage would need recalibration using real longitudinal logs.
Integration with live arXiv or PubMed feeds would allow direct measurement of whether the multi-signal ranking reduces information overload on working days.

Load-bearing premise

The 24 simulated research users, their feedback signals, and the hidden simulated relevance labels accurately reflect real scientific reading behavior, interest drift, and relevance judgments.

What would settle it

Deploy PaperFlow to real researchers for multiple consecutive days, record their actual reading selections and explicit feedback, then compare the system's rankings and adaptations against those observed choices and reported satisfaction.

Figures

Figures reproduced from arXiv: 2606.07454 by Bihui Yu, Cheng Tan, Fuqiang Wang, Jiaohao Fu, Jie Dong, Jingxuan Wei, Siyuan Li, Song Tan, Xinglong Xu, Zheng Guo, Zheng Sun.

**Figure 2.** Overview of the PaperFlow dynamic personalized scientific reading loop.

**Figure 3.** Construction pipeline of the PaperFlow benchmark. Daily paper streams and simulated…

**Figure 4.** Automatic–human metric alignment (ModelAutoScore vs. ModelHumanScore)

**Figure 6.** Interest-drift analysis. Cell color is normalized within each metric, with darker green…

**Figure 7.** Representative PaperFlow case study.

[…] reproducibility and controlled temporal comparison, but the simulated labels should not be interpreted as human-annotated truth or as deployment logs from real users. The current benchmark is mainly derived from arXiv daily paper streams, so coverage may differ across fields and publication venues. Future work should connect the protocol with larger-scale human evaluati…

**Figure 8.** The 24 simulated researcher profiles.

[…] The system must combine long-term directions, short-term behavior, and explicit rules to decide whether a paper is worth showing to the user.

**Figure 9.** Overview of PaperFlow evaluation metrics. The figure groups metrics by oracle-based…

**Figure 10.** Case-study selection criteria. The figure summarizes five representative case types for in…

**Figure 11.** Successful recommendation case for an NLP user. The episode shows that the user’s…

**Figure 12.** Interest-drift case from GUI/Web agents to multimodal reasoning. Repeated evidence…

**Figure 13.** Behavior-consistency case with a high-SelectedNDCG list. Selected papers appear near the front of the Top-20, showing that behavior-based agreement can complement static oracle relevance in longitudinal recommendation evaluation.
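
Figure 13 refers to SelectedNDCG over a Top-20 list. The paper's exact definition is not reproduced on this page, so the following is only a minimal sketch of standard binary-gain NDCG@k, assuming a gain of 1 for every paper the simulated user selected. The function name and inputs are illustrative.

```python
import math


def ndcg_at_k(ranked_ids, selected_ids, k=20):
    """Binary-gain NDCG@k: gain 1 if the paper at a rank was selected, else 0."""
    selected = set(selected_ids)
    # DCG discounts each hit by log2(rank + 1), with ranks starting at 1.
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, pid in enumerate(ranked_ids[:k], start=1)
        if pid in selected
    )
    # The ideal list places all selected papers (at most k) at the top.
    ideal_hits = min(len(selected), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


front_loaded = ndcg_at_k(["a", "b", "c", "d"], {"a", "b"})
back_loaded = ndcg_at_k(["c", "d", "a", "b"], {"a", "b"})
```

A list with the selected papers at the front scores 1.0, while the same papers pushed to ranks 3 and 4 score lower. That is the sense in which a high value indicates selections appearing near the front of the Top-20.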

Original abstract

Scientific paper recommendation is typically evaluated as static ranking over a fixed candidate set, yet real scientific reading unfolds as a daily, longitudinal process in which interests shift and feedback accumulates. We introduce PaperFlow, a framework that organizes it into three coupled stages: Profiling, which constructs and maintains a structured, inspectable scholarly profile from heterogeneous cold-start evidence; Recommending, which ranks each date-specific paper stream through multi-signal aggregation under a fixed display budget; and Adapting, which updates user state from semantically distinct feedback signals and models interest drift across days. We further define a longitudinal user-day benchmark that fixes users, dates, candidate pools, visible inputs, and hidden simulated relevance labels under a shared temporal information boundary. The benchmark contains 24 simulated research users, 50 daily paper streams, 1,200 user-day episodes, 20,727 unique papers, and 497,448 episode-paper records. We additionally specify a blind human-evaluation protocol to validate alignment between automatic metrics and expert judgments. Experiments against five scientific recommendation baselines show that PaperFlow achieves the strongest oracle-based ranking, the highest behavioral alignment with simulated reading selections, and the best blind human-evaluation score.
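
The benchmark counts quoted in the abstract are mutually consistent, which a quick check makes explicit. The sketch uses only the figures stated above. The average pool size is derived arithmetic, not a number the paper reports. The observation that the record count equals users times unique papers is consistent with each paper belonging to a single daily stream shown to all 24 users, though the abstract does not state this explicitly.

```python
users, days = 24, 50
unique_papers = 20_727
records = 497_448                           # episode-paper records from the abstract

episodes = users * days                     # one episode per user per day
records_if_shared = users * unique_papers   # every user sees every paper exactly once
avg_pool_size = unique_papers / days        # mean candidates per daily stream (derived)

print(episodes, records_if_shared == records, avg_pool_size)
```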

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PaperFlow adds a three-stage framework and a fixed longitudinal benchmark, but its performance claims rest on 24 simulated users whose interest-drift model is untested against real reading behavior.

The paper defines PaperFlow as three linked stages—profiling from cold-start signals, ranking daily streams under a display limit, and adapting profiles with feedback while tracking drift—and pairs it with a benchmark that fixes 24 users, 50 days, candidate pools, and hidden relevance labels. That benchmark contains 1,200 user-day episodes and nearly 500k records. The main advance is moving evaluation away from static ranking toward a repeatable longitudinal setup that could be reused.

The experiments report that the framework beats five baselines on oracle ranking, alignment with simulated selections, and a blind human score. The abstract is clear that these results come from the same simulated environment.

The load-bearing weakness is exactly the simulation. All three stages are scored inside a model of user behavior and label assignment that is not shown to match actual scholars. If the drift rules or relevance procedure diverge from real patterns, the reported wins do not demonstrate better recommendations for working researchers. The blind human protocol is offered as an external check, yet its independence from the simulation is not established in the abstract.

No circular reasoning appears in the stated claims, and the citation pattern follows standard IR and recsys references. The methods are not visible here, so I cannot judge the aggregation or update rules in detail.

This work is for groups already building or evaluating paper recommenders who want a longitudinal testbed. A reader focused on benchmark design would find the fixed temporal boundary useful to examine. It deserves peer review because the framework and benchmark are concrete enough to discuss and improve, even if the simulation validity must be addressed in revision.

Referee Report

1 major / 1 minor

Summary. The manuscript presents PaperFlow, a framework for recommending scientific papers in a daily longitudinal setting. It consists of three stages: Profiling to build inspectable scholarly profiles from cold-start evidence, Recommending to rank date-specific paper streams under a display budget using multi-signal aggregation, and Adapting to update user state from feedback and model interest drift. The authors introduce a longitudinal user-day benchmark with 24 simulated users, 50 daily streams, 1,200 episodes, and 497,448 records, and report that PaperFlow outperforms five baselines in oracle ranking, behavioral alignment, and blind human evaluation.

Significance. This work has potential significance in moving paper recommendation evaluation from static to dynamic, longitudinal settings that better reflect real scientific reading. The provision of a fixed, reproducible benchmark with explicit temporal boundaries and the use of blind human evaluation are positive aspects that could facilitate future comparisons. Credit is due for the detailed specification of the benchmark parameters (24 users, 50 streams, shared information boundary) and the multi-stage framework. However, the significance is tempered by the exclusive reliance on simulated data for all quantitative claims.

major comments (1)

[Longitudinal user-day benchmark definition and evaluation] The performance claims (strongest oracle-based ranking, highest behavioral alignment with simulated reading selections, best blind human-evaluation score) are obtained exclusively on the longitudinal user-day benchmark constructed from 24 simulated research users, hidden simulated relevance labels, and an interest-drift model (as described in the benchmark definition and evaluation sections). No validation of the simulation procedure against real scholarly reading patterns, interest drift, or external relevance judgments is provided, which is load-bearing for the central empirical contribution and the reported superiority over the five baselines.

minor comments (1)

[Abstract] The abstract could more explicitly state the limitations of the simulated benchmark to set appropriate expectations.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and for acknowledging the reproducibility of the benchmark and the value of longitudinal evaluation. We address the single major comment below.

Referee: [Longitudinal user-day benchmark definition and evaluation] The performance claims (strongest oracle-based ranking, highest behavioral alignment with simulated reading selections, best blind human-evaluation score) are obtained exclusively on the longitudinal user-day benchmark constructed from 24 simulated research users, hidden simulated relevance labels, and an interest-drift model (as described in the benchmark definition and evaluation sections). No validation of the simulation procedure against real scholarly reading patterns, interest drift, or external relevance judgments is provided, which is load-bearing for the central empirical contribution and the reported superiority over the five baselines.

Authors: We agree the simulation is central to the quantitative results. The benchmark was deliberately constructed as a fully specified, reproducible simulation to enable controlled longitudinal experiments (interest drift, accumulating feedback, date-specific streams) that cannot be ethically or practically obtained at this scale with real users. All five baselines are evaluated under identical conditions, so relative improvements remain valid within the benchmark. The blind human evaluation supplies an external, non-simulated check: domain experts preferred PaperFlow outputs without knowledge of the underlying labels or drift model. We will add a dedicated Limitations subsection that explicitly discusses the simulation assumptions, their potential divergence from real reading patterns, and the absence of direct validation against external user logs. No new experiments are required for this clarification.

Revision: partial

Circularity Check

0 steps flagged

No circularity; empirical claims rest on external baseline comparisons within a self-defined but non-reductive benchmark

The paper introduces PaperFlow as a three-stage framework and defines its own longitudinal user-day benchmark with 24 simulated users and hidden relevance labels. It then reports performance against five independent baselines plus a blind human-evaluation protocol. No derivation, equation, or result reduces to its own inputs by construction; the benchmark is an evaluation substrate rather than a self-referential loop, and no self-citations or fitted parameters are invoked as load-bearing premises. This is the standard case of a proposal evaluated on a custom testbed.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the framework and benchmark are described at a high level without technical internals.

pith-pipeline@v0.9.1-grok · 5768 in / 1141 out tokens · 22356 ms · 2026-06-27T20:26:18.892421+00:00 · methodology

Appendix excerpt: recommendation pipeline

1. Retrieve the candidate papers for the corresponding date from the fixed paper pool.
2. Compute the base relevance score from the user profile and paper content.
3. Check must-read author, institution, and keyword matches.
4. Apply interest-drift weighting to new interest topics and downweight suppressed old topics.
5. Apply recent reading-signal weights to short-term topics that repeatedly appear and are selected by the user.
6. Generate the Top-20 recommendation list.
7. Simulate user selections based on oracle labels, system rank, system label, and drift-topic matches.
8. Generate reading reports for selected papers and record token usage, episode metadata, and the drift timeline.

This pipeline connects recommendation, selection, profile update, and reading assistance into a closed loop. The recommendation stage determines what the user sees. The selection stage simulates what the user actually clicks or reads. The profile...
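
The scoring and ranking steps listed above (base relevance, must-read matches, drift weighting, reading-signal weights, and the Top-20 cut) can be sketched as a small multi-signal aggregation. Everything concrete here is an assumption: the additive combination, the four weight constants, and the `Paper` and `UserState` types are hypothetical, and the base score is taken as precomputed. Institution and keyword must-read matches are omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    pid: str
    topics: set
    authors: set = field(default_factory=set)
    base_score: float = 0.0  # assumed precomputed profile-paper relevance


@dataclass
class UserState:
    must_read_authors: set = field(default_factory=set)
    drift_new_topics: set = field(default_factory=set)
    suppressed_topics: set = field(default_factory=set)
    reading_signal_topics: set = field(default_factory=set)


# Hypothetical weights; the excerpt does not give values.
MUST_READ_BONUS, DRIFT_BOOST, SUPPRESS_PENALTY, SIGNAL_BOOST = 1.0, 0.3, 0.3, 0.2


def score(paper, user):
    s = paper.base_score
    if paper.authors & user.must_read_authors:      # must-read author match
        s += MUST_READ_BONUS
    if paper.topics & user.drift_new_topics:        # boost new interest topics
        s += DRIFT_BOOST
    if paper.topics & user.suppressed_topics:       # downweight suppressed old topics
        s -= SUPPRESS_PENALTY
    if paper.topics & user.reading_signal_topics:   # recent reading-signal topics
        s += SIGNAL_BOOST
    return s


def recommend(day_pool, user, budget=20):
    """Return the top-`budget` papers from one day's pool by aggregated score."""
    return sorted(day_pool, key=lambda p: score(p, user), reverse=True)[:budget]


user = UserState(
    must_read_authors={"A. Author"},
    drift_new_topics={"multimodal reasoning"},
    suppressed_topics={"gui agents"},
)
pool = [
    Paper("p1", {"gui agents"}, base_score=0.8),
    Paper("p2", {"multimodal reasoning"}, base_score=0.6),
    Paper("p3", {"databases"}, {"A. Author"}, base_score=0.1),
]
top = [p.pid for p in recommend(pool, user, budget=2)]
```

In this toy pool the paper with the highest base score falls out of the list because its topic is suppressed, while the must-read author match lifts an otherwise weak paper to the top. This shows how explicit rules and drift state can override static similarity.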

Appendix excerpt: generation prompt rules and input

Do not invent methods, experiments, or conclusions that do not appear in the paper information.
If the paper is only topically related, explicitly say it is topically related. Do not exaggerate it as directly solving the user's problem.
If a must-read author, institution, or keyword is matched, state the reason.
If the paper is related to an interest-drift direction, state the corresponding new interest topic.
Output only one or two sentences.

User: { "user_profile": { "core_directions": {...}, "must_read": {...}, "drift_state": "...", "reading_signal_topics": [...] }, "paper": { "title": "...", "abstract": "...", "authors": [...], "institutions": [...], "keywords": [...] }, "signals": { "system_label": "...", "score": 0.0, "matched_topics": [...], "matched_mus...
- ProfileMatch: whether the list matches the user's profile and current research interests.
- RankingQuality: whether stronger or more useful papers appear earlier in the Top-20 list.
- DecisionUsefulness: whether the list helps the user decide what to read, skim, or skip.
- DiversityFocusBalance: whether the list balances focused relevance with useful breadth.

Return JSON: { "ProfileMatch": 1, "RankingQuality": 1, "DecisionUsefulness": 1, "DiversityFocusBalance": 1, "comments": "brief rationale" }

Model-comparison report prompt: You will see a researcher profile, a recommended paper, the paper ...
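A reply in the four-dimension JSON format requested by the list-evaluation prompt could be checked before aggregation with something like the sketch below. The function name `parse_judgment` and the 1 to 5 score range are assumptions for illustration; the source shows only the field names and an example value of 1, not the scale.

```python
import json

# Field names taken from the rubric; everything else here is illustrative.
DIMENSIONS = ("ProfileMatch", "RankingQuality", "DecisionUsefulness", "DiversityFocusBalance")


def parse_judgment(raw: str, lo: int = 1, hi: int = 5) -> dict:
    """Parse one judge reply and validate its scores.

    The [lo, hi] range is an assumed scale. Raises ValueError on malformed input.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    scores = {}
    for name in DIMENSIONS:
        value = data.get(name)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or not lo <= value <= hi:
            raise ValueError(f"{name} must be an integer in [{lo}, {hi}]")
        scores[name] = value
    scores["comments"] = str(data.get("comments", ""))
    return scores
```

Raising on malformed replies, rather than silently defaulting a score, keeps parsing failures visible so they can be logged separately from completed judgments.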

1. The candidate paper pool remains fixed.
2. User profiles remain fixed.
3. The Top-20 display budget remains fixed.
4. The embedding model remains fixed.
5. Evaluation metrics remain fixed.
6. Each model writes to a separate output directory.
7. Token usage is recorded by date.
8. TokenCost is reported only as an efficiency metric and is not included in ModelAutoScore or ModelHumanScore.

If a model encounters JSON parsing errors, API connection errors, or token-usage accounting anomalies, these should be recorded separately and should not be directly compared with models that completed full runs...
- Several Top-5 papers are both must_read and relevant.
- The user selects five papers from the list.
- The ranking remains strong beyond static oracle labels.

[Figure: Top-20 Overview: Density of Relevant/Must-read Papers. Columns: Rank, Relevant/Must_read. Legend: relevant/must_read, relevant (not must_read), not relevant. Panel: User Long-term Interests.]

Mechanism: The success comes from aligned signals. The user's long-term directions are NLP, LLMs, and information extra...
- The user initially concentrates on GUI agents and web automation.
- The observing state accumulates repeated new-topic hits.
- The signal eventually crosses the drift threshold.

[Figure: New-Topic Hits over Time. x-axis: Time (observations); y-axis: New-topic hits (per observation); drift threshold line at 30.]

Update Mechanism: After the threshold is crossed, the system locks multimodal reasoning as the anchor topic. It then increases the ranking weight of papers related to multimodal interaction and reas...
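The observe-then-lock behavior in this case can be illustrated with a small state tracker. This is a sketch under stated assumptions, not the paper's mechanism: the class name `DriftTracker`, the state label `"locked"`, cumulative (undecayed) hit counting, and treating "crosses" as greater-than-or-equal are all choices made for the example. Only the "observing" state and the threshold value of 30 come from the text and figure.

```python
from collections import Counter


class DriftTracker:
    """Accumulates new-topic hits and locks an anchor topic at a threshold."""

    def __init__(self, threshold: int = 30):
        self.threshold = threshold     # 30 is the value shown in the case-study figure
        self.hits = Counter()          # cumulative hits per candidate topic (assumed: no decay)
        self.state = "observing"
        self.anchor_topic = None

    def observe(self, new_topic_hits: dict) -> str:
        """Record one observation's hits (topic -> count) and return the state."""
        if self.state == "locked":
            return self.state          # assumption: the anchor is not revised once locked
        self.hits.update(new_topic_hits)
        if self.hits:
            topic, total = self.hits.most_common(1)[0]
            if total >= self.threshold:
                self.state = "locked"
                self.anchor_topic = topic
        return self.state
```

Once `state` becomes `"locked"`, a ranker could read `anchor_topic` and raise the weight of matching papers, as the update mechanism describes.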
- Some papers may be only weak_relevant under the oracle.
- The user may still click or read them because similar topics were selected recently.
- If the system ranks these papers early, SelectedNDCG@20 increases.

Interpretation: The case illustrates that behavioral consistency is not identical to static relevance. Selected papers are not always those with the highest oracle gain, but they may better match the user's recent reading trajectory. Longitudinal scientific recommendation should therefore ...
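To make the effect on SelectedNDCG@20 concrete, the sketch below computes a selection-based NDCG with binary gains: a paper contributes gain 1 if the simulated user selected it and 0 otherwise. The source does not give the metric's exact definition, so the binary gain, the log2 discount, and the function names are assumptions; this is the standard NDCG form applied to selections.

```python
import math


def dcg(gains: list) -> float:
    """Discounted cumulative gain with the usual log2(rank + 1) discount."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def selected_ndcg(ranked_ids: list, selected_ids, k: int = 20) -> float:
    """NDCG@k where gain is 1 for selected papers and 0 otherwise (assumed definition)."""
    selected = set(selected_ids)
    gains = [1.0 if pid in selected else 0.0 for pid in ranked_ids[:k]]
    # Ideal ordering: all selected papers placed at the top of the list.
    ideal_dcg = dcg([1.0] * min(len(selected), k))
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0
```

With the same two selected papers, placing them at ranks 1 and 2 gives a score of 1.0, while placing them at ranks 3 and 4 gives a lower score, which is why ranking recently favored topics early raises the metric even when their oracle label is only weak_relevant.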

[1]

Akari Asai, Jacqueline He, Rulin Shao, Weijia Shi, Amanpreet Singh, Joseph Chee Chang, Kyle Lo, Luca Soldaini, Sergey Feldman, Mike D’Arcy, David Wadden, Matt Latzke, Jenna Sparks, Jena D. Hwang, Varsha Kishore, Minyang Tian, Pan Ji, Shengyan Liu, Hao Tong, Bohao Wu, Yanyu Xiong, Luke Zettlemoyer, Graham Neubig, Daniel S. Weld, Doug Downey, Wen-Tau Yih, P...

doi: 10.1038/s41586-025-10072-4, 2026

[2]

Shihao Cai, Jizhi Zhang, Keqin Bao, Chongming Gao, Qifan Wang, Fuli Feng, and Xiangnan He. Agentic Feedback Loop Modeling Improves Recommendation and User Simulation. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2235–2244, 2025. doi: 10.1145/3726302.3729893. URL https://doi.org/10.1145/3726302.3729893

[4]

Jiabao Fang, Shen Gao, Pengjie Ren, Xiuying Chen, Suzan Verberne, and Zhaochun Ren. A Multi-Agent Conversational Recommender System. arXiv preprint arXiv:2402.01135, 2024. URL https://arxiv.org/abs/2402.01135

[5]

Markus Flicke, Glenn Angrabeit, Madhav Iyengar, Vitalii Protsenko, Illia Shakun, Jovan Cicvaric, Bora Kargi, Haoyu He, Lukas Schuler, Lewin Scholz, Kavyanjali Agnihotri, Yong Cao, and Andreas Geiger. Scholar Inbox: Personalized Paper Recommendations for Scientists. In ACL 2025 S...

[6]

Raymond Fok, Hita Kambhamettu, Luca Soldaini, Jonathan Bragg, Kyle Lo, Andrew Head, Marti A. Hearst, and Daniel S. Weld. Scim: Intelligent Skimming Support for Scientific Papers. In IUI 2023, 2023. doi: 10.1145/3581641.3584034. URL https://doi.org/10.1145/3581641.3584034

[7]

Yunfan Gao, Tao Sheng, Youlin Xiang, Yun Xiong, Haofen Wang, and Jiawei Zhang. Chat-REC: Towards Interactive and Explainable LLMs-Augmented Recommender System. arXiv preprint arXiv:2303.14524, 2023. URL https://arxiv.org/abs/2303.14524

[8]

Yichen He, Guanhua Huang, Peiyuan Feng, Yuan Lin, Yuchen Zhang, Hang Li, and Weinan E. PaSa: An LLM Agent for Comprehensive Academic Paper Search. In ACL 2025 Long Papers, 2025. URL https://aclanthology.org/2025.acl-long.572/

[9]

Jiani Huang, Shijie Wang, Liang bo Ning, Wenqi Fan, Shuaiqiang Wang, Dawei Yin, and Qing Li. Towards Next-Generation Recommender Systems: A Benchmark for Personalized Recommendation Assistant with LLMs (RecBench+). In WSDM 2026, 2026. doi: 10.1145/3773966.3777954. URL https://doi.org/10.1145/3773966.3777954

[10]

Xu Huang, Jianxun Lian, Yuxuan Lei, Jing Yao, Defu Lian, and Xing Xie. Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations. ACM Transactions on Recommender Systems, 2025. doi: 10.1145/3731446. URL https://doi.org/10.1145/3731446

[11]

Hyeonsu B. Kang, Sherry Tongshuang Wu, Joseph Chee Chang, and Aniket Kittur. Synergi: A Mixed-Initiative System for Scholarly Synthesis and Sensemaking. In UIST 2023, 2023. doi: 10.1145/3586183.3606759. URL https://doi.org/10.1145/3586183.3606759

[12]

Yoonjoo Lee, Hyeonsu B. Kang, Matt Latzke, Juho Kim, Jonathan Bragg, Joseph Chee Chang, and Pao Siangliulue. PaperWeaver: Enriching Topical Paper Alerts by Contextualizing Recommended Papers with User-collected Papers. In CHI 2024, 2024. doi: 10.1145/3613904.3642196. URL https://doi.org/10.1145/3613904.3642196

[13]

Chuang Li, Yang Deng, Hengchang Hu, Min-Yen Kan, and Haizhou Li. Incorporating External Knowledge and Goal Guidance for LLM-based Conversational Recommender Systems. In Findings of NAACL 2025, 2025. URL https://aclanthology.org/2025.findings-naacl.17/

[14]

Yuan Liang, Jiaxian Li, Yuqing Wang, Piaohong Wang, Motong Tian, Pai Liu, Shuofei Qiao, Runnan Fang, He Zhu, Ge Zhang, Minghao Liu, Yuchen Eleanor Jiang, Ningyu Zhang, and Wangchunshu Zhou. Towards Personalized Deep Research: Benchmarks and Evaluations (PDR-Bench). In ICLR 2026 Poster, 2026. URL https://openreview.net/forum?id=51LIRzF53v

[15]

Guanyu Lin, Tao Feng, Pengrui Han, Ge Liu, and Jiaxuan You. Arxiv Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 122–130, 2024. doi: 10.18653/v1/2024.emnlp-demo.13. URL https://aclanthology.org/202...

[16]

Xiaolin Lin, Weike Pan, and Zhong Ming. Towards Interest Drift-driven User Representation Learning in Sequential Recommendation (IDURL). In SIGIR 2025, 2025. doi: 10.1145/3726302.3730099. URL https://doi.org/10.1145/3726302.3730099

[17]

Guangyi Liu, Quanming Yao, Yongqi Zhang, and Lei Chen. Knowledge-Enhanced Recommendation with User-Centric Subgraph Network. arXiv preprint arXiv:2403.14377, 2024. URL https://arxiv.org/abs/2403.14377

[19]

URL https://arxiv.org/abs/2503.01189

[20]

Kyle Lo, Joseph Chee Chang, Andrew Head, Jonathan Bragg, Amy X. Zhang, Cassidy Trier, Chloe Anastasiades, Tal August, Russell Authur, Danielle Bragg, Erin Bransom, Isabel Cachola, Stefan Candra, Yoganand Chandrasekhar, Yen-Sung Chen, Evie Yu-Yen Cheng, Yvonne Chou, Doug Downey, Rob Evans, Raymond...

doi: 10.1145/3654777, 2024

[21]

Chenkai Pan, Xinglong Xu, Yuhang Xu, Yujun Wu, Siyuan Li, Jintao Chen, Conghui He, Jingxuan Wei, and Cheng Tan. Programming with data: Test-driven data engineering for self-improving llms from raw corpora. arXiv preprint arXiv:2604.24819, 2026

[22]

Napol Rachatasumrit, Jonathan Bragg, Amy X. Zhang, and Daniel S Weld. CiteRead: Integrating Localized Citation Contexts into Scientific Paper Reading. In IUI 2022, 2022. doi: 10.1145/3490099.3511162. URL https://doi.org/10.1145/3490099.3511162

[23]

Jerome Ramos, Hossen A. Rahmani, Xi Wang, Xiao Fu, and Aldo Lipani. Transparent and Scrutable Recommendations Using Natural Language User Profiles. In ACL 2024 Long Papers, 2024. URL https://aclanthology.org/2024.acl-long.753/

[24]

Yu Shang, Peijie Liu, Yuwei Yan, Zijing Wu, Leheng Sheng, Yuanqing Yu, Chumeng Jiang, An Zhang, Fengli Xu, Yu Wang, Min Zhang, and Yong Li. AgentRecBench: Benchmarking LLM Agent-based Personalized Recommender Systems. In NeurIPS 2025 Datasets and Benchmarks Track Spotlight, 2025. URL https://openreview.net/forum?id=fm77rDf9JS

[25]

Wentao Shi, Xiangnan He, Yang Zhang, Chongming Gao, Xinyue Li, Jizhi Zhang, Qifan Wang, and Fuli Feng. Large Language Models are Learnable Planners for Long-Term Recommendation. In SIGIR 2024, 2024. doi: 10.1145/3626772.3657683. URL https://doi.org/10.1145/3626772.3657683

[26]

Yubo Shu, Haonan Zhang, Hansu Gu, Peng Zhang, Tun Lu, Dongsheng Li, and Ning Gu. RAH! RecSys–Assistant–Human: A Human-Centered Recommendation Framework With LLM Agents. IEEE Transactions on Computational Social Systems, 11(5):6759–6770, 2024. doi: 10.1109/TCSS.2024.3404039. URL https://doi.org/10.1109/TCSS.2024.3404039

[27]

Michael D. Skarlinski, Sam Cox, Jon M. Laurent, James D. Braza, Michaela Hinks, Michael J. Hammerling, Manvitha Ponnapati, Samuel G. Rodriques, and Andrew D. White. Language agents achieve superhuman synthesis of scientific knowledge. arXiv preprint arXiv:2409.13740, 2024. URL https://arxiv.org/abs/2409.13740

[28]

Ruotong Wang, Xinyi Zhou, Lin Qiu, Joseph Chee Chang, Jonathan Bragg, and Amy X. Zhang. PaperPing: A Socially-aware AI Agent that Recommends Academic Papers to Research Group Chats with Contextualized Explanations. In CSCW Companion 2025, 2025. doi: 10.1145/3715070.3757230. URL https://doi.org/10.1145/3715070.3757230

[29]

Ruotong Wang, Xinyi Zhou, Lin Qiu, Joseph Chee Chang, Jonathan Bragg, and Amy X. Zhang. Social-RAG: Retrieving from Group Interactions to Socially Ground AI Generation. In CHI 2025, 2025. doi: 10.1145/3706598.3713749. URL https://doi.org/10.1145/3706598.3713749

[30]

Shenghua Wang and Zhen Yin. Discourse-Aware Scientific Paper Recommendation via QA-Style Summarization and Multi-Level Contrastive Learning. arXiv preprint arXiv:2511.03330, 2025. URL https://arxiv.org/abs/2511.03330

[31]

Xintao Wang, Jiangjie Chen, Nianqi Li, Lida Chen, Xinfeng Yuan, Wei Shi, Xuyang Ge, Rui Xu, and Yanghua Xiao. SurveyAgent: A Conversational System for Personalized and Efficient Research Survey. arXiv preprint arXiv:2404.06364, 2024. URL https://arxiv.org/abs/2404.06364

[32]

Yancheng Wang, Ziyan Jiang, Zheng Chen, Fan Yang, Yingxue Zhou, Eunah Cho, Xing Fan, Xiaojiang Huang, Yanbin Lu, and Yingzhen Yang. RecMind: Large Language Model Powered Agent For Recommendation. In Findings of NAACL 2024, 2024. URL https://aclanthology.org/2024.findings-naacl.271/

[33]

Jingxuan Wei, Xi Bai, Shan Liu, Caijun Jia, Zheng Sun, Xinglong Xu, Siyuan Li, Linzhuang Sun, Bihui Yu, Conghui He, et al. Pager: Bridging the semantic-execution gap in point-precise geometric gui control. arXiv preprint arXiv:2605.15963, 2026

[34]

Jingxuan Wei, Siyuan Li, Yuhang Xu, Zheng Sun, Junjie Jiang, Hexuan Jin, Caijun Jia, Honghao He, Xinglong Xu, Chang Yu, et al. The trinity of consistency as a defining principle for general world models. arXiv preprint arXiv:2602.23152, 2026

[35]

Zhelin Xu, Shuhei Yamamoto, and Hideo Joho. Research Paper Recommender System by Considering Users’ Information Seeking Behaviors. In JSAI 2025, 2025. URL https://www.jstage.jst.go.jp/article/pjsai/JSAI2025/0/JSAI2025_1D4OS24b02/_article/-char/en

[36]

Hyunsik Yoo, SeongKu Kang, Ruizhong Qiu, Charlie Xu, Fei Wang, and Hanghang Tong. Embracing Plasticity: Balancing Stability and Plasticity in Continual Recommender Systems. In SIGIR 2025, 2025. doi: 10.1145/3726302.3729964. URL https://doi.org/10.1145/3726302.3729964

[37]

Bihui Yu, Xinglong Xu, Junjie Jiang, Jiabei Cheng, Caijun Jia, Siyuan Li, Conghui He, Jingxuan Wei, and Cheng Tan. Paperfit: Vision-in-the-loop typesetting optimization for scientific documents. arXiv preprint arXiv:2605.10341, 2026
