Recognition: 2 theorem links
· Lean Theorem · TwiSTAR: Think Fast, Think Slow, Then Act, Generative Recommendation with Adaptive Reasoning
Pith reviewed 2026-05-13 01:36 UTC · model grok-4.3
The pith
A planner learns to invoke slow reasoning only for hard user histories in generative recommendation, raising accuracy while lowering latency.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that combining a fast SID retriever, a candidate ranker, and a slow reasoning model that converts collaborative item-to-item knowledge into natural-language rationales, together with a planner trained via supervised warm-up and agentic RL to decide which tool to call for each user sequence, yields both higher accuracy and lower latency than any single fixed strategy applied uniformly.
What carries the argument
The planner. Trained through supervised warm-up followed by agentic reinforcement learning, it dynamically selects among the fast SID-based retriever, the lightweight ranker, and the slow reasoning model with injected commonsense explanations.
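The planner's job reduces to a three-way dispatch per user sequence. The sketch below caricatures it as a threshold rule over a hypothetical difficulty score, purely for illustration: in the paper the planner is an LLM policy trained with supervised warm-up and agentic RL, and the function and tool names here are assumptions, not the authors' API.

```python
def plan(difficulty: float, fast_threshold: float = 0.3,
         slow_threshold: float = 0.7) -> str:
    """Route a user sequence to one of three tools by estimated difficulty.

    Thresholds are illustrative; the paper learns this decision instead.
    """
    if difficulty < fast_threshold:
        return "fast_sid_retriever"   # easy history: direct SID generation
    if difficulty < slow_threshold:
        return "candidate_ranker"     # medium: rerank a retrieved shortlist
    return "slow_reasoner"            # hard: rationale-first recommendation
```

The point of the learned version is precisely that no fixed threshold exists: the planner must infer difficulty from the raw interaction history itself.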
Load-bearing premise
That the planner can reliably detect which user sequences benefit from slow reasoning, and that the injected item-to-item commonsense explanations remain useful across different datasets.
What would settle it
An evaluation on the same three datasets in which the adaptive planner produces no accuracy gain over the best fixed-strategy baseline or fails to reduce average latency would falsify the claim.
Original abstract
Generative recommendation with Semantic IDs (SIDs) has emerged as a promising paradigm, yet existing methods apply a fixed inference strategy, either fast direct generation or slow chain-of-thought reasoning, uniformly across all user histories. This approach creates a trade-off: fast recommendation model produces suboptimal accuracy on hard samples, while always invoking slow reasoning incurs prohibitive latency and wastes computation on easy cases. To address this, we propose Think Fast, Think Slow, Then Act, a framework that learns to adaptively allocate reasoning effort per user sequence. Our system equips an LLM with three complementary tools: a fast SID-based retriever, a lightweight candidate ranker, and a slow reasoning model that generates explicit rationales before recommending. Crucially, we inject collaborative commonsense into the slow model by transforming item-to-item knowledge into natural language explanations. A planner, trained through supervised warm-up followed by agentic reinforcement learning, dynamically decides which tool to invoke. Experiments on three datasets demonstrate that our method outperforms strong baselines, achieving consistent accuracy gains while reducing inference latency compared to uniform slow reasoning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces TwiSTAR, a generative recommendation system using Semantic IDs (SIDs) that equips an LLM with three tools—a fast SID retriever, a lightweight ranker, and a slow reasoning model augmented with natural-language item-to-item commonsense explanations. A planner, trained first by supervised warm-up and then by agentic reinforcement learning, adaptively selects which tool to invoke per user sequence. The central empirical claim is that this adaptive allocation yields consistent accuracy gains over strong baselines while reducing inference latency relative to always invoking the slow reasoning path, demonstrated on three datasets.
Significance. If the planner reliably routes hard sequences to slow reasoning and the commonsense explanations measurably improve the slow path, the framework offers a practical way to resolve the accuracy–latency trade-off in LLM-based generative recommenders. The combination of tool use, commonsense injection, and agentic RL for routing is a coherent extension of recent work on adaptive inference. However, the absence of direct validation for the planner’s decisions and the contribution of the injected explanations prevents a clear assessment of whether the reported gains are attributable to the adaptive mechanism itself.
major comments (3)
- [Abstract / Experiments] The claims of 'consistent accuracy gains' and 'reducing inference latency' are presented without numerical results, error bars, statistical tests, per-dataset breakdowns, or a description of how the planner was evaluated. This omission makes the central empirical claim unverifiable from the provided text.
- [Experiments] No ablation is reported that disables the planner (e.g., uniform slow reasoning, random routing, or always-fast) or removes the commonsense injection. Without these controls it is impossible to determine whether accuracy improvements arise from adaptive allocation or simply from the presence of multiple tools.
- [Method] The Method / Planner subsection provides no planner decision statistics (percentage of sequences routed to slow reasoning, agreement with an oracle that knows when slow reasoning helps, or routing accuracy on held-out data). This leaves the weakest assumption, that the planner reliably identifies sequences requiring slow reasoning, unsupported by direct evidence.
minor comments (2)
- [Title] The title contains a missing space after the colon ('TwiSTAR:Think Fast').
- [Abstract] The abstract refers to 'three complementary tools' but does not explicitly list the lightweight candidate ranker in the final sentence; a brief clarification would improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on TwiSTAR. The comments highlight important areas for strengthening the empirical validation of our adaptive reasoning framework. We address each major comment below and will incorporate revisions to improve clarity and evidence.
Point-by-point responses
-
Referee: [Abstract / Experiments] The claims of 'consistent accuracy gains' and 'reducing inference latency' are presented without numerical results, error bars, statistical tests, per-dataset breakdowns, or a description of how the planner was evaluated. This omission makes the central empirical claim unverifiable from the provided text.
Authors: We agree that the abstract presents high-level claims. The Experiments section reports per-dataset accuracy and latency results across three datasets, but lacks explicit error bars, statistical tests, and a detailed planner evaluation description in the main text. In revision, we will update the abstract with key numerical highlights (e.g., relative gains), add error bars and significance tests to experimental tables, include per-dataset breakdowns with planner routing details, and expand the planner evaluation description to make all claims directly verifiable. revision: yes
-
Referee: [Experiments] No ablation is reported that disables the planner (e.g., uniform slow reasoning, random routing, or always-fast) or removes the commonsense injection. Without these controls it is impossible to determine whether accuracy improvements arise from adaptive allocation or simply from the presence of multiple tools.
Authors: This is a valid concern. Our experiments compare against fixed fast and slow baselines, but do not include explicit ablations for random routing or commonsense removal. We will add these controls in the revised Experiments section: always-fast, uniform slow reasoning, random tool selection, and slow reasoning without commonsense explanations. These will isolate the planner's adaptive contribution and the value of injected explanations. revision: yes
-
Referee: [Method] The Method / Planner subsection provides no planner decision statistics (percentage of sequences routed to slow reasoning, agreement with an oracle that knows when slow reasoning helps, or routing accuracy on held-out data). This leaves the weakest assumption, that the planner reliably identifies sequences requiring slow reasoning, unsupported by direct evidence.
Authors: We acknowledge that direct planner diagnostics are missing. The manuscript emphasizes end-to-end results but omits routing statistics. In revision, we will add a dedicated analysis with the percentage of sequences routed to slow reasoning, correlation with sequence difficulty, and routing accuracy or oracle agreement metrics on held-out data where available. This will provide direct evidence supporting the planner's decisions. revision: yes
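The diagnostics both sides agree to add amount to a few lines of bookkeeping. Below is a minimal sketch over made-up routing decisions; `routing_stats`, the tool labels, and the data are illustrative assumptions, not results from the paper.

```python
def routing_stats(decisions, oracle):
    """Fraction of sequences routed to the slow tool, and agreement with an
    oracle that knows when slow reasoning actually helps."""
    n = len(decisions)
    slow_rate = sum(d == "slow" for d in decisions) / n
    agreement = sum(d == o for d, o in zip(decisions, oracle)) / n
    return slow_rate, agreement

# Toy illustration only; these decisions are invented, not paper data.
decisions = ["fast", "slow", "fast", "slow", "fast"]
oracle = ["fast", "slow", "slow", "slow", "fast"]
slow_rate, agreement = routing_stats(decisions, oracle)
```

Reporting `slow_rate` alongside latency would also make the latency-reduction claim mechanically checkable, since the expected cost is a mixture over the tools invoked.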
Circularity Check
No significant circularity; empirical framework with no derivations or self-referential reductions
full rationale
The paper describes an engineering framework for adaptive tool use in generative recommendation (fast SID retriever, ranker, slow reasoning model with injected commonsense) and a planner trained via supervised warm-up plus agentic RL. No equations, first-principles derivations, or closed-form predictions appear in the provided text. Central claims rest on empirical comparisons across three datasets rather than any quantity that reduces to its own fitted inputs by construction. No self-citations are invoked to justify uniqueness theorems, ansatzes, or load-bearing premises. The planner's routing behavior and commonsense injection are presented as design choices whose value is asserted via experiment, not via definitional equivalence. This is a standard non-circular empirical contribution.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: LLMs can usefully generate explicit rationales for item recommendations when given collaborative patterns in natural language
- domain assumption: A planner trained with supervised warm-up plus agentic RL can learn to allocate fast versus slow tools without excessive overhead
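The second assumption is typically operationalized in RL for routing as a reward that trades accuracy against tool cost. The shape below is a hedged sketch only; the latencies, the weight, and the function name are assumptions for illustration, not the paper's actual GRPO reward design.

```python
# Assumed per-tool latencies in seconds; illustrative, not from the paper.
TOOL_LATENCY = {"fast": 0.05, "rank": 0.2, "slow": 2.0}

def routing_reward(hit: bool, tool: str, latency_weight: float = 0.1) -> float:
    """Reward a correct recommendation, minus a latency charge for the tool.

    Under this shape, invoking the slow tool only pays off when it flips
    a miss into a hit; easy cases are pushed toward the fast tool.
    """
    return (1.0 if hit else 0.0) - latency_weight * TOOL_LATENCY[tool]
```

Whether the learned planner actually internalizes such a trade-off is exactly what the requested routing diagnostics would test.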
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear?
unclear: Relation between the paper passage and the cited Recognition theorem.
A planner, trained through supervised warm-up followed by agentic reinforcement learning, dynamically decides which tool to invoke... Experiments on three datasets demonstrate that our method outperforms strong baselines, achieving consistent accuracy gains while reducing inference latency compared to uniform slow reasoning.
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear?
unclear: Relation between the paper passage and the cited Recognition theorem.
we inject collaborative commonsense into the slow model by transforming item-to-item knowledge into natural language explanations
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
LLaRA: Large language-recommendation assistant
Jiayi Liao, Sihang Li, Zhengyi Yang, Jiancan Wu, Yancheng Yuan, Xiang Wang, and Xiangnan He. LLaRA: Large language-recommendation assistant. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '24, page 1785–1795, New York, NY, USA, 2024. Association for Computing Machinery. ISBN 97...
-
[2]
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. Chain-of-thought prompting elicits reasoning in large language models, 2023. URL https://arxiv.org/abs/2201.11903
-
[3]
OneRec-Think: In-text reasoning for generative recommendation, 2025
Zhanyu Liu, Shiyao Wang, Xingmei Wang, Rongzhou Zhang, Jiaxin Deng, Honghui Bao, Jinghao Zhang, Wuchao Li, Pengfei Zheng, Xiangyu Wu, Yifei Hu, Qigen Hu, Xinchen Luo, Lejian Ren, Zixing Zhang, Qianqian Wang, Kuo Cai, Yunfan Wu, Hongtao Cheng, Zexuan Cheng, Lu Ren, Huanjie Wang, Yi Su, Ruiming Tang, Kun Gai, and Guorui Zhou. OneRec-Think: In-text reasonin...
-
[4]
Generative reasoning recommendation via LLMs, 2025
Minjie Hong, Zetong Zhou, Zirun Guo, Ziang Zhang, Ruofan Hu, Weinan Gan, Jieming Zhu, and Zhou Zhao. Generative reasoning recommendation via LLMs, 2025. URL https://arxiv.org/abs/2510.20815
-
[5]
OxygenREC: An instruction-following generative framework for e-commerce recommendation
Xuegang Hao, Ming Zhang, Alex Li, Xiangyu Qian, Zhi Ma, Yanlong Zang, Shijie Yang, Zhongxuan Han, Xiaolong Ma, Jinguang Liu, Zhen Li, Zhida Jiang, Shusheng Wang, Ning Tang, Yanchen Qiao, Chenxiang Yang, Chen Sun, Jincheng Yuan, Chunhua Peng, Heng Hu, Peijun Yang, Baopeng Yuan, Caiyun Qiu, Zhaolong Xing, Haofei Yuan, Haipeng Zhang, Yuzhang Guo, Weijie Ding...
-
[6]
An Yang et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025
-
[7]
Siavash Ameli, Siyuan Zhuang, Ion Stoica, and Michael W
Marah Abdin, Sahaj Agarwal, Ahmed Awadallah, Vidhisha Balachandran, Harkirat Behl, Lingjiao Chen, Gustavo de Rosa, Suriya Gunasekar, Mojan Javaheripi, Neel Joshi, Piero Kauffmann, Yash Lara, Caio César Teodoro Mendes, Arindam Mitra, Besmira Nushi, Dimitris Papailiopoulos, Olli Saarikivi, Shital Shah, Vaishnavi Shrivastava, Vibhav Vineet, Yue Wu, Safoora...
-
[8]
Claude 3.7 Sonnet and Claude Code
Anthropic. Claude 3.7 Sonnet and Claude Code. https://www.anthropic.com/news/claude-3-7-sonnet, 2025. Accessed: 2026-04-29
-
[9]
Zeyu Cui, Jianxin Ma, Chang Zhou, Jingren Zhou, and Hongxia Yang. M6-Rec: Generative pretrained language models are open-ended recommender systems, 2022. URL https://arxiv.org/abs/2205.08084
-
[10]
Keqin Bao, Jizhi Zhang, Yang Zhang, Wenjie Wang, Fuli Feng, and Xiangnan He. TALLRec: An effective and efficient tuning framework to align large language model with recommendation. In Proceedings of the 17th ACM Conference on Recommender Systems, RecSys '23, page 1007–1014, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9798400702419...
-
[11]
LLMTreeRec: Unleashing the power of large language models for cold-start recommendations, 2024
Wenlin Zhang, Chuhan Wu, Xiangyang Li, Yuhao Wang, Kuicai Dong, Yichao Wang, Xinyi Dai, Xiangyu Zhao, Huifeng Guo, and Ruiming Tang. LLMTreeRec: Unleashing the power of large language models for cold-start recommendations, 2024. URL https://arxiv.org/abs/2404.00702
-
[12]
A bi-step grounding paradigm for large language models in recommendation systems
Keqin Bao, Jizhi Zhang, Wenjie Wang, Yang Zhang, Zhengyi Yang, Yanchen Luo, Chong Chen, Fuli Feng, and Qi Tian. A bi-step grounding paradigm for large language models in recommendation systems. ACM Trans. Recomm. Syst., 3(4), April 2025. doi: 10.1145/3716393. URL https://doi.org/10.1145/3716393
-
[13]
On softmax direct preference optimization for recommendation
Yuxin Chen, Junfei Tan, An Zhang, Zhengyi Yang, Leheng Sheng, Enzhi Zhang, Xiang Wang, and Tat-Seng Chua. On softmax direct preference optimization for recommendation. In Proceedings of the 38th International Conference on Neural Information Processing Systems, NIPS ’24, Red Hook, NY , USA, 2024. Curran Associates Inc. ISBN 9798331314385
-
[14]
InteraRec: Interactive recommendations using multimodal large language models
Saketh Reddy Karra and Theja Tulabandhula. InteraRec: Interactive recommendations using multimodal large language models. In Trends and Applications in Knowledge Discovery and Data Mining: PAKDD 2024 Workshops, RAFDA and IWTA, Taipei, Taiwan, May 7–10, 2024, Proceedings, page 32–43, Berlin, Heidelberg, 2024. Springer-Verlag. ISBN 978-981-97-2649-3. do...
-
[15]
Shijie Geng, Shuchang Liu, Zuohui Fu, Yingqiang Ge, and Yongfeng Zhang. Recommendation as language processing (rlp): A unified pretrain, personalized prompt & predict paradigm (p5),
- [16]
-
[17]
Zhixuan Chu, Hongyan Hao, Xin Ouyang, Simeng Wang, Yan Wang, Yue Shen, Jinjie Gu, Qing Cui, Longfei Li, Siqiao Xue, James Y Zhang, and Sheng Li. Leveraging large language models for pre-trained recommender systems, 2023. URL https://arxiv.org/abs/2308.10837
-
[18]
Adapting large language models by integrating collaborative semantics for recommendation
Bowen Zheng, Yupeng Hou, Hongyu Lu, Yu Chen, Wayne Xin Zhao, Ming Chen, and Ji-Rong Wen. Adapting large language models by integrating collaborative semantics for recommendation. In 2024 IEEE 40th International Conference on Data Engineering (ICDE), pages 1435–1448, 2024. doi: 10.1109/ICDE60146.2024.00118
-
[19]
PLUM: Adapting pre-trained language models for industrial-scale generative recommendations
Ruining He, Lukasz Heldt, Lichan Hong, Raghunandan Keshavan, Shifan Mao, Nikhil Mehta, Zhengyang Su, Alicia Tsai, Yueqi Wang, Shao-Chuan Wang, Xinyang Yi, Lexi Baugher, Baykal Cakici, Ed Chi, Cristos Goodrow, Ningren Han, He Ma, Romer Rosales, Abby Van Soest, Devansh Tandon, Su-Lin Wu, Weilong Yang, and Yilin Zheng. PLUM: Adapting pre-trained language mod...
-
[20]
Fine-grained semantics integration for large language model-based recommendation, 2026
Jiawei Feng, Xiaoyu Kong, Leheng Sheng, Bin Wu, Chao Yi, Feifang Yang, Xiang-Rong Sheng, Han Zhu, Xiang Wang, Jiancan Wu, and Xiangnan He. Fine-grained semantics integration for large language model-based recommendation, 2026. URL https://arxiv.org/abs/2602.22632
-
[21]
Reasoning over semantic IDs enhances generative recommendation, 2026
Yingzhi He, Yan Sun, Junfei Tan, Yuxin Chen, Xiaoyu Kong, Chunxu Shen, Xiang Wang, An Zhang, and Tat-Seng Chua. Reasoning over semantic IDs enhances generative recommendation, 2026. URL https://arxiv.org/abs/2603.23183
-
[22]
Leadre: Multi-faceted knowledge enhanced LLM empowered display advertisement recommender system, 2025
Fengxin Li, Yi Li, Yue Liu, Chao Zhou, Yuan Wang, Xiaoxiang Deng, Wei Xue, Dapeng Liu, Lei Xiao, Haijie Gu, Jie Jiang, Hongyan Liu, Biao Qin, and Jun He. Leadre: Multi-faceted knowledge enhanced llm empowered display advertisement recommender system, 2025. URL https://arxiv.org/abs/2411.13789
-
[23]
Thomas, Alexandra Ranieri, Matthew N
Edoardo D’Amico, Marco De Nadai, Praveen Chandar, Divita Vohra, Shawn Lin, Max Lefarov, Paul Gigioli, Gustavo Penha, Ilya Kopysitsky, Ivo Joel Senese, Darren Mei, Francesco Fabbri, Oguz Semerci, Yu Zhao, Vincent Tang, Brian St. Thomas, Alexandra Ranieri, Matthew N. K. Smith, Aaron Bernkopf, Bryan Leung, Ghazal Fazelnia, Mark VanMiddlesworth, Timothy ...
-
[24]
Gould, Yves Raimond, Sandeep Ghael, Tony Jebara, Andreas Damianou, Vladan Radosavljevic, Paul N
Marco De Nadai, Edoardo D’Amico, Max Lefarov, Alexandre Tamborrino, Divita Vohra, Mark VanMiddlesworth, Shawn Lin, Jacqueline Wood, Jan Stypka, Eliza Klyce, Keshi Dai, Timothy Christopher Heath, Martin D. Gould, Yves Raimond, Sandeep Ghael, Tony Jebara, Andreas Damianou, Vladan Radosavljevic, Paul N. Bennett, Mounia Lalmas, and Praveen Chandar. A unif...
-
[25]
Generative reasoning re-ranker, 2026
Mingfu Liang, Yufei Li, Jay Xu, Kavosh Asadi, Xi Liu, Shuo Gu, Kaushik Rangadurai, Frank Shyu, Shuaiwen Wang, Song Yang, Zhijing Li, Jiang Liu, Mengying Sun, Fei Tian, Xiaohan Wei, Chonglin Sun, Jacob Tao, Shike Mei, Wenlin Chen, Santanu Kolay, Sandeep Pandey, Hamed Firooz, and Luke Simon. Generative reasoning re-ranker, 2026. URL https://arxiv.org/abs/2...
- [26]
-
[27]
RecGPT-V2 technical report. arXiv preprint arXiv:2512.14503, 2025
Chao Yi, Dian Chen, Gaoyang Guo, Jiakai Tang, Jian Wu, Jing Yu, Mao Zhang, Wen Chen, Wenjun Yang, Yujie Luo, Yuning Jiang, Zhujin Gao, Bo Zheng, Binbin Cao, Changfa Wu, Dixuan Wang, Han Wu, Haoyi Hu, Kewei Zhu, Lang Tian, Lin Yang, Qiqi Huang, Siqi Yang, Wenbo Su, Xiaoxiao He, Xin Tong, Xu Chen, Xunke Xi, Xiaowei Huang, Yaxuan Wu, Yeqiu Yang, Yi Hu, Yujin...
-
[28]
Recbot: Agent-based recommendation system. arXiv preprint arXiv:2509.21317, 2025
Jiakai Tang, Yujie Luo, Xunke Xi, Fei Sun, Xueyang Feng, Sunhao Dai, Chao Yi, Dian Chen, Zhujin Gao, Yang Li, Xu Chen, Wen Chen, Jian Wu, Yuning Jiang, and Bo Zheng. Interactive recommendation agent with active user commands. arXiv preprint arXiv:2509.21317, 2025
-
[29]
Seungheon Doh, Keunwoo Choi, and Juhan Nam. TalkPlay-Tools: Conversational music recommendation with LLM tool calling. arXiv preprint arXiv:2510.01698, 2025
-
[30]
Deep interest network for click-through rate prediction
Guorui Zhou, Chengru Song, Xiaoqiang Zhu, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai. Deep interest network for click-through rate prediction. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '18, pages 1059–1068, New York, NY, USA, 2018. Association for Computing Machi...
-
[31]
Qwen3.5: Towards native multimodal agents, February 2026
Qwen Team. Qwen3.5: Towards native multimodal agents, February 2026. URL https://qwen.ai/blog?id=qwen3.5
-
[32]
Image-based recommendations on styles and substitutes, 2015
Julian McAuley, Christopher Targett, Qinfeng Shi, and Anton van den Hengel. Image-based recommendations on styles and substitutes, 2015. URL https://arxiv.org/abs/1506.04757
-
[33]
Hierarchical gating networks for sequential recommendation, 2019
Chen Ma, Peng Kang, and Xue Liu. Hierarchical gating networks for sequential recommendation, 2019. URL https://arxiv.org/abs/1906.09217
-
[34]
Session-based Recommendations with Recurrent Neural Networks
Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. Session-based recommendations with recurrent neural networks, 2016. URL https://arxiv.org/abs/1511.06939
-
[35]
Self-attentive sequential recommendation, 2018
Wang-Cheng Kang and Julian McAuley. Self-attentive sequential recommendation, 2018. URL https://arxiv.org/abs/1808.09781
-
[36]
Recommender systems with generative retrieval, 2023
Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan H. Keshavan, Trung Vu, Lukasz Heldt, Lichan Hong, Yi Tay, Vinh Q. Tran, Jonah Samost, Maciej Kula, Ed H. Chi, and Maheswaran Sathiamoorthy. Recommender systems with generative retrieval, 2023. URL https://arxiv.org/abs/2305.05065
-
[37]
Jiaqi Zhai, Lucy Liao, Xing Liu, Yueming Wang, Rui Li, Xuan Cao, Leon Gao, Zhaojie Gong, Fangda Gu, Michael He, Yinghai Lu, and Yu Shi. Actions speak louder than words: Trillion-parameter sequential transducers for generative recommendations, 2024. URL https://arxiv.org/abs/2402.17152
-
[38]
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding, 2019. URL https://arxiv.org/abs/1810.04805
-
[39]
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. DeepSeekMath: Pushing the limits of mathematical reasoning in open language models, 2024. URL https://arxiv.org/abs/2402.03300