OneReason Technical Report

An Zhang; Biao Yang; Boyang Ding; Changxin Lao; Chaoyi Ma; Chenglong Chu; Chengru Song; Defu Lian; Dunju Zang; Fan Yang

arXiv: 2606.06260 · v1 · pith:LNWMZL64 · submitted 2026-06-04 · 💻 cs.IR · cs.AI · cs.CL

OneRec Team, Biao Yang, Boyang Ding, Chenglong Chu, Dunju Zang, Fei Pan, Han Li, Hao Jiang, Honghui Bao, Huanjie Wang, Jian Liang, Jiangxia Cao, Jiao Ou, Jiaxin Deng, Jinghao Zhang, Kun Gai, Lu Ren, Peiru Du, Pengfei Zheng, Rongzhou Zhang, Ruiming Tang, Shiyao Wang, Siyang Mao, Siyuan Lou, Teng Shi, Wei Yuan, Wenlong Xu, Xingchen Liu, Xingmei Wang, Xinqi Jin, Yan Sun, Yan Wang, Yifei Hu, Yingzhi He, Yufei Ye, Yuhao Wang, Yunhao Zhou, Yuqin Dai, Zhao Liu, Zhipeng Wei, Zhixin Ling, Ziming Li, Zixing Zhang, Ziyuan Liu, An Zhang, Changxin Lao, Chaoyi Ma, Chengru Song, Defu Lian, Fan Yang, Guowang Zhang, Hao Peng, Jiayao Shen, Jie Chen, Jun Xu, Junmin Chen, Kun Zhang, Kuo Cai, Mingxing Wen, Minmao Wang, Minxuan Lv, Qi Zhang, Qiang Luo, Sheng Yu, Shijie Li, Shijie Yi, Shuang Yang, Shugui Liu, Shuni Chen, Tinghai Zhang, Tingting Gao, Xiang Wang, Xiangyu Wu, Xiangyu Zhao, Xiao Lv, Xiaoyou Zhou, Xuming Wang, Yong Du, Zejian Zhang, Zhaojie Liu, Zhiyang Zhang, Zhuang Zhuang, Ziqi Wang, Ziyi Zhao

keywords: ability, recommendation, generative, itemic, models, reasoning, language, mode

Abstract

Generative recommendation models in the OneRec family have been widely deployed in real-world services such as short-video, live-streaming, advertising, and e-commerce. However, these models benefit only from scaling: their reasoning ability is hard to activate, because meaningful Chain-of-Thought (CoT) sequences cannot be constructed from itemic tokens alone. Inspired by the success of the "think before answer" reasoning paradigm in large language models (LLMs), we conducted preliminary studies (OneRec-Think and OpenOneRec) to explore reasoning in generative recommendation. Nevertheless, we observed an unexpected phenomenon: the thinking mode showed no advantage over the non-thinking mode. Drawing on recent findings about CoT robustness in multi-modal language models, we argue that effective reasoning in recommendation rests on two factors: perception, the ability to ground itemic tokens in their underlying language semantics, and cognition, the ability to reorganize a user's behavior sequence into coherent latent interest points. We therefore propose OneReason, which comprises (1) strong itemic-token perception in pre-training, (2) a three-level, cognition-enhanced CoT format for recommendation tasks in supervised fine-tuning (SFT), and (3) a specialize-then-unify reinforcement learning (RL) training recipe that enhances thinking ability.
