pith. machine review for the scientific record. sign in

arxiv: 2603.24422 · v2 · submitted 2026-03-25 · 💻 cs.IR · cs.AI· cs.CL

Recognition: 2 theorem links

· Lean Theorem

OneSearch-V2: The Latent Reasoning Enhanced Self-distillation Generative Search Framework

Authors on Pith no claims yet

Pith reviewed 2026-05-15 07:16 UTC · model grok-4.3

classification 💻 cs.IR cs.AIcs.CL
keywords generative retrievalself-distillatione-commerce searchlatent reasoningquery understandingpreference alignmentinformation bubbleslong-tail sparsity
0
0 comments X

The pith

OneSearch-V2 adds latent reasoning and self-distillation to generative search to capture deeper user intents in e-commerce.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes OneSearch-V2 to fix three limits in existing generative retrieval systems: shallow handling of complex queries, failure to extract latent intents from logs, and overfitting to narrow past preferences. It introduces three modules that together enable deeper query understanding through thought augmentation, internalize reasoning via self-distillation during training, and align outputs with direct user feedback to avoid reward hacking. These changes produce measurable lifts in click-through and order rates during live A/B tests while keeping inference costs flat. The gains also reduce common problems such as information bubbles and poor coverage of long-tail items.

Core claim

OneSearch-V2 is a generative search framework that integrates a thought-augmented complex query understanding module, a reasoning-internalized self-distillation training pipeline, and a behavior preference alignment optimization system. The modules overcome shallow semantic matching, uncover precise yet unlogged e-commerce intentions through implicit in-context learning, and mitigate single-metric reward hacking by incorporating direct user feedback. Offline evaluations show stronger query recognition and user profiling; online A/B tests record +3.98% item CTR, +2.07% buyer volume, and +2.11% order volume, with added improvements in page good rate and relevance and no increase in serving-lat

What carries the argument

The reasoning-internalized self-distillation training pipeline that uses implicit in-context learning to surface latent user intentions beyond direct log fitting.

If this is right

  • Item CTR rises by 3.98 percent in live traffic.
  • Buyer and order volumes increase by roughly 2 percent each.
  • Information bubbles shrink and long-tail item coverage improves.
  • Search experience metrics such as page good rate and query-item relevance rise without added latency.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same self-distillation pattern could transfer to non-e-commerce search or recommendation tasks where intent signals are similarly sparse.
  • Because inference cost stays flat, the method scales to high-traffic platforms where latency budgets are tight.
  • Explicit alignment to user feedback may reduce the need for heavy post-hoc reranking stages in future systems.

Load-bearing premise

The measured gains in CTR and volume are produced by the three new modules rather than by unstated changes in data collection or serving stack.

What would settle it

An ablation experiment that removes each module in turn and re-runs the same online A/B test, showing whether CTR and order-volume lifts disappear when any one module is absent.

Figures

Figures reproduced from arXiv: 2603.24422 by Ben Chen, Bochao Liu, Chenyi Lei, Han Li, Huangyu Dai, Hui Kong, Jing Chen, Jingshan Lv, Kun Gai, Lingtao Mao, Siyuan Wang, Tong Zhao, Wenwu Ou, Xiao Liang, Xinyu Sun, Xuxin Zhang, Yang Zhao, Ying Yang, Yue Lv, Yufei Ma, Zhipeng Qian, Zhixin Zhai, Zihan Liang.

Figure 1
Figure 1. Figure 1: OneSearch-V2 vs. V1. OneSearch-V2 extends the [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The Overall Framework of OneSearch V2. It contains (a) a thought-augmented complex query understanding module, [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Three-step keyword-based CoT extraction pipeline [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: The sid rate of the proposed innovations with One [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: The online CTR relative gains for top/middle/tail [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: The CTR relative gains for various user/query/items. [PITH_FULL_IMAGE:figures/full_fig_p012_6.png] view at source ↗
read the original abstract

Generative Retrieval (GR) has emerged as a promising paradigm for modern search systems. Compared to multi-stage cascaded architecture, it offers advantages such as end-to-end joint optimization and high computational efficiency. OneSearch, as a representative industrial-scale deployed generative search framework, has brought significant commercial and operational benefits. However, its inadequate understanding of complex queries, inefficient exploitation of latent user intents, and overfitting to narrow historical preferences have limited its further performance improvement. To address these challenges, we propose OneSearch-V2, a latent reasoning enhanced self-distillation generative search framework. It contains three key innovations: (1) a thought-augmented complex query understanding module, which enables deep query understanding and overcomes the shallow semantic matching limitations of direct inference; (2) a reasoning-internalized self-distillation training pipeline, which uncovers users' potential yet precise e-commerce intentions beyond log-fitting through implicit in-context learning; (3) a behavior preference alignment optimization system, which mitigates reward hacking arising from the single conversion metric, and addresses personal preference via direct user feedback. Extensive offline evaluations demonstrate OneSearch-V2's strong query recognition and user profiling capabilities. Online A/B tests further validate its business effectiveness, yielding +3.98\% item CTR, +2.07\% buyer volume, and +2.11\% order volume. Manual evaluation further confirms gains in search experience quality, with +1.37\% in page good rate and +1.65\% in query-item relevance. More importantly, OneSearch-V2 effectively mitigates common search system issues such as information bubbles and long-tail sparsity, without incurring additional inference costs or serving latency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces OneSearch-V2, a latent reasoning enhanced self-distillation generative search framework for e-commerce that extends the prior OneSearch system. It proposes three innovations: (1) a thought-augmented complex query understanding module for deeper semantic processing, (2) a reasoning-internalized self-distillation training pipeline that uses implicit in-context learning to uncover latent user intents beyond log-fitting, and (3) a behavior preference alignment optimization system that incorporates direct user feedback to mitigate reward hacking from single-metric optimization. The paper claims these yield stronger query recognition and user profiling in offline evaluations, plus concrete online A/B test lifts of +3.98% item CTR, +2.07% buyer volume, and +2.11% order volume, along with +1.37% page good rate and +1.65% query-item relevance in manual evaluations, while reducing information bubbles and long-tail sparsity without added inference latency.

Significance. If the reported gains can be rigorously attributed to the three modules, the work would offer a meaningful industrial contribution to generative retrieval by demonstrating scalable techniques for complex query handling and preference alignment in production search systems. The provision of online A/B results with specific business metrics (CTR, buyer/order volume) and manual quality assessments is a strength, as is the emphasis on zero additional serving cost.

major comments (2)
  1. [Abstract and evaluation sections] Abstract and evaluation sections: The headline performance claims (+3.98% item CTR, +2.07% buyer volume, +2.11% order volume) are presented as resulting from the three proposed modules, yet no ablation table, incremental rollout data, or explicit statement confirms that the control arm used identical training data, model size, hyperparameters, and serving stack. Without these controls the attribution remains unisolated and is load-bearing for the central claim that the innovations drive the observed deltas.
  2. [Behavior preference alignment optimization system (abstract and §3)] Behavior preference alignment optimization system (described in abstract and §3): The system is said to mitigate reward hacking via direct user feedback, but the manuscript provides no concrete loss formulation, alignment coefficients, or comparison against a single-conversion baseline, making it impossible to assess whether the reported mitigation of overfitting is technically substantiated or merely asserted.
minor comments (2)
  1. [Abstract] Abstract: The statement that 'extensive offline evaluations demonstrate OneSearch-V2's strong query recognition and user profiling capabilities' is unsupported by any quantitative metrics, tables, or figure references; a one-sentence summary of key offline results (e.g., NDCG or recall deltas) would improve readability.
  2. [Abstract and evaluation sections] The paper asserts mitigation of information bubbles and long-tail sparsity but does not describe the measurement methodology (e.g., diversity metrics or tail-item coverage) used in either offline or online evaluations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on attribution and technical substantiation. We address each major comment below, providing clarifications from the experimental setup and committing to revisions that strengthen the manuscript without altering the core claims.

read point-by-point responses
  1. Referee: [Abstract and evaluation sections] Abstract and evaluation sections: The headline performance claims (+3.98% item CTR, +2.07% buyer volume, +2.11% order volume) are presented as resulting from the three proposed modules, yet no ablation table, incremental rollout data, or explicit statement confirms that the control arm used identical training data, model size, hyperparameters, and serving stack. Without these controls the attribution remains unisolated and is load-bearing for the central claim that the innovations drive the observed deltas.

    Authors: We agree that explicit confirmation of control conditions is essential for rigorous attribution. The online A/B test deployed the complete OneSearch-V2 system against the prior OneSearch baseline within the identical production serving stack, using the same training corpus, model parameter count, hyperparameter settings, and inference infrastructure; the only differences were the three proposed modules. Production constraints limited simultaneous multi-arm rollouts, so incremental data per module is not available. To address this, we will revise the evaluation section to include an explicit statement confirming these identical conditions and add offline ablation results isolating each module's contribution to the reported metrics. revision: yes

  2. Referee: [Behavior preference alignment optimization system (abstract and §3)] Behavior preference alignment optimization system (described in abstract and §3): The system is said to mitigate reward hacking via direct user feedback, but the manuscript provides no concrete loss formulation, alignment coefficients, or comparison against a single-conversion baseline, making it impossible to assess whether the reported mitigation of overfitting is technically substantiated or merely asserted.

    Authors: We acknowledge the need for greater technical detail. Section 3.3 formulates the alignment loss as a weighted sum of the primary conversion objective and a direct-feedback alignment term, with coefficient λ set to 0.3; offline results in Table 3 already compare against single-metric baselines. To ensure the mitigation is fully substantiated, we will expand §3.3 with the exact loss equation, the chosen coefficient value, and an explicit statement of the single-conversion baseline comparison in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical A/B claims are independent of module definitions

full rationale

The paper describes an industrial generative search framework whose central claims rest on offline metrics and online A/B test lifts (+3.98% CTR, +2.07% buyer volume, +2.11% order volume) rather than any closed-form derivation. The three modules (thought-augmented understanding, reasoning-internalized self-distillation, behavior preference alignment) are presented as engineering innovations trained on user logs; their performance is measured externally via controlled experiments, not derived by construction from the same logs. No equations, uniqueness theorems, or self-citations are invoked to force the reported gains. The framework is therefore self-contained against external benchmarks and receives a zero circularity score.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The framework rests on standard machine-learning training assumptions plus domain assumptions about e-commerce user behavior; no new physical entities are introduced.

free parameters (2)
  • distillation hyperparameters
    Temperature and loss weights in the self-distillation pipeline are tuned to fit user log data.
  • preference alignment coefficients
    Weights balancing conversion metric against direct feedback are chosen to mitigate reward hacking.
axioms (2)
  • domain assumption User interaction logs contain recoverable latent intents that self-distillation can surface via in-context learning
    Invoked in the description of the reasoning-internalized self-distillation training pipeline.
  • domain assumption Single conversion metric optimization leads to reward hacking that direct feedback can correct
    Stated as motivation for the behavior preference alignment optimization system.

pith-pipeline@v0.9.0 · 5680 in / 1468 out tokens · 51954 ms · 2026-05-15T07:16:49.897908+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Preference-Based Self-Distillation: Beyond KL Matching via Reward Regularization

    cs.LG 2026-05 unverdicted novelty 7.0

    PBSD derives a reward-reweighted teacher distribution as the analytic optimum of a reward-regularized objective, yielding better stability and performance than KL-based self-distillation on math reasoning and tool-use tasks.

  2. UniRec: Bridging the Expressive Gap between Generative and Discriminative Recommendation via Chain-of-Attribute

    cs.IR 2026-04 unverdicted novelty 6.0

    UniRec bridges the expressive gap in generative recommendation by prefixing semantic ID sequences with structured attribute tokens, recovering explicit feature crossing and yielding +22.6% HR@50 gains plus online lift...

  3. Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations

    cs.AI 2026-04 unverdicted novelty 5.0

    Bian Que deploys an agentic system with flexible skills and self-evolution on a major e-commerce search engine, cutting alerts by 75%, reaching 80% root-cause accuracy, and halving resolution time.

  4. Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations

    cs.AI 2026-04 unverdicted novelty 5.0

    Bian Que is an agentic framework using a unified operational paradigm, flexible Skill Arrangement, and self-evolving mechanism to automate O&M tasks, achieving 75% alert reduction and over 50% MTTR cut in production d...

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · cited by 3 Pith papers · 13 internal anchors

  1. [1]

    Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, and Jingren Zhou. 2023. Qwen-VL: A Frontier Large Vision- Language Model with Versatile Abilities.arXiv preprint arXiv:2308.12966(2023)

  2. [2]

    Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, Wenbin Ge, Zhifang Guo, Qidong Huang, and et al. 2025. Qwen3-VL Technical Report.arXiv preprint arXiv:2511.21631(2025)

  3. [3]

    Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, et al. 2024. Graph of Thoughts: Solving Elaborate Problems with Large Language Models.Proceedings of the AAAI Conference on Artificial Intelligence38, 16 (Mar. 2024), 17682–17690. doi:10.1609/aaai.v38i16.29720

  4. [4]

    Language Models are Few-Shot Learners

    Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Ka- plan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, and Amanda Askell et al. 2020. Language Models are Few-Shot Learners. https://arxiv.org/abs/2005.14165(2020). arXiv:2005.14165

  5. [5]

    Ben Chen, Xian Guo, Siyuan Wang, Zihan Liang, Yue Lv, Yufei Ma, Xinlong Xiao, and et al. 2025. OneSearch: A Preliminary Exploration of the Unified End-to- End Generative Framework for E-commerce Search. arXiv:2509.03236 [cs.IR] https://arxiv.org/abs/2509.03236

  6. [6]

    Jiaxin Deng, Shiyao Wang, Kuo Cai, Lejian Ren, Qigen Hu, Weifeng Ding, Qiang Luo, and Guorui Zhou. 2025. OneRec: Unifying Retrieve and Rank with Genera- tive Recommender and Iterative Preference Alignment. arXiv:2502.18965 [cs.IR] https://arxiv.org/abs/2502.18965

  7. [7]

    Chenhe Dong, Shaowei Yao, Pengkun Jiao, Jianhui Yang, Yiming Jin, Zerui Huang, Xiaojiang Zhou, Dan Ou, Haihong Tang, and Bo Zheng. 2026. TaoSR1: The Think- ing Model for E-commerce Relevance Search.arXiv preprint arXiv:2508.12365 (2026)

  8. [8]

    Xian Guo, Ben Chen, Siyuan Wang, Ying Yang, Chenyi Lei, and et al. 2025. One- Sug: The Unified End-to-End Generative Framework for E-commerce Query Suggestion.CoRRabs/2506.06913 (2025). doi:10.48550/ARXIV.2506.06913 arXiv:2506.06913

  9. [9]

    Ruidong Han, Bin Yin, Shangyu Chen, He Jiang, Fei Jiang, Xiang Li, Chi Ma, Mincong Huang, Xiaoguang Li, Chunzhen Jing, Yueming Han, MengLei Zhou, Lei Yu, Chuan Liu, and Wei Lin. 2025. MTGR: Industrial-Scale Generative Rec- ommendation Framework in Meituan. InProceedings of the 34th ACM Interna- tional Conference on Information and Knowledge Management (CI...

  10. [10]

    Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason E Weston, and Yuandong Tian. 2025. Training Large Language Models to Reason in a Continuous Latent Space. InSecond Conference on Language Modeling

  11. [11]

    Jonas Hübotter, Frederike Lübeck, Lejs Behric, Anton Baumann, Marco Bagatella, Daniel Marta, Ido Hakimi, Idan Shenfeld, Thomas Kleine Buening, Carlos Guestrin, and Andreas Krause. 2026. Reinforcement Learning via Self-Distillation. arXiv preprint arXiv:2601.20802(2026)

  12. [12]

    2021.OpenCLIP

    Gabriel Ilharco, Mitchell Wortsman, Ross Wightman, Cade Gordon, Nicholas Carlini, Rohan Taori, Achal Dave, Vaishaal Shankar, Hongseok Namkoong, John Miller, Hannaneh Hajishirzi, Ali Farhadi, and Ludwig Schmidt. 2021.OpenCLIP. doi:10.5281/zenodo.5143773 If you use this software, please cite it as below

  13. [13]

    Jian Jia, Jingtong Gao, Ben Xue, Junhao Wang, Qingpeng Cai, Quan Chen, Xi- angyu Zhao, Peng Jiang, and Kun Gai. 2025. From Principles to Applications: A Comprehensive Survey of Discrete Tokenizers in Generation, Comprehension, Recommendation, and Information Retrieval.arXiv preprint arXiv:2502.12448 (2025)

  14. [14]

    Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mo- hamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. 2019. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension.arXiv preprint arXiv:1910.13461(2019)

  15. [15]

    Xiaopeng Li, Bo Chen, Junda She, Shiteng Cao, You Wang, Qinlin Jia, Haiying He, Zheli Zhou, Zhao Liu, and et al. [n. d.]. A Survey of Generative Recommendation from a Tri-Decoupled Perspective: Tokenization, Architecture, and Optimization. Preprints([n. d.])

  16. [16]

    xiaobo liang, Lijun Wu, Juntao Li, Yue Wang, Qi Meng, Tao Qin, Wei Chen, Min Zhang, and Tie-Yan Liu. 2021. R-Drop: Regularized Dropout for Neural Networks. InAdvances in Neural Information Processing Systems. Curran Associates, Inc., 10890–10905

  17. [17]

    Zihan Liang, Yufei Ma, Zhipeng Qian, Huangyu Dai, Zihan Wang, Ben Chen, Chenyi Lei, Yuqing Ding, and Han Li. 2025. UniECS: Unified Multimodal E- Commerce Search Framework with Gated Cross-modal Fusion(CIKM ’25). New York, NY, USA, 1788–1797. doi:10.1145/3746252.3761170

  18. [18]

    Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. 2018. Focal Loss for Dense Object Detection.arXiv preprint arXiv:1708.02002(2018). arXiv:1708.02002

  19. [19]

    Xinyu Lin, Chaoqun Yang, Wenjie Wang, Yongqi Li, Cunxiao Du, Fuli Feng, See-Kiong Ng, and Tat-Seng Chua. 2025. Efficient Inference for Large Language Model-based Generative Recommendation. InICLR

  20. [20]

    Zhanyu Liu, Shiyao Wang, Xingmei Wang, Rongzhou Zhang, Jiaxin Deng, Honghui Bao, Jinghao Zhang, Wuchao Li, Pengfei Zheng, Xiangyu Wu, Yifei Hu, Qigen Hu, Xinchen Luo, Lejian Ren, Zixing Zhang, Qianqian Wang, Kuo Cai, Yunfan Wu, Hongtao Cheng, Zexuan Cheng, Lu Ren, Huanjie Wang, Yi Su, Ruiming Tang, Kun Gai, and Guorui Zhou. 2025. OneRec-Think: In-Text Rea...

  21. [21]

    Dai, and Ian Goodfellow

    Takeru Miyato, Andrew M. Dai, and Ian Goodfellow. 2021. Adversarial Training Methods for Semi-Supervised Text Classification.arXiv preprint arXiv:1605.07725 (2021). arXiv:1605.07725

  22. [22]

    Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan Hulikal Kesha- van, Trung Vu, and et al. 2023. Recommender Systems with Generative Re- trieval. InAdvances in Neural Information Processing Systems, A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine (Eds.), Vol. 36. Curran Asso- ciates, Inc., 10299–10315. https://proceedings.neurips...

  23. [24]

    Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. 2024. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models.arXiv preprint arXiv:2402.03300(2024). arXiv:2402.03300

  24. [25]

    Zhenyi Shen, Hanqi Yan, Linhai Zhang, Zhanghao Hu, Yali Du, and Yulan He

  25. [26]

    CODI: Compressing Chain-of-Thought into Continuous Space via Self- Distillation.arXiv preprint arXiv:2502.21074(2025)

  26. [27]

    Idan Shenfeld, Mehul Damani, Jonas Hübotter, and Pulkit Agrawal. 2026. Self- Distillation Enables Continual Learning.arXiv preprint arXiv:2601.19897(2026)

  27. [28]

    Yuhui Sun, Xiyao Wang, Zixi Li, YiTian Ding, Tianyang Ling, Jialuo Chen, Tianyi Yu, Zhenlong Yuan, and Jinman Zhao. 2026. Listwise Direct Preference Optimiza- tion with Multi-Dimensional Preference Mixing.arXiv preprint arXiv:2506.19780 (2026). arXiv:2506.19780

  28. [29]

    Juntao Tan, Shuyuan Xu, Wenyue Hua, Yingqiang Ge, Zelong Li, and Yongfeng Zhang. 2024. IDGenRec: LLM-RecSys Alignment with Textual ID Learning. In Proceedings of the 47th International ACM SIGIR Conference on Research and Devel- opment in Information Retrieval(Washington DC, USA)(SIGIR ’24). Association for Computing Machinery, New York, NY, USA, 355–364....

  29. [30]

    Jianting Tang, Dongshuai Li, Tao Wen, Fuyu Lv, Dan Ou, and Linli Xu. 2025. Large Reasoning Embedding Models: Towards Next-Generation Dense Retrieval Paradigm.arXiv preprint arXiv:2510.14321(2025)

  30. [31]

    Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Hervé Jégou. 2021. Training data-efficient image transform- ers & distillation through attention.https://arxiv.org/abs/2012.12877(2021). arXiv:2012.12877

  31. [32]

    Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, brian ichter, and et al

  32. [33]

    InAdvances in Neural Information Processing Systems, S

    Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. InAdvances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (Eds.), Vol. 35. Curran Associates, Inc., 24824–24837. https://proceedings.neurips.cc/paper_files/paper/2022/file/ 9d5609613524ecf4f15af0f7b31abca4-Paper-Conference.pdf

  33. [34]

    Zhipeng Wei, Kuo Cai, Junda She, Jie Chen, Minghao Chen, and et al. 2025. OneLoc: Geo-Aware Generative Recommender Systems for Local Life Service. (2025). arXiv:2508.14646 [cs.IR] https://arxiv.org/abs/2508.14646

  34. [35]

    Shitao Xiao, Zheng Liu, Peitian Zhang, Niklas Muennighoff, Defu Lian, and Jian- Yun Nie. 2024. C-Pack: Packed Resources For General Chinese Embeddings. In Proceedings of the 47th International ACM SIGIR Conference on Research and Devel- opment in Information Retrieval(Washington DC, USA)(SIGIR ’24). Association for Computing Machinery, 641–649

  35. [37]

    An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, and et al. 2025. Qwen3 Technical Report.arXiv preprint arXiv:2505.09388 (2025)

  36. [38]

    Chaoqun Yang, Xinyu Lin, Wenjie Wang, Yongqi Li, Teng Sun, Xianjing Han, and Tat-Seng Chua. 2025. EARN: Efficient Inference Acceleration for LLM-based Generative Recommendation by Register Tokens. InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2(Toronto ON, Canada)(KDD ’25). Association for Computing Machinery, Ne...

  37. [39]

    Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. 2023. Tree of Thoughts: Deliberate Problem Solving with Large Language Models. InAdvances in Neu- ral Information Processing Systems, A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine (Eds.), Vol. 36. Curran Associates, Inc., 11809–11822. h...

  38. [40]

    Xinhao Yao, Ruifeng Ren, Yun Liao, Lizhong Ding, and Yong Liu. 2026. Com- positional Generalization from Learned Skills via CoT Training: A Theoretical Ben Chen et al. and Structural Analysis for Reasoning.arXiv preprint arXiv:2502.04667(2026). arXiv:2502.04667

  39. [41]

    Runyang You, Yongqi Li, Xinyu Lin, Xin Zhang, Wenjie Wang, Wenjie Li, and Liqiang Nie. 2025. R 2ec: Towards Large Recommender Models with Reasoning. InNeurIPS

  40. [42]

    Jiaqi Zhai, Lucy Liao, Xing Liu, Yueming Wang, Rui Li, and et al. 2024. Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Gener- ative Recommendations. arXiv:2402.17152 [cs.LG] https://arxiv.org/abs/2402. 17152

  41. [43]

    Deep Mutual Learning

    Ying Zhang, Tao Xiang, Timothy M. Hospedales, and Huchuan Lu. 2017. Deep Mutual Learning.https://arxiv.org/abs/1706.00384(2017). arXiv:1706.00384

  42. [44]

    Yang Zhang, Wenxin Xu, Xiaoyan Zhao, Wenjie Wang, Fuli Feng, Xiangnan He, and Tat-Seng Chua. 2026. Reinforced Latent Reasoning for LLM-based Recommendation. InThe Fourteenth International Conference on Learning Repre- sentations

  43. [45]

    Siyan Zhao, Zhihui Xie, Mengchen Liu, Jing Huang, Guan Pang, Feiyu Chen, and Aditya Grover. 2026. Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models.arXiv preprint arXiv:2601.18734(2026)

  44. [46]

    Guorui Zhou, Hengrui Hu, Hongtao Cheng, Huanjie Wang, Jiaxin Deng, Jinghao Zhang, Kuo Cai, Lejian Ren, Lu Ren, Liao Yu, Pengfei Zheng, Qiang Luo, and et al. 2025. OneRec-V2 Technical Report. arXiv:2508.20900 [cs.IR] https://arxiv. org/abs/2508.20900

  45. [47]

    (S)” and “(T)

    Jie Zhu, Zhifang Fan, Xiaoxie Zhu, Yuchen Jiang, and et al. 2025. RankMixer: Scaling Up Ranking Models in Industrial Recommenders. arXiv:2507.15551 [cs.IR] https://arxiv.org/abs/2507.15551 A Cross-Architecture Generalization To verify that the proposed innovations generalize across different model architectures, we conduct experiments on both GPT-2 [4] an...

  46. [48]

    •Product Search: most common (e.g

    Intent Underdtanding— Identify the user’ssingleprimary intent. •Product Search: most common (e.g. dress, smartphone). •Functional Need: platform features (e.g. track parcel). •Note: If intent≠product search, skip remaining steps

  47. [49]

    women’s windbreaker

    Category Identification— Identify one or more product categories. • Top-level categories: women’s wear, mobile & electronics, home goods, bags, accessories, men’s wear, personal care, snacks, skincare, sports & outdoors, cosmetics, underwear, home apparel, women’s shoes, toys, gaming peripherals, fresh produce, instant food, home appliances, etc. •Sub-cat...

  48. [50]

    •Common attributes: entity, model, brand, audience, color, material, style, season, scene, function, price, etc

    Attribute Recognition— Extract attributesexplicitlystated in the query without any expansion. •Common attributes: entity, model, brand, audience, color, material, style, season, scene, function, price, etc. •Note: The search system must return products that match the query, so strictly retain the attributes that are relevant in the query

  49. [51]

    plaid skirt

    Topic Recommendation— Suggest candidate topics satisfying the query, like categories or specific products. •Note: need meet its categories, and attribute constraints. Donotover-recommend. •Good cases: ◦“plaid skirt”→plaid wrap skirt, plaid A-line skirt. ◦‘La Mer dupe”→Estée Lauder serum, SK-II, Lancôme cream. ◦“knitwear, no turtleneck”→V-neck knitwear, cr...

  50. [52]

    Product Search

    Source Constraints: •Extractonlyunder “Product Search” intent; otherwise outputNot extractableand stop. •Extractonlyfrom theTopic Recommendationsection. •If empty, fall back to keywords fromAttribute RecognitionandCategory Identification

  51. [53]

    Hisense TV

    Extraction Criteria: •Remove off-query items (e.g. query “Hisense TV”⇒exclude “TCL TV”). •Keep specific attributes (e.g. “plaidskirt”). •Remove marketing terms (e.g. “bestseller”, “good quality”). •Merge synonymous attributes (e.g. “woolens” merged into “wool sweater”). •Preserve model details (e.g. “iPhone 15 Pro Max”)

  52. [54]

    autumn-winter outfit

    Output Format: •Comma-separated; at most8 keywords; Each keyword can have a maximum of 10 Chinese characters. •by popularity (descending). •Each keyword must be independently retrievable. Please extract keywords from the analysis results based on the query: Query:{} Analysis Result:{} Step 3 — Preference Calibration Continued on next page Ben Chen et al. ...