arxiv: 2603.24422 · v2 · submitted 2026-03-25 · 💻 cs.IR · cs.AI · cs.CL

OneSearch-V2: The Latent Reasoning Enhanced Self-distillation Generative Search Framework

Ben Chen, Siyuan Wang, Yufei Ma, Zihan Liang, Xuxin Zhang, Yue Lv, Ying Yang, Huangyu Dai, Lingtao Mao, Tong Zhao, Zhipeng Qian, Xinyu Sun, Zhixin Zhai, Yang Zhao, Bochao Liu, Jingshan Lv, Xiao Liang, Hui Kong, Jing Chen, Han Li, Chenyi Lei, Wenwu Ou, Kun Gai

Pith reviewed 2026-05-15 07:16 UTC · model grok-4.3

classification: cs.IR · cs.AI · cs.CL

keywords: generative retrieval · self-distillation · e-commerce search · latent reasoning · query understanding · preference alignment · information bubbles · long-tail sparsity

The pith

OneSearch-V2 adds latent reasoning and self-distillation to generative search to capture deeper user intents in e-commerce.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes OneSearch-V2 to fix three limits in existing generative retrieval systems: shallow handling of complex queries, failure to extract latent intents from logs, and overfitting to narrow past preferences. It introduces three modules that together enable deeper query understanding through thought augmentation, internalize reasoning via self-distillation during training, and align outputs with direct user feedback to avoid reward hacking. These changes produce measurable lifts in click-through and order rates during live A/B tests while keeping inference costs flat. The gains also reduce common problems such as information bubbles and poor coverage of long-tail items.

Core claim

OneSearch-V2 is a generative search framework that integrates a thought-augmented complex query understanding module, a reasoning-internalized self-distillation training pipeline, and a behavior preference alignment optimization system. The modules overcome shallow semantic matching, uncover precise yet unlogged e-commerce intentions through implicit in-context learning, and mitigate single-metric reward hacking by incorporating direct user feedback. Offline evaluations show stronger query recognition and user profiling; online A/B tests record +3.98% item CTR, +2.07% buyer volume, and +2.11% order volume, with added improvements in page good rate and relevance and no increase in serving latency.

What carries the argument

The reasoning-internalized self-distillation training pipeline that uses implicit in-context learning to surface latent user intentions beyond direct log fitting.

If this is right

Item CTR rises by 3.98 percent in live traffic.
Buyer volume rises by 2.07 percent and order volume by 2.11 percent.
Information bubbles shrink and long-tail item coverage improves.
Search experience metrics such as page good rate and query-item relevance rise without added latency.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same self-distillation pattern could transfer to non-e-commerce search or recommendation tasks where intent signals are similarly sparse.
Because inference cost stays flat, the method scales to high-traffic platforms where latency budgets are tight.
Explicit alignment to user feedback may reduce the need for heavy post-hoc reranking stages in future systems.

Load-bearing premise

The measured gains in CTR and volume are produced by the three new modules rather than by unstated changes in data collection or serving stack.

What would settle it

An ablation experiment that removes each module in turn and re-runs the same online A/B test, showing whether CTR and order-volume lifts disappear when any one module is absent.
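The readout of such an ablation could be sketched as a relative-lift calculation plus a two-proportion z-test per arm. This is only an illustration: the arm names and all click and impression counts below are invented, and the paper does not say which significance test its A/B platform uses.

```python
import math


def relative_lift(rate_treatment: float, rate_control: float) -> float:
    """Relative lift of treatment over control, e.g. 0.0398 for +3.98%."""
    return (rate_treatment - rate_control) / rate_control


def two_proportion_z(clicks_a: int, views_a: int, clicks_b: int, views_b: int) -> float:
    """z statistic for H0 'both arms share one click rate', using the pooled rate."""
    p_a = clicks_a / views_a
    p_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    return (p_b - p_a) / se


# Hypothetical (clicks, impressions) per arm; none of these numbers come from the paper.
control = (100_000, 1_000_000)
arms = {
    "full_v2": (103_980, 1_000_000),
    "no_query_understanding": (102_600, 1_000_000),
    "no_self_distillation": (101_900, 1_000_000),
    "no_preference_alignment": (102_900, 1_000_000),
}

for name, (clicks, views) in arms.items():
    lift = relative_lift(clicks / views, control[0] / control[1])
    z = two_proportion_z(control[0], control[1], clicks, views)
    print(f"{name}: lift={lift:+.2%}, z={z:.2f}")
```

If the lift of an ablated arm stays close to that of the full system, the removed module is probably not what drives the gain.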

Figures

Figures reproduced from arXiv: 2603.24422 by Ben Chen, Bochao Liu, Chenyi Lei, Han Li, Huangyu Dai, Hui Kong, Jing Chen, Jingshan Lv, Kun Gai, Lingtao Mao, Siyuan Wang, Tong Zhao, Wenwu Ou, Xiao Liang, Xinyu Sun, Xuxin Zhang, Yang Zhao, Ying Yang, Yue Lv, Yufei Ma, Zhipeng Qian, Zhixin Zhai, Zihan Liang.

**Figure 2.** The Overall Framework of OneSearch V2. It contains (a) a thought-augmented complex query understanding module,

**Figure 3.** Three-step keyword-based CoT extraction pipeline

**Figure 4.** The sid rate of the proposed innovations with One

**Figure 5.** The online CTR relative gains for top/middle/tail

**Figure 6.** The CTR relative gains for various user/query/items.

Original abstract

Generative Retrieval (GR) has emerged as a promising paradigm for modern search systems. Compared to multi-stage cascaded architecture, it offers advantages such as end-to-end joint optimization and high computational efficiency. OneSearch, as a representative industrial-scale deployed generative search framework, has brought significant commercial and operational benefits. However, its inadequate understanding of complex queries, inefficient exploitation of latent user intents, and overfitting to narrow historical preferences have limited its further performance improvement. To address these challenges, we propose OneSearch-V2, a latent reasoning enhanced self-distillation generative search framework. It contains three key innovations: (1) a thought-augmented complex query understanding module, which enables deep query understanding and overcomes the shallow semantic matching limitations of direct inference; (2) a reasoning-internalized self-distillation training pipeline, which uncovers users' potential yet precise e-commerce intentions beyond log-fitting through implicit in-context learning; (3) a behavior preference alignment optimization system, which mitigates reward hacking arising from the single conversion metric, and addresses personal preference via direct user feedback. Extensive offline evaluations demonstrate OneSearch-V2's strong query recognition and user profiling capabilities. Online A/B tests further validate its business effectiveness, yielding +3.98% item CTR, +2.07% buyer volume, and +2.11% order volume. Manual evaluation further confirms gains in search experience quality, with +1.37% in page good rate and +1.65% in query-item relevance. More importantly, OneSearch-V2 effectively mitigates common search system issues such as information bubbles and long-tail sparsity, without incurring additional inference costs or serving latency.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

OneSearch-V2 layers three practical modules onto the prior framework and shows real online A/B lifts in CTR and volume, but the gains are not isolated from possible data or infrastructure changes.

OneSearch-V2 extends the original OneSearch setup with a thought-augmented query module, a reasoning-internalized self-distillation pipeline, and a behavior preference alignment step. These target deeper query handling, latent intent extraction from logs, and avoidance of narrow conversion reward hacking in e-commerce search. The paper reports offline gains in query recognition plus online A/B results of +3.98% item CTR, +2.07% buyer volume, and +2.11% order volume, with no added inference cost and some manual checks on relevance and page quality. It also claims side benefits on long-tail items and information bubbles. That combination of deployed metrics and zero-latency constraint is the part worth noting for anyone running generative retrieval at scale.

The main weakness is attribution. The abstract and results do not include ablation tables, incremental rollout data, or a clear statement that the control arm matched the new system on training data, model size, and serving stack. Without those, other unmentioned factors could explain the deltas. The self-distillation and alignment steps also stay tied to internal logs and feedback, so external reproducibility is limited.

This paper is aimed at industrial teams working on generative search in production settings, especially e-commerce. A reader in that area can extract the module designs and the evaluation style even if the causal claims need more support. It deserves peer review because the online business results give it enough substance to justify referee time, though revisions should focus on tightening the controls and ablations.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces OneSearch-V2, a latent reasoning enhanced self-distillation generative search framework for e-commerce that extends the prior OneSearch system. It proposes three innovations: (1) a thought-augmented complex query understanding module for deeper semantic processing, (2) a reasoning-internalized self-distillation training pipeline that uses implicit in-context learning to uncover latent user intents beyond log-fitting, and (3) a behavior preference alignment optimization system that incorporates direct user feedback to mitigate reward hacking from single-metric optimization. The paper claims these yield stronger query recognition and user profiling in offline evaluations, plus concrete online A/B test lifts of +3.98% item CTR, +2.07% buyer volume, and +2.11% order volume, along with +1.37% page good rate and +1.65% query-item relevance in manual evaluations, while reducing information bubbles and long-tail sparsity without added inference latency.

Significance. If the reported gains can be rigorously attributed to the three modules, the work would offer a meaningful industrial contribution to generative retrieval by demonstrating scalable techniques for complex query handling and preference alignment in production search systems. The provision of online A/B results with specific business metrics (CTR, buyer/order volume) and manual quality assessments is a strength, as is the emphasis on zero additional serving cost.

major comments (2)

[Abstract and evaluation sections] The headline performance claims (+3.98% item CTR, +2.07% buyer volume, +2.11% order volume) are presented as resulting from the three proposed modules, yet no ablation table, incremental rollout data, or explicit statement confirms that the control arm used identical training data, model size, hyperparameters, and serving stack. Without these controls the attribution remains unisolated and is load-bearing for the central claim that the innovations drive the observed deltas.
[Behavior preference alignment optimization system (abstract and §3)] The system is said to mitigate reward hacking via direct user feedback, but the manuscript provides no concrete loss formulation, alignment coefficients, or comparison against a single-conversion baseline, making it impossible to assess whether the reported mitigation of overfitting is technically substantiated or merely asserted.

minor comments (2)

[Abstract] The statement that 'extensive offline evaluations demonstrate OneSearch-V2's strong query recognition and user profiling capabilities' is unsupported by any quantitative metrics, tables, or figure references; a one-sentence summary of key offline results (e.g., NDCG or recall deltas) would improve readability.
[Abstract and evaluation sections] The paper asserts mitigation of information bubbles and long-tail sparsity but does not describe the measurement methodology (e.g., diversity metrics or tail-item coverage) used in either offline or online evaluations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on attribution and technical substantiation. We address each major comment below, providing clarifications from the experimental setup and committing to revisions that strengthen the manuscript without altering the core claims.

Referee: [Abstract and evaluation sections] The headline performance claims (+3.98% item CTR, +2.07% buyer volume, +2.11% order volume) are presented as resulting from the three proposed modules, yet no ablation table, incremental rollout data, or explicit statement confirms that the control arm used identical training data, model size, hyperparameters, and serving stack. Without these controls the attribution remains unisolated and is load-bearing for the central claim that the innovations drive the observed deltas.

Authors: We agree that explicit confirmation of control conditions is essential for rigorous attribution. The online A/B test deployed the complete OneSearch-V2 system against the prior OneSearch baseline within the identical production serving stack, using the same training corpus, model parameter count, hyperparameter settings, and inference infrastructure; the only differences were the three proposed modules. Production constraints limited simultaneous multi-arm rollouts, so incremental data per module is not available. To address this, we will revise the evaluation section to include an explicit statement confirming these identical conditions and add offline ablation results isolating each module's contribution to the reported metrics. revision: yes
Referee: [Behavior preference alignment optimization system (abstract and §3)] The system is said to mitigate reward hacking via direct user feedback, but the manuscript provides no concrete loss formulation, alignment coefficients, or comparison against a single-conversion baseline, making it impossible to assess whether the reported mitigation of overfitting is technically substantiated or merely asserted.

Authors: We acknowledge the need for greater technical detail. Section 3.3 formulates the alignment loss as a weighted sum of the primary conversion objective and a direct-feedback alignment term, with coefficient λ set to 0.3; offline results in Table 3 already compare against single-metric baselines. To ensure the mitigation is fully substantiated, we will expand §3.3 with the exact loss equation, the chosen coefficient value, and an explicit statement of the single-conversion baseline comparison in the revised manuscript. revision: yes
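Read literally, the weighted sum described in this response would look like the expression below. It is a sketch rather than the paper's equation: the symbols for the conversion and feedback terms are placeholder names, the rebuttal itself is simulated, and whether the conversion term carries its own weight (for example 1 − λ) is not stated.

```latex
\mathcal{L}_{\text{align}} \;=\; \mathcal{L}_{\text{conv}} \;+\; \lambda\,\mathcal{L}_{\text{fb}},
\qquad \lambda = 0.3
```

Under this form, a unit change in the feedback term moves the total loss 0.3 times as much as the same change in the conversion term.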

Circularity Check

0 steps flagged

No significant circularity; empirical A/B claims are independent of module definitions

The paper describes an industrial generative search framework whose central claims rest on offline metrics and online A/B test lifts (+3.98% CTR, +2.07% buyer volume, +2.11% order volume) rather than any closed-form derivation. The three modules (thought-augmented understanding, reasoning-internalized self-distillation, behavior preference alignment) are presented as engineering innovations trained on user logs; their performance is measured externally via controlled experiments, not derived by construction from the same logs. No equations, uniqueness theorems, or self-citations are invoked to force the reported gains. The framework is therefore self-contained against external benchmarks and receives a zero circularity score.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The framework rests on standard machine-learning training assumptions plus domain assumptions about e-commerce user behavior; no new physical entities are introduced.

free parameters (2)

distillation hyperparameters: Temperature and loss weights in the self-distillation pipeline are tuned to fit user log data.
preference alignment coefficients: Weights balancing conversion metric against direct feedback are chosen to mitigate reward hacking.
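To make the distillation parameters concrete, here is a minimal sketch of a generic temperature-scaled distillation loss. It is not the paper's loss, which this review does not reproduce: the temperature, the mixing weight `alpha`, and the example logits are all assumptions. It also assumes, without confirmation from the source, that the teacher pass is the same model given extra reasoning context.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over logits divided by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(teacher_logits, student_logits, target_index,
                      temperature=2.0, alpha=0.5):
    """alpha * soft KL term + (1 - alpha) * hard cross-entropy term.

    temperature and alpha are the assumed free parameters; values are illustrative.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # The T^2 factor keeps the soft term's gradient scale comparable across temperatures.
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    # Hard term: cross-entropy of the student against the logged target.
    hard = -math.log(softmax(student_logits)[target_index])
    return alpha * soft + (1 - alpha) * hard


# Hypothetical logits over three candidate items; the logged click is item 0.
print(distillation_loss([3.0, 1.0, 0.0], [2.0, 1.5, 0.5], target_index=0))
```

Raising the temperature flattens both distributions, so the student is pushed to match the teacher's ranking of lower-probability items as well as its top choice.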

axioms (2)

domain assumption: User interaction logs contain recoverable latent intents that self-distillation can surface via in-context learning. Invoked in the description of the reasoning-internalized self-distillation training pipeline.
domain assumption: Single conversion metric optimization leads to reward hacking that direct feedback can correct. Stated as motivation for the behavior preference alignment optimization system.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: thought-augmented complex query understanding module... reasoning-internalized self-distillation training pipeline... behavior preference alignment optimization system

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: self-distillation... KL divergence... R-Drop... FGM... focal loss

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Preference-Based Self-Distillation: Beyond KL Matching via Reward Regularization
cs.LG · 2026-05 · unverdicted · novelty 7.0
PBSD derives a reward-reweighted teacher distribution as the analytic optimum of a reward-regularized objective, yielding better stability and performance than KL-based self-distillation on math reasoning and tool-use tasks.

UniRec: Bridging the Expressive Gap between Generative and Discriminative Recommendation via Chain-of-Attribute
cs.IR · 2026-04 · unverdicted · novelty 6.0
UniRec bridges the expressive gap in generative recommendation by prefixing semantic ID sequences with structured attribute tokens, recovering explicit feature crossing and yielding +22.6% HR@50 gains plus online lift...

Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations
cs.AI · 2026-04 · unverdicted · novelty 5.0
Bian Que deploys an agentic system with flexible skills and self-evolution on a major e-commerce search engine, cutting alerts by 75%, reaching 80% root-cause accuracy, and halving resolution time.

Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations
cs.AI · 2026-04 · unverdicted · novelty 5.0
Bian Que is an agentic framework using a unified operational paradigm, flexible Skill Arrangement, and self-evolving mechanism to automate O&M tasks, achieving 75% alert reduction and over 50% MTTR cut in production d...

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages · cited by 3 Pith papers · 13 internal anchors

[1] Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, and Jingren Zhou. 2023. Qwen-VL: A Frontier Large Vision-Language Model with Versatile Abilities. arXiv preprint arXiv:2308.12966 (2023)
[2] Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, Wenbin Ge, Zhifang Guo, Qidong Huang, et al. 2025. Qwen3-VL Technical Report. arXiv preprint arXiv:2511.21631 (2025)
[3] Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, et al. 2024. Graph of Thoughts: Solving Elaborate Problems with Large Language Models. Proceedings of the AAAI Conference on Artificial Intelligence 38, 16 (Mar. 2024), 17682–17690. doi:10.1609/aaai.v38i16.29720
[4] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language Models are Few-Shot Learners. https://arxiv.org/abs/2005.14165 (2020). arXiv:2005.14165
[5] Ben Chen, Xian Guo, Siyuan Wang, Zihan Liang, Yue Lv, Yufei Ma, Xinlong Xiao, et al. 2025. OneSearch: A Preliminary Exploration of the Unified End-to-End Generative Framework for E-commerce Search. arXiv:2509.03236 [cs.IR] https://arxiv.org/abs/2509.03236
[6] Jiaxin Deng, Shiyao Wang, Kuo Cai, Lejian Ren, Qigen Hu, Weifeng Ding, Qiang Luo, and Guorui Zhou. 2025. OneRec: Unifying Retrieve and Rank with Generative Recommender and Iterative Preference Alignment. arXiv:2502.18965 [cs.IR] https://arxiv.org/abs/2502.18965
[7] Chenhe Dong, Shaowei Yao, Pengkun Jiao, Jianhui Yang, Yiming Jin, Zerui Huang, Xiaojiang Zhou, Dan Ou, Haihong Tang, and Bo Zheng. 2026. TaoSR1: The Thinking Model for E-commerce Relevance Search. arXiv preprint arXiv:2508.12365 (2026)
[8] Xian Guo, Ben Chen, Siyuan Wang, Ying Yang, Chenyi Lei, et al. 2025. OneSug: The Unified End-to-End Generative Framework for E-commerce Query Suggestion. CoRR abs/2506.06913 (2025). doi:10.48550/ARXIV.2506.06913 arXiv:2506.06913
[9] Ruidong Han, Bin Yin, Shangyu Chen, He Jiang, Fei Jiang, Xiang Li, Chi Ma, Mincong Huang, Xiaoguang Li, Chunzhen Jing, Yueming Han, MengLei Zhou, Lei Yu, Chuan Liu, and Wei Lin. 2025. MTGR: Industrial-Scale Generative Recommendation Framework in Meituan. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management (CI...
[10] Shibo Hao, Sainbayar Sukhbaatar, DiJia Su, Xian Li, Zhiting Hu, Jason E Weston, and Yuandong Tian. 2025. Training Large Language Models to Reason in a Continuous Latent Space. In Second Conference on Language Modeling
[11] Jonas Hübotter, Frederike Lübeck, Lejs Behric, Anton Baumann, Marco Bagatella, Daniel Marta, Ido Hakimi, Idan Shenfeld, Thomas Kleine Buening, Carlos Guestrin, and Andreas Krause. 2026. Reinforcement Learning via Self-Distillation. arXiv preprint arXiv:2601.20802 (2026)
[12] Gabriel Ilharco, Mitchell Wortsman, Ross Wightman, Cade Gordon, Nicholas Carlini, Rohan Taori, Achal Dave, Vaishaal Shankar, Hongseok Namkoong, John Miller, Hannaneh Hajishirzi, Ali Farhadi, and Ludwig Schmidt. 2021. OpenCLIP. doi:10.5281/zenodo.5143773
[13] Jian Jia, Jingtong Gao, Ben Xue, Junhao Wang, Qingpeng Cai, Quan Chen, Xiangyu Zhao, Peng Jiang, and Kun Gai. 2025. From Principles to Applications: A Comprehensive Survey of Discrete Tokenizers in Generation, Comprehension, Recommendation, and Information Retrieval. arXiv preprint arXiv:2502.12448 (2025)
[14] Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Ves Stoyanov, and Luke Zettlemoyer. 2019. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv preprint arXiv:1910.13461 (2019)
[15] Xiaopeng Li, Bo Chen, Junda She, Shiteng Cao, You Wang, Qinlin Jia, Haiying He, Zheli Zhou, Zhao Liu, et al. [n. d.]. A Survey of Generative Recommendation from a Tri-Decoupled Perspective: Tokenization, Architecture, and Optimization. Preprints ([n. d.])
[16] Xiaobo Liang, Lijun Wu, Juntao Li, Yue Wang, Qi Meng, Tao Qin, Wei Chen, Min Zhang, and Tie-Yan Liu. 2021. R-Drop: Regularized Dropout for Neural Networks. In Advances in Neural Information Processing Systems. Curran Associates, Inc., 10890–10905
[17] Zihan Liang, Yufei Ma, Zhipeng Qian, Huangyu Dai, Zihan Wang, Ben Chen, Chenyi Lei, Yuqing Ding, and Han Li. 2025. UniECS: Unified Multimodal E-Commerce Search Framework with Gated Cross-modal Fusion (CIKM ’25). New York, NY, USA, 1788–1797. doi:10.1145/3746252.3761170
[18] Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. 2018. Focal Loss for Dense Object Detection. arXiv preprint arXiv:1708.02002 (2018)
[19] Xinyu Lin, Chaoqun Yang, Wenjie Wang, Yongqi Li, Cunxiao Du, Fuli Feng, See-Kiong Ng, and Tat-Seng Chua. 2025. Efficient Inference for Large Language Model-based Generative Recommendation. In ICLR
[20] Zhanyu Liu, Shiyao Wang, Xingmei Wang, Rongzhou Zhang, Jiaxin Deng, Honghui Bao, Jinghao Zhang, Wuchao Li, Pengfei Zheng, Xiangyu Wu, Yifei Hu, Qigen Hu, Xinchen Luo, Lejian Ren, Zixing Zhang, Qianqian Wang, Kuo Cai, Yunfan Wu, Hongtao Cheng, Zexuan Cheng, Lu Ren, Huanjie Wang, Yi Su, Ruiming Tang, Kun Gai, and Guorui Zhou. 2025. OneRec-Think: In-Text Rea...
[21] Takeru Miyato, Andrew M. Dai, and Ian Goodfellow. 2021. Adversarial Training Methods for Semi-Supervised Text Classification. arXiv preprint arXiv:1605.07725 (2021)
[22] Shashank Rajput, Nikhil Mehta, Anima Singh, Raghunandan Hulikal Keshavan, Trung Vu, et al. 2023. Recommender Systems with Generative Retrieval. In Advances in Neural Information Processing Systems, A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine (Eds.), Vol. 36. Curran Associates, Inc., 10299–10315. https://proceedings.neurips...
[24] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. 2024. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. arXiv preprint arXiv:2402.03300 (2024)
[25] Zhenyi Shen, Hanqi Yan, Linhai Zhang, Zhanghao Hu, Yali Du, and Yulan He. 2025. CODI: Compressing Chain-of-Thought into Continuous Space via Self-Distillation. arXiv preprint arXiv:2502.21074 (2025)
[27] Idan Shenfeld, Mehul Damani, Jonas Hübotter, and Pulkit Agrawal. 2026. Self-Distillation Enables Continual Learning. arXiv preprint arXiv:2601.19897 (2026)
[28] Yuhui Sun, Xiyao Wang, Zixi Li, YiTian Ding, Tianyang Ling, Jialuo Chen, Tianyi Yu, Zhenlong Yuan, and Jinman Zhao. 2026. Listwise Direct Preference Optimization with Multi-Dimensional Preference Mixing. arXiv preprint arXiv:2506.19780 (2026)
[29] Juntao Tan, Shuyuan Xu, Wenyue Hua, Yingqiang Ge, Zelong Li, and Yongfeng Zhang. 2024. IDGenRec: LLM-RecSys Alignment with Textual ID Learning. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (Washington DC, USA) (SIGIR ’24). Association for Computing Machinery, New York, NY, USA, 355–364....
[30] Jianting Tang, Dongshuai Li, Tao Wen, Fuyu Lv, Dan Ou, and Linli Xu. 2025. Large Reasoning Embedding Models: Towards Next-Generation Dense Retrieval Paradigm. arXiv preprint arXiv:2510.14321 (2025)
[31] Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Hervé Jégou. 2021. Training data-efficient image transformers & distillation through attention. https://arxiv.org/abs/2012.12877 (2021). arXiv:2012.12877
[32] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, et al. 2022. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In Advances in Neural Information Processing Systems, S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh (Eds.), Vol. 35. Curran Associates, Inc., 24824–24837. https://proceedings.neurips.cc/paper_files/paper/2022/file/9d5609613524ecf4f15af0f7b31abca4-Paper-Conference.pdf
[34] Zhipeng Wei, Kuo Cai, Junda She, Jie Chen, Minghao Chen, et al. 2025. OneLoc: Geo-Aware Generative Recommender Systems for Local Life Service. (2025). arXiv:2508.14646 [cs.IR] https://arxiv.org/abs/2508.14646
[35] Shitao Xiao, Zheng Liu, Peitian Zhang, Niklas Muennighoff, Defu Lian, and Jian-Yun Nie. 2024. C-Pack: Packed Resources For General Chinese Embeddings. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (Washington DC, USA) (SIGIR ’24). Association for Computing Machinery, 641–649
[37] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, Chujie Zheng, Dayiheng Liu, et al. 2025. Qwen3 Technical Report. arXiv preprint arXiv:2505.09388 (2025)
[38] Chaoqun Yang, Xinyu Lin, Wenjie Wang, Yongqi Li, Teng Sun, Xianjing Han, and Tat-Seng Chua. 2025. EARN: Efficient Inference Acceleration for LLM-based Generative Recommendation by Register Tokens. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (Toronto ON, Canada) (KDD ’25). Association for Computing Machinery, Ne... doi:10.1145/3711896.3736919
[39] Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. 2023. Tree of Thoughts: Deliberate Problem Solving with Large Language Models. In Advances in Neural Information Processing Systems, A. Oh, T. Naumann, A. Globerson, K. Saenko, M. Hardt, and S. Levine (Eds.), Vol. 36. Curran Associates, Inc., 11809–11822. h...
[40] Xinhao Yao, Ruifeng Ren, Yun Liao, Lizhong Ding, and Yong Liu. 2026. Compositional Generalization from Learned Skills via CoT Training: A Theoretical and Structural Analysis for Reasoning. arXiv preprint arXiv:2502.04667 (2026)
[41] Runyang You, Yongqi Li, Xinyu Lin, Xin Zhang, Wenjie Wang, Wenjie Li, and Liqiang Nie. 2025. R²ec: Towards Large Recommender Models with Reasoning. In NeurIPS
[42] Jiaqi Zhai, Lucy Liao, Xing Liu, Yueming Wang, Rui Li, et al. 2024. Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations. arXiv:2402.17152 [cs.LG] https://arxiv.org/abs/2402.17152
[43] Ying Zhang, Tao Xiang, Timothy M. Hospedales, and Huchuan Lu. 2017. Deep Mutual Learning. https://arxiv.org/abs/1706.00384 (2017). arXiv:1706.00384
[44] Yang Zhang, Wenxin Xu, Xiaoyan Zhao, Wenjie Wang, Fuli Feng, Xiangnan He, and Tat-Seng Chua. 2026. Reinforced Latent Reasoning for LLM-based Recommendation. In The Fourteenth International Conference on Learning Representations
[45] Siyan Zhao, Zhihui Xie, Mengchen Liu, Jing Huang, Guan Pang, Feiyu Chen, and Aditya Grover. 2026. Self-Distilled Reasoner: On-Policy Self-Distillation for Large Language Models. arXiv preprint arXiv:2601.18734 (2026)
[46] Guorui Zhou, Hengrui Hu, Hongtao Cheng, Huanjie Wang, Jiaxin Deng, Jinghao Zhang, Kuo Cai, Lejian Ren, Lu Ren, Liao Yu, Pengfei Zheng, Qiang Luo, et al. 2025. OneRec-V2 Technical Report. arXiv:2508.20900 [cs.IR] https://arxiv.org/abs/2508.20900
[47] Jie Zhu, Zhifang Fan, Xiaoxie Zhu, Yuchen Jiang, et al. 2025. RankMixer: Scaling Up Ranking Models in Industrial Recommenders. arXiv:2507.15551 [cs.IR] https://arxiv.org/abs/2507.15551

A Cross-Architecture Generalization

To verify that the proposed innovations generalize across different model architectures, we conduct experiments on both GPT-2 [4] an...

[48] Intent Understanding: identify the user's single primary intent.
• Product Search: most common (e.g. dress, smartphone).
• Functional Need: platform features (e.g. track parcel).
• Note: if intent ≠ product search, skip the remaining steps.
[49] Category Identification: identify one or more product categories.
• Top-level categories: women's wear, mobile & electronics, home goods, bags, accessories, men's wear, personal care, snacks, skincare, sports & outdoors, cosmetics, underwear, home apparel, women's shoes, toys, gaming peripherals, fresh produce, instant food, home appliances, etc.
• Sub-cat...
[50] Attribute Recognition: extract attributes explicitly stated in the query without any expansion.
• Common attributes: entity, model, brand, audience, color, material, style, season, scene, function, price, etc.
• Note: the search system must return products that match the query, so strictly retain the attributes that are relevant in the query.
[51] Topic Recommendation: suggest candidate topics that satisfy the query, such as categories or specific products.
• Note: candidates must meet the query's category and attribute constraints. Do not over-recommend.
• Good cases:
  ◦ “plaid skirt” → plaid wrap skirt, plaid A-line skirt.
  ◦ “La Mer dupe” → Estée Lauder serum, SK-II, Lancôme cream.
  ◦ “knitwear, no turtleneck” → V-neck knitwear, cr...
[52] Source Constraints:
• Extract only under “Product Search” intent; otherwise output “Not extractable” and stop.
• Extract only from the Topic Recommendation section.
• If empty, fall back to keywords from Attribute Recognition and Category Identification.
[53] Extraction Criteria:
• Remove off-query items (e.g. query “Hisense TV” ⇒ exclude “TCL TV”).
• Keep specific attributes (e.g. “plaid skirt”).
• Remove marketing terms (e.g. “bestseller”, “good quality”).
• Merge synonymous attributes (e.g. “woolens” merged into “wool sweater”).
• Preserve model details (e.g. “iPhone 15 Pro Max”).
[54] Output Format:
• Comma-separated; at most 8 keywords; each keyword can have a maximum of 10 Chinese characters.
• Ordered by popularity (descending).
• Each keyword must be independently retrievable.

Please extract keywords from the analysis results based on the query:
Query: {}
Analysis Result: {}

Step 3: Preference Calibration ...
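The Output Format and Extraction Criteria rules in the prompt could be enforced on a model's raw keyword string with a small post-processing step. The sketch below is hypothetical and not part of the paper: the function name, the marketing-term list (seeded only with the two examples the prompt gives), and the choice to drop over-long keywords rather than truncate them are all assumptions.

```python
# Examples taken from the prompt text; a production list is assumed to be longer.
MARKETING_TERMS = {"bestseller", "good quality"}


def clean_keywords(raw: str, max_keywords: int = 8, max_chars: int = 10) -> list[str]:
    """Apply the prompt's output rules to a comma-separated keyword string.

    Input order is preserved, on the assumption that the model already
    sorted keywords by popularity (descending).
    """
    if raw.strip() == "Not extractable":
        return []
    seen = set()
    kept = []
    # Accept both ASCII and full-width commas as separators.
    for part in raw.replace("，", ",").split(","):
        keyword = part.strip()
        if not keyword or keyword.lower() in MARKETING_TERMS:
            continue
        # len() counts code points, which matches "characters" for Chinese text.
        if len(keyword) > max_chars or keyword in seen:
            continue
        seen.add(keyword)
        kept.append(keyword)
        if len(kept) == max_keywords:
            break
    return kept


# "格子裙" = plaid skirt, "格子A字裙" = plaid A-line skirt
print(clean_keywords("格子裙,bestseller,格子裙,格子A字裙"))  # ['格子裙', '格子A字裙']
```

Merging synonyms and removing off-query items would need product knowledge that a string filter does not have, so those two criteria are left to the model.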