Echoes in Filter Bubble: Diagnosing and Curing Popularity Bias in Generative Recommenders

Bangguo Zhu; Chengqi Zhang; Hao Chen; Jun Yin; Peng Huo; Ruochen Liu; Senzhang Wang; Shirui Pan

arxiv: 2605.16825 · v1 · pith:M5WLHRDPnew · submitted 2026-05-16 · 💻 cs.IR · cs.AI

Echoes in Filter Bubble: Diagnosing and Curing Popularity Bias in Generative Recommenders

Jun Yin , Bangguo Zhu , Peng Huo , Ruochen Liu , Hao Chen , Senzhang Wang , Shirui Pan , Chengqi Zhang This is my paper

Pith reviewed 2026-05-19 20:36 UTC · model grok-4.3

classification 💻 cs.IR cs.AI

keywords generative recommenderspopularity biasdebiasing methodstokenizationasymmetric optimizationfair recommendationsrecommender systemsitem exposure

0 comments

The pith

Generative recommenders develop severe popularity bias from token-level optimization flaws and uniform item tokenization, which Ghost corrects using asymmetric unlikelihood optimization and skeleton-founded tokenization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper investigates why generative recommender systems, which frame item prediction as generating sequences of tokens, strongly favor already-popular items. Theoretical analysis traces the bias to two design elements: an optimization process that operates at the token level without balancing popular and rare items, and a tokenization scheme that treats all items similarly without structural distinctions. The authors introduce Ghost, which applies asymmetric unlikelihood training to down-weight popular items differently and uses skeleton-founded tokenization to incorporate item structure. Experiments on three datasets show Ghost reduces popularity bias more effectively than adapted traditional debiasing methods while causing only minor drops in overall recommendation accuracy. This matters because fairer exposure of items can prevent filter bubbles and give users access to a broader range of recommendations.

Core claim

The severe popularity bias emerges from the confluence of a token-level optimization flaw and the undifferentiated property of item tokenization. The proposed Ghost system, built on asymmetric unlikelihood optimization and skeleton-founded tokenization, substantially alleviates popularity bias and promotes fairer recommendations across three datasets while incurring only slight degradation to the overall recommendation utility.

What carries the argument

Asymmetric unlikelihood optimization together with skeleton-founded tokenization, which rebalances token probabilities during training and differentiates items through structural skeletons to address the identified flaws.

If this is right

Generative recommenders can achieve substantially fairer item exposure by changing only the optimization objective and tokenization step inside their existing end-to-end framework.
Token-level optimization flaws can be directly corrected by applying asymmetric penalties that reduce the likelihood assigned to popular items relative to less popular ones.
Skeleton-founded tokenization mitigates the undifferentiated property by embedding structural item information into the token assignment process.
Ghost outperforms several state-of-the-art baselines adapted from traditional debiasing methods on three public datasets.
Overall recommendation utility experiences only slight degradation while bias metrics improve markedly.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same combination of asymmetric objectives and structure-aware tokenization could be tested in other generative sequence models used for tasks such as next-item prediction in non-recommendation domains.
Integrating Ghost's components with post-processing fairness constraints might further reduce bias without additional utility loss.
Wider adoption could shift recommendation platforms toward surfacing more long-tail items, changing the distribution of user attention over time.

Load-bearing premise

The theoretical analyses correctly pinpoint the root causes of popularity bias in generative recommenders, and the proposed asymmetric unlikelihood optimization together with skeleton-founded tokenization effectively mitigate these causes without introducing new biases.

What would settle it

An experiment that replaces asymmetric unlikelihood with standard likelihood training or switches to fully undifferentiated tokenization and then measures whether popularity bias returns to the levels seen in baseline generative recommenders.

Figures

Figures reproduced from arXiv: 2605.16825 by Bangguo Zhu, Chengqi Zhang, Hao Chen, Jun Yin, Peng Huo, Ruochen Liu, Senzhang Wang, Shirui Pan.

**Figure 1.** Figure 1: a). Comparison of Hit-Rate@10 (i.e., HR@10) between head and tail items. b). Comparison between the number of head and tail items in the recommendation list provided by three GRs. c). Tendency of HR@10 as the backbone parameters of LC-Rec scaling up. a). Trade-off between overall performance and tail performance b). Dilemma between recommendation performance and fairness [PITH_FULL_IMAGE:figures/full_fig_… view at source ↗

**Figure 2.** Figure 2: Limitations of current popularity debiasing methods on GRs. [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗

**Figure 3.** Figure 3: Overview of the Ghost model. First, textual representations are encoded based on item features. After categorizing items into head and tail sets, SKT [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: Analysis of SID lengths, including head length [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

**Figure 5.** Figure 5: Numbers of head and tail items in the recommendation results provided by [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: Tendency of Ghost performance on Ins dataset, under different AUO weights [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗

**Figure 7.** Figure 7: Tendency of Ghost performance on Ins dataset, under different undesired collection sizes. The [PITH_FULL_IMAGE:figures/full_fig_p010_7.png] view at source ↗

**Figure 8.** Figure 8: Long tail distribution of the item popularity. [PITH_FULL_IMAGE:figures/full_fig_p015_8.png] view at source ↗

**Figure 9.** Figure 9: Number of tail items that inherit SID prefix from the same head items. To provide a more fine-grained understanding of where the performance improvements originate, we analyze the recommendation results across different popularity segments [PITH_FULL_IMAGE:figures/full_fig_p017_9.png] view at source ↗

**Figure 10.** Figure 10: Performance comparison of each equal-sized grouping on Ins dataset. [PITH_FULL_IMAGE:figures/full_fig_p018_10.png] view at source ↗

**Figure 11.** Figure 11: Tendency of Ghost performance on Ins dataset, under different learning rates. The [PITH_FULL_IMAGE:figures/full_fig_p019_11.png] view at source ↗

**Figure 12.** Figure 12: Tendency of Ghost performance on Games dataset, under different optimization epochs. The [PITH_FULL_IMAGE:figures/full_fig_p019_12.png] view at source ↗

read the original abstract

Recently, Generative Recommenders (GRs), characterized by a unified end-to-end framework, have exhibited astonishing potential in transforming the recommendation paradigm. Despite their effectiveness, we recognize that GRs are still susceptible to the long-standing issue of popularity bias that has pervaded the recommendation community. Although a few studies have attempted to extend traditional debiasing methods to GRs, their effectiveness is marginal, and the fundamental reason why GRs suffer from popularity bias remains under-explored. To bridge this gap, this study focuses on two core aspects in GRs: the optimization of generative framework and the item tokenization based on semantic index. Based on theoretical analyses, we identify that the severe popularity bias emerges from the confluence of a token-level optimization flaw and the undifferentiated property of item tokenization. Accordingly, this study develops a novel generative recommender system, called Ghost, by designing the asymmetric unlikelihood optimization and the skeleton-founded tokenization. Extensive empirical evaluations across three datasets, alongside multiple SOTA baselines, reveal that Ghost substantially alleviates popularity bias and promotes fairer recommendations, while incurring slight degradation to the overall recommendation utility.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Ghost diagnoses popularity bias in generative recommenders via token optimization and tokenization flaws but the theory needs tighter isolation from other GR properties.

read the letter

The one or two things to know about this paper are that it identifies a token-level optimization flaw and undifferentiated item tokenization as the sources of popularity bias in generative recommenders, and proposes Ghost as a solution using asymmetric unlikelihood optimization and skeleton-founded tokenization. The work does a good job of focusing on the generative recommender setting, which is relatively new. The theoretical analyses connect the optimization and tokenization issues to the bias problem, and the empirical results on three datasets indicate that Ghost reduces popularity bias more effectively than prior methods while only slightly affecting overall recommendation performance. This provides concrete evidence that targeted changes in the generative framework can help with fairness without major trade-offs. The soft spots are mainly around the strength of the theoretical claims. The analysis of per-token gradients is a start, but it does not include a closed-form derivation that shows bias amplification specifically under realistic item popularity distributions. This leaves open the possibility that the flaw is not fully isolated from other generative modeling choices, such as autoregressive decoding or embedding geometry. If the bias is more tied to token frequencies in the semantic index, the asymmetric unlikelihood approach could be masking the issue rather than resolving it at the source. The evaluations are described as extensive, but details on controls for post-hoc selection or statistical significance would help confirm the robustness. This paper is for people in recommender systems research who are interested in generative models and bias mitigation. A reader focused on fairness in recommendations or new optimization techniques for GRs would get useful ideas from it. It deserves a serious referee because it introduces specific new methods with supporting experiments in an active area, even if the theory could use more development. I would recommend sending this to peer review.

Referee Report

3 major / 3 minor

Summary. The paper claims that popularity bias in Generative Recommenders (GRs) arises from the confluence of a token-level optimization flaw in the standard negative log-likelihood objective and the undifferentiated property of semantic-index-based item tokenization. It proposes Ghost, which introduces asymmetric unlikelihood optimization and skeleton-founded tokenization to mitigate these issues, and reports that Ghost reduces popularity bias while incurring only slight degradation in overall recommendation utility across three datasets and multiple SOTA baselines.

Significance. If the theoretical identification of the root causes holds and the proposed fixes are shown to be robust, the work would offer a principled, GR-specific approach to popularity bias that improves on marginal extensions of traditional debiasing methods. The multi-dataset empirical evaluation against strong baselines provides a reasonable test of practical impact, though the slight utility trade-off requires careful quantification.

major comments (3)

[§3] §3 (theoretical analysis): The per-token gradient analysis identifies a token-level optimization flaw but does not derive a closed-form expression showing bias amplification under realistic item popularity distributions (e.g., power-law). Without this isolation from autoregressive decoding and embedding geometry, it remains unclear whether the flaw is the primary driver or an artifact of token frequency correlations in the semantic index.
[§4.2] §4.2 (asymmetric unlikelihood optimization): The claim that the proposed loss cures the identified flaw without introducing new biases lacks an ablation that holds the tokenization fixed while varying only the optimization; the current experiments conflate the two contributions, weakening the causal link to the diagnosed root cause.
[Table 2] Table 2 (main results): The reported improvements in fairness metrics (e.g., popularity bias reduction) are not accompanied by statistical significance tests across the three datasets; with only point estimates shown, it is difficult to assess whether the alleviation is reliable or sensitive to random seeds and hyperparameter choices.

minor comments (3)

[§2.2] Notation for the semantic index and skeleton tokens is introduced without a clear diagram or example in §2.2; adding a small illustrative figure would improve readability.
[Abstract and §5] The abstract states 'slight degradation to the overall recommendation utility' but the main text does not quantify this trade-off with a single scalar (e.g., average NDCG drop across datasets); a summary table row would help.
[§1.1] A few citations to prior generative recommender work (e.g., on autoregressive decoding) appear in the related-work section but are not referenced when discussing the optimization flaw; cross-references would strengthen the positioning.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thorough review and valuable suggestions. We have carefully considered each comment and provide our responses below, along with planned revisions to the manuscript.

read point-by-point responses

Referee: [§3] §3 (theoretical analysis): The per-token gradient analysis identifies a token-level optimization flaw but does not derive a closed-form expression showing bias amplification under realistic item popularity distributions (e.g., power-law). Without this isolation from autoregressive decoding and embedding geometry, it remains unclear whether the flaw is the primary driver or an artifact of token frequency correlations in the semantic index.

Authors: We thank the referee for this observation. Our gradient analysis highlights the token-level bias in the NLL objective. To address the request for a closed-form expression, we will include a derivation assuming a power-law (Zipf) distribution in the revised manuscript, showing how the bias amplifies with popularity skew. On isolating from autoregressive decoding and embedding geometry, the analysis is performed at the loss level prior to decoding; however, we will add a paragraph discussing potential interactions with these factors to clarify that the flaw is not merely an artifact of token frequencies. revision: yes
Referee: [§4.2] §4.2 (asymmetric unlikelihood optimization): The claim that the proposed loss cures the identified flaw without introducing new biases lacks an ablation that holds the tokenization fixed while varying only the optimization; the current experiments conflate the two contributions, weakening the causal link to the diagnosed root cause.

Authors: We concur that separating the contributions is important for establishing causality. Currently, the experiments evaluate the combined effect of asymmetric unlikelihood optimization and skeleton-founded tokenization. In the revision, we will introduce an ablation study that keeps the tokenization fixed and varies only the optimization objective, allowing us to isolate the impact of the asymmetric unlikelihood loss and confirm it addresses the diagnosed flaw without new biases. revision: yes
Referee: [Table 2] Table 2 (main results): The reported improvements in fairness metrics (e.g., popularity bias reduction) are not accompanied by statistical significance tests across the three datasets; with only point estimates shown, it is difficult to assess whether the alleviation is reliable or sensitive to random seeds and hyperparameter choices.

Authors: We appreciate this suggestion for improving the robustness of our empirical claims. The current Table 2 presents point estimates. We will update the table to include results from multiple runs with different random seeds, reporting means and standard deviations, along with statistical significance tests (such as t-tests) to demonstrate that the observed reductions in popularity bias are statistically significant and not sensitive to initialization. revision: yes

Circularity Check

0 steps flagged

No circularity in theoretical identification of bias sources or proposed mitigations

full rationale

The paper derives its central claim—that popularity bias arises from token-level optimization flaws and undifferentiated item tokenization—via theoretical analyses of the generative framework, then introduces asymmetric unlikelihood optimization and skeleton-founded tokenization as targeted remedies. No equations or self-citations reduce these diagnoses or fixes to fitted parameters, self-referential predictions, or ansatzes imported from prior author work; the analyses stand as independent examinations of per-token gradients and semantic indexing properties, with empirical results on three datasets providing external validation. The derivation chain is self-contained and does not collapse to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

With only the abstract available, no explicit free parameters, axioms, or invented entities can be extracted. The new optimization and tokenization methods likely rest on unstated assumptions about the generative framework and item semantics that are not detailed here.

pith-pipeline@v0.9.0 · 5750 in / 1063 out tokens · 36850 ms · 2026-05-19T20:36:58.612047+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages · 2 internal anchors

[1]

Deep interest network for click-through rate prediction,

G. Zhou, C. Song, X. Zhu, Y . Fan, H. Zhu, X. Ma, Y . Yan, J. Jin, H. Li, and K. Gai, “Deep interest network for click-through rate prediction,” inProceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018

work page 2018
[2]

Deep neural networks for youtube recommendations,

P. Covington, J. Adams, and E. Sargin, “Deep neural networks for youtube recommendations,” inProceedings of the ACM Conference on Recommender Systems, 2016

work page 2016
[3]

Deepinf: Social influence prediction with deep learning,

J. Qiu, J. Tang, H. Ma, Y . Dong, K. Wang, and J. Tang, “Deepinf: Social influence prediction with deep learning,” inProceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018

work page 2018
[4]

Recommender systems with generative retrieval,

S. Rajput, N. Mehta, A. Singh, R. H. Keshavan, T. Vu, L. Heldt, L. Hong, Y . Tay, V . Q. Tran, J. Samost, M. Kula, E. H. Chi, and M. Sathiamoorthy, “Recommender systems with generative retrieval,” inProceedings of the International Conference on Neural Information Processing Systems, 2023

work page 2023
[5]

Adapt- ing large language models by integrating collaborative semantics for recommendation,

B. Zheng, Y . Hou, H. Lu, Y . Chen, W. X. Zhao, and M. Chen, “Adapt- ing large language models by integrating collaborative semantics for recommendation,” inProceedings of the IEEE International Conference on Data Engineering, 2024

work page 2024
[6]

Learnable item tokenization for generative recommendation,

W. Wang, H. Bao, X. Lin, J. Zhang, Y . Li, F. Feng, S.-K. Ng, and T.-S. Chua, “Learnable item tokenization for generative recommendation,” in Proceedings of the ACM International Conference on Information and Knowledge Management, 2024

work page 2024
[7]

Unleash llms potential for sequential recommendation by coordinating dual dynamic index mechanism,

J. Yin, Z. Zeng, M. Li, H. Yan, C. Li, W. Han, J. Zhang, R. Liu, H. Sun, W. Deng, F. Sun, Q. Zhang, S. Pan, and S. Wang, “Unleash llms potential for sequential recommendation by coordinating dual dynamic index mechanism,” inProceedings of the ACM on Web Conference, 2025

work page 2025
[8]

Multimodal quantitative language for generative recommendation,

J. Zhai, Z.-F. Mai, C.-D. Wang, F. Yang, X. Zheng, H. Li, and Y . Tian, “Multimodal quantitative language for generative recommendation,” in Proceedings of the International Conference on Learning Representa- tions, 2025

work page 2025
[9]

Lightgcn: Simplifying and powering graph convolution network for recommenda- tion,

X. He, K. Deng, X. Wang, Y . Li, Y . Zhang, and M. Wang, “Lightgcn: Simplifying and powering graph convolution network for recommenda- tion,” inProceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020

work page 2020
[10]

Self-attentive sequential recommenda- tion,

W.-C. Kang and J. McAuley, “Self-attentive sequential recommenda- tion,” inProceedings of the IEEE International Conference on Data Mining, 2018

work page 2018
[11]

Neural discrete representation learning,

A. van den Oord, O. Vinyals, and K. Kavukcuoglu, “Neural discrete representation learning,” inProceedings of the International Conference on Neural Information Processing Systems, 2017

work page 2017
[12]

Autoregressive image generation using residual quantization,

D. Lee, C. Kim, S. Kim, M. Cho, and W.-S. Han, “Autoregressive image generation using residual quantization,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

work page 2022
[13]

Deepseek-r1 incentivizes reasoning in llms through reinforcement learning,

D. Guo, D. Yang, H. Zhang, J. Song, P. Wanget al., “Deepseek-r1 incentivizes reasoning in llms through reinforcement learning,”Nature, vol. 645, pp. 633–638, 2025

work page 2025
[14]

Llama: Open and efficient foundation language models,

H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozi `ere, N. Goyal, E. Hambro, and F. Azhar, “Llama: Open and efficient foundation language models,” 2023

work page 2023
[15]

Improving language understanding by generative pre-training,

A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving language understanding by generative pre-training,” 2018

work page 2018
[16]

How do recommendation models amplify popularity bias? an analysis from the spectral perspective,

S. Lin, C. Gao, J. Chen, S. Zhou, B. Hu, Y . Feng, C. Chen, and C. Wang, “How do recommendation models amplify popularity bias? an analysis from the spectral perspective,” inProceedings of the ACM International Conference on Web Search and Data Mining, 2025

work page 2025
[17]

Model-agnostic counterfactual reasoning for eliminating popularity bias in recommender system,

T. Wei, F. Feng, J. Chen, Z. Wu, J. Yi, and X. He, “Model-agnostic counterfactual reasoning for eliminating popularity bias in recommender system,” inProceedings of the ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021

work page 2021
[18]

Popularity- opportunity bias in collaborative filtering,

Z. Zhu, Y . He, X. Zhao, Y . Zhang, J. Wang, and J. Caverlee, “Popularity- opportunity bias in collaborative filtering,” inProceedings of the ACM International Conference on Web Search and Data Mining, 2021

work page 2021
[19]

Item-side fairness of large language model-based recommendation system,

M. Jiang, K. Bao, J. Zhang, W. Wang, Z. Yang, F. Feng, and X. He, “Item-side fairness of large language model-based recommendation system,” inProceedings of the ACM Web Conference, 2024

work page 2024
[20]

Causally debiased time-aware recommendation,

L. Wang, C. Ma, X. Wu, Z. Qiu, Y . Zheng, and X. Chen, “Causally debiased time-aware recommendation,” inProceedings of the ACM Web Conference, 2024

work page 2024
[21]

A model-agnostic popularity debias training framework for click-through rate prediction in recommender system,

F. Zhang and Q. Shen, “A model-agnostic popularity debias training framework for click-through rate prediction in recommender system,” in Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

work page 2023
[22]

Personalised reranking of paper recommendations using paper content and user behavior,

X. Li, Y . Chen, B. Pettit, and M. D. Rijke, “Personalised reranking of paper recommendations using paper content and user behavior,”ACM Transactions on Information Systems, vol. 37, no. 3, pp. 1–23, 2019

work page 2019
[23]

Enhancing recommendation diversity by re-ranking with large language models,

D. Carraro and D. Bridge, “Enhancing recommendation diversity by re-ranking with large language models,”ACM Transactions on Recom- mender Systems, vol. 4, no. 2, pp. 1–40, 2025

work page 2025
[24]

Miettinen,Nonlinear multiobjective optimization

K. Miettinen,Nonlinear multiobjective optimization. Springer Science & Business Media, 1999, vol. 12

work page 1999
[25]

Multi-task learning with user preferences: Gradient descent with controlled ascent in pareto optimization,

D. Mahapatra and V . Rajan, “Multi-task learning with user preferences: Gradient descent with controlled ascent in pareto optimization,” in Proceedings of the International Conference on Machine Learning, 2020

work page 2020
[26]

Neural collaborative filtering,

X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, “Neural collaborative filtering,” inProceedings of the International Conference on World Wide Web, 2017

work page 2017
[27]

Towards fair large language model-based recommender systems with- out costly retraining,

J. Li, H. Gu, S. Wang, Q. Zhang, S. Yu, C. Wang, X. Xu, and F. Chen, “Towards fair large language model-based recommender systems with- out costly retraining,” inProceedings of the ACM Web Conference, 2026

work page 2026
[28]

Bringing reasoning to generative recommendation through the lens of cascaded ranking,

X. Lin, P. Liu, W. Wang, Y . Hu, C. Xu, F. Feng, Q. Wang, and T.-S. Chua, “Bringing reasoning to generative recommendation through the lens of cascaded ranking,” inProceedings of the ACM Web Conference, 2026

work page 2026
[29]

Qarm: Quantitative alignment multi-modal recommendation at kuaishou,

X. Luo, J. Cao, T. Sun, J. Yu, R. Huang, W. Yuan, H. Lin, Y . Zheng, S. Wang, Q. Hu, C. Qiu, J. Zhang, X. Zhang, Z. Yan, J. Zhang, S. Zhang, M. Wen, Z. Liu, and G. Zhou, “Qarm: Quantitative alignment multi-modal recommendation at kuaishou,” inProceedings of the ACM International Conference on Information and Knowledge Management, 2025

work page 2025
[30]

Mitigating popularity bias in recommendation with unbalanced interactions: A gra- dient perspective,

W. Ren, L. Wang, K. Liu, R. Guo, L. E. Peng, and Y . Fu, “Mitigating popularity bias in recommendation with unbalanced interactions: A gra- dient perspective,” inProceedings of the IEEE International Conference on Data Mining, 2022

work page 2022
[31]

Gradient starvation: A learning proclivity in neural networks,

M. Pezeshki, O. Kaba, Y . Bengio, A. C. Courville, D. Precup, and G. La- joie, “Gradient starvation: A learning proclivity in neural networks,” inProceedings of the International Conference on Neural Information Processing Systems, 2021

work page 2021
[32]

Neural text generation with unlikelihood training,

S. Welleck, I. Kulikov, S. Roller, E. Dinan, K. Cho, and J. Weston, “Neural text generation with unlikelihood training,” inProceedings of the International Conference on Learning Representations, 2020

work page 2020
[33]

Implicit unlikelihood train- ing: Improving neural text generation with reinforcement learning,

E. Lagutin, D. Gavrilov, and P. Kalaidin, “Implicit unlikelihood train- ing: Improving neural text generation with reinforcement learning,” in Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

work page 2021
[34]

Generative recommendation with semantic ids: A practi- tioner’s handbook,

C. M. Ju, L. Collins, L. Neves, B. Kumar, L. Y . Wang, T. Zhao, and N. Shah, “Generative recommendation with semantic ids: A practi- tioner’s handbook,” inProceedings of the ACM International Conference on Information and Knowledge Management, 2025

work page 2025
[35]

Bias and debias in recommender system: A survey and future directions,

J. Chen, H. Dong, X. Wang, F. Feng, M. Wang, and X. He, “Bias and debias in recommender system: A survey and future directions,”ACM Transactions on Information Systems, vol. 41, no. 3, pp. 1–39, 2023

work page 2023
[36]

Recommendations as treatments: Debiasing learning and evaluation,

T. Schnabel, A. Swaminathan, A. Singh, N. Chandak, and T. Joachims, “Recommendations as treatments: Debiasing learning and evaluation,” inProceedings of the International Conference on Machine Learning, 2016

work page 2016
[37]

Causal inference in recommender systems: A survey and future directions,

C. Gao, Y . Zheng, W. Wang, F. Feng, X. He, and Y . Li, “Causal inference in recommender systems: A survey and future directions,” ACM Transactions on Information Systems,, vol. 42, no. 4, pp. 1–32, 2024

work page 2024
[38]

Causal intervention for leveraging popularity bias in recommendation,

Y . Zhang, F. Feng, X. He, T. Wei, C. Song, G. Ling, and Y . Zhang, “Causal intervention for leveraging popularity bias in recommendation,” inProceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

work page 2021
[39]

An algorithm for vector quantizer design,

Y . Linde, A. Buzo, and R. Gray, “An algorithm for vector quantizer design,”IEEE Transactions on communications, vol. 28, no. 1, pp. 84– 95, 1980

work page 1980
[40]

Taming transformers for high- resolution image synthesis,

P. Esser, R. Rombach, and B. Ommer, “Taming transformers for high- resolution image synthesis,” inProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition, 2021, pp. 12 873– 12 883

work page 2021
[41]

Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering,

R. He and J. McAuley, “Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering,” inProceedings of the International Conference on World Wide Web, 2016

work page 2016
[42]

Sinkhorn distances: lightspeed computation of optimal transport,

M. Cuturi, “Sinkhorn distances: lightspeed computation of optimal transport,” inProceedings of the International Conference on Neural Information Processing Systems, vol. 2, 2013, pp. 2292–2300. PREPRINT MANUSCRIPT 12

work page 2013
[43]

Qwen2 Technical Report

A. Yang, B. Yang, B. Hui, B. Zheng, B. Yu, C. Zhou, C. Li, C. Li, D. Liu, F. Huanget al., “Qwen2 technical report,”arXiv preprint arXiv:2407.10671, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[44]

Qwen2.5: A party of foundation models,

Q. Team, “Qwen2.5: A party of foundation models,” September 2024. [Online]. Available: https://qwenlm.github.io/blog/qwen2.5/

work page 2024
[45]

Qwen3 Technical Report

A. Yang, A. Li, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Gao, C. Huang, C. Lvet al., “Qwen3 technical report,”arXiv preprint arXiv:2505.09388, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[46]

Adam: A method for stochastic optimization,

D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” inProceedings of the International Conference on Learning Represen- tations, 2015

work page 2015
[47]

Openonerec technical report.arXiv preprint arXiv:2512.24762, 2025

G. Zhou, H. Bao, J. Huang, J. Deng, J. Zhang, J. She, K. Cai, L. Ren, L. Ren, Q. Luoet al., “Openonerec technical report,”arXiv preprint arXiv:2512.24762, 2025

work page arXiv 2025
[48]

Sprec: Self- play to debias llm-based recommendation,

C. Gao, R. Chen, S. Yuan, K. Huang, Y . Yu, and X. He, “Sprec: Self- play to debias llm-based recommendation,” inProceedings of the ACM on Web Conference, 2025. PREPRINT MANUSCRIPT 13 APPENDIXA NOTATION The notations and corresponding descriptions are summarized in Table IV. TABLE IV SUMMARY OFMATHEMATICAL ANDMODELNOTATIONS Symbol Description Symbol Descr...

work page 2025

[1] [1]

Deep interest network for click-through rate prediction,

G. Zhou, C. Song, X. Zhu, Y . Fan, H. Zhu, X. Ma, Y . Yan, J. Jin, H. Li, and K. Gai, “Deep interest network for click-through rate prediction,” inProceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018

work page 2018

[2] [2]

Deep neural networks for youtube recommendations,

P. Covington, J. Adams, and E. Sargin, “Deep neural networks for youtube recommendations,” inProceedings of the ACM Conference on Recommender Systems, 2016

work page 2016

[3] [3]

Deepinf: Social influence prediction with deep learning,

J. Qiu, J. Tang, H. Ma, Y . Dong, K. Wang, and J. Tang, “Deepinf: Social influence prediction with deep learning,” inProceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018

work page 2018

[4] [4]

Recommender systems with generative retrieval,

S. Rajput, N. Mehta, A. Singh, R. H. Keshavan, T. Vu, L. Heldt, L. Hong, Y . Tay, V . Q. Tran, J. Samost, M. Kula, E. H. Chi, and M. Sathiamoorthy, “Recommender systems with generative retrieval,” inProceedings of the International Conference on Neural Information Processing Systems, 2023

work page 2023

[5] [5]

Adapt- ing large language models by integrating collaborative semantics for recommendation,

B. Zheng, Y . Hou, H. Lu, Y . Chen, W. X. Zhao, and M. Chen, “Adapt- ing large language models by integrating collaborative semantics for recommendation,” inProceedings of the IEEE International Conference on Data Engineering, 2024

work page 2024

[6] [6]

Learnable item tokenization for generative recommendation,

W. Wang, H. Bao, X. Lin, J. Zhang, Y . Li, F. Feng, S.-K. Ng, and T.-S. Chua, “Learnable item tokenization for generative recommendation,” in Proceedings of the ACM International Conference on Information and Knowledge Management, 2024

work page 2024

[7] [7]

Unleash llms potential for sequential recommendation by coordinating dual dynamic index mechanism,

J. Yin, Z. Zeng, M. Li, H. Yan, C. Li, W. Han, J. Zhang, R. Liu, H. Sun, W. Deng, F. Sun, Q. Zhang, S. Pan, and S. Wang, “Unleash llms potential for sequential recommendation by coordinating dual dynamic index mechanism,” inProceedings of the ACM on Web Conference, 2025

work page 2025

[8] [8]

Multimodal quantitative language for generative recommendation,

J. Zhai, Z.-F. Mai, C.-D. Wang, F. Yang, X. Zheng, H. Li, and Y . Tian, “Multimodal quantitative language for generative recommendation,” in Proceedings of the International Conference on Learning Representa- tions, 2025

work page 2025

[9] [9]

Lightgcn: Simplifying and powering graph convolution network for recommenda- tion,

X. He, K. Deng, X. Wang, Y . Li, Y . Zhang, and M. Wang, “Lightgcn: Simplifying and powering graph convolution network for recommenda- tion,” inProceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020

work page 2020

[10] [10]

Self-attentive sequential recommenda- tion,

W.-C. Kang and J. McAuley, “Self-attentive sequential recommenda- tion,” inProceedings of the IEEE International Conference on Data Mining, 2018

work page 2018

[11] [11]

Neural discrete representation learning,

A. van den Oord, O. Vinyals, and K. Kavukcuoglu, “Neural discrete representation learning,” inProceedings of the International Conference on Neural Information Processing Systems, 2017

work page 2017

[12] [12]

Autoregressive image generation using residual quantization,

D. Lee, C. Kim, S. Kim, M. Cho, and W.-S. Han, “Autoregressive image generation using residual quantization,” inProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022

work page 2022

[13] [13]

Deepseek-r1 incentivizes reasoning in llms through reinforcement learning,

D. Guo, D. Yang, H. Zhang, J. Song, P. Wanget al., “Deepseek-r1 incentivizes reasoning in llms through reinforcement learning,”Nature, vol. 645, pp. 633–638, 2025

work page 2025

[14] [14]

Llama: Open and efficient foundation language models,

H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozi `ere, N. Goyal, E. Hambro, and F. Azhar, “Llama: Open and efficient foundation language models,” 2023

work page 2023

[15] [15]

Improving language understanding by generative pre-training,

A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving language understanding by generative pre-training,” 2018

work page 2018

[16] [16]

How do recommendation models amplify popularity bias? an analysis from the spectral perspective,

S. Lin, C. Gao, J. Chen, S. Zhou, B. Hu, Y . Feng, C. Chen, and C. Wang, “How do recommendation models amplify popularity bias? an analysis from the spectral perspective,” inProceedings of the ACM International Conference on Web Search and Data Mining, 2025

work page 2025

[17] [17]

Model-agnostic counterfactual reasoning for eliminating popularity bias in recommender system,

T. Wei, F. Feng, J. Chen, Z. Wu, J. Yi, and X. He, “Model-agnostic counterfactual reasoning for eliminating popularity bias in recommender system,” inProceedings of the ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021

work page 2021

[18] [18]

Popularity- opportunity bias in collaborative filtering,

Z. Zhu, Y . He, X. Zhao, Y . Zhang, J. Wang, and J. Caverlee, “Popularity- opportunity bias in collaborative filtering,” inProceedings of the ACM International Conference on Web Search and Data Mining, 2021

work page 2021

[19] [19]

Item-side fairness of large language model-based recommendation system,

M. Jiang, K. Bao, J. Zhang, W. Wang, Z. Yang, F. Feng, and X. He, “Item-side fairness of large language model-based recommendation system,” inProceedings of the ACM Web Conference, 2024

work page 2024

[20] [20]

Causally debiased time-aware recommendation,

L. Wang, C. Ma, X. Wu, Z. Qiu, Y . Zheng, and X. Chen, “Causally debiased time-aware recommendation,” inProceedings of the ACM Web Conference, 2024

work page 2024

[21] [21]

A model-agnostic popularity debias training framework for click-through rate prediction in recommender system,

F. Zhang and Q. Shen, “A model-agnostic popularity debias training framework for click-through rate prediction in recommender system,” in Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, 2023

work page 2023

[22] [22]

Personalised reranking of paper recommendations using paper content and user behavior,

X. Li, Y . Chen, B. Pettit, and M. D. Rijke, “Personalised reranking of paper recommendations using paper content and user behavior,”ACM Transactions on Information Systems, vol. 37, no. 3, pp. 1–23, 2019

work page 2019

[23] [23]

Enhancing recommendation diversity by re-ranking with large language models,

D. Carraro and D. Bridge, “Enhancing recommendation diversity by re-ranking with large language models,”ACM Transactions on Recom- mender Systems, vol. 4, no. 2, pp. 1–40, 2025

work page 2025

[24] [24]

Miettinen,Nonlinear multiobjective optimization

K. Miettinen,Nonlinear multiobjective optimization. Springer Science & Business Media, 1999, vol. 12

work page 1999

[25] [25]

Multi-task learning with user preferences: Gradient descent with controlled ascent in pareto optimization,

D. Mahapatra and V . Rajan, “Multi-task learning with user preferences: Gradient descent with controlled ascent in pareto optimization,” in Proceedings of the International Conference on Machine Learning, 2020

work page 2020

[26] [26]

Neural collaborative filtering,

X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.-S. Chua, “Neural collaborative filtering,” inProceedings of the International Conference on World Wide Web, 2017

work page 2017

[27] [27]

Towards fair large language model-based recommender systems with- out costly retraining,

J. Li, H. Gu, S. Wang, Q. Zhang, S. Yu, C. Wang, X. Xu, and F. Chen, “Towards fair large language model-based recommender systems with- out costly retraining,” inProceedings of the ACM Web Conference, 2026

work page 2026

[28] [28]

Bringing reasoning to generative recommendation through the lens of cascaded ranking,

X. Lin, P. Liu, W. Wang, Y . Hu, C. Xu, F. Feng, Q. Wang, and T.-S. Chua, “Bringing reasoning to generative recommendation through the lens of cascaded ranking,” inProceedings of the ACM Web Conference, 2026

work page 2026

[29] [29]

Qarm: Quantitative alignment multi-modal recommendation at kuaishou,

X. Luo, J. Cao, T. Sun, J. Yu, R. Huang, W. Yuan, H. Lin, Y . Zheng, S. Wang, Q. Hu, C. Qiu, J. Zhang, X. Zhang, Z. Yan, J. Zhang, S. Zhang, M. Wen, Z. Liu, and G. Zhou, “Qarm: Quantitative alignment multi-modal recommendation at kuaishou,” inProceedings of the ACM International Conference on Information and Knowledge Management, 2025

work page 2025

[30] [30]

Mitigating popularity bias in recommendation with unbalanced interactions: A gra- dient perspective,

W. Ren, L. Wang, K. Liu, R. Guo, L. E. Peng, and Y . Fu, “Mitigating popularity bias in recommendation with unbalanced interactions: A gra- dient perspective,” inProceedings of the IEEE International Conference on Data Mining, 2022

work page 2022

[31] [31]

Gradient starvation: A learning proclivity in neural networks,

M. Pezeshki, O. Kaba, Y . Bengio, A. C. Courville, D. Precup, and G. La- joie, “Gradient starvation: A learning proclivity in neural networks,” inProceedings of the International Conference on Neural Information Processing Systems, 2021

work page 2021

[32] [32]

Neural text generation with unlikelihood training,

S. Welleck, I. Kulikov, S. Roller, E. Dinan, K. Cho, and J. Weston, “Neural text generation with unlikelihood training,” inProceedings of the International Conference on Learning Representations, 2020

work page 2020

[33] [33]

Implicit unlikelihood train- ing: Improving neural text generation with reinforcement learning,

E. Lagutin, D. Gavrilov, and P. Kalaidin, “Implicit unlikelihood train- ing: Improving neural text generation with reinforcement learning,” in Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, 2021

work page 2021

[34] [34]

Generative recommendation with semantic ids: A practi- tioner’s handbook,

C. M. Ju, L. Collins, L. Neves, B. Kumar, L. Y . Wang, T. Zhao, and N. Shah, “Generative recommendation with semantic ids: A practi- tioner’s handbook,” inProceedings of the ACM International Conference on Information and Knowledge Management, 2025

work page 2025

[35] [35]

Bias and debias in recommender system: A survey and future directions,

J. Chen, H. Dong, X. Wang, F. Feng, M. Wang, and X. He, “Bias and debias in recommender system: A survey and future directions,”ACM Transactions on Information Systems, vol. 41, no. 3, pp. 1–39, 2023

work page 2023

[36] [36]

Recommendations as treatments: Debiasing learning and evaluation,

T. Schnabel, A. Swaminathan, A. Singh, N. Chandak, and T. Joachims, “Recommendations as treatments: Debiasing learning and evaluation,” inProceedings of the International Conference on Machine Learning, 2016

work page 2016

[37] [37]

Causal inference in recommender systems: A survey and future directions,

C. Gao, Y . Zheng, W. Wang, F. Feng, X. He, and Y . Li, “Causal inference in recommender systems: A survey and future directions,” ACM Transactions on Information Systems,, vol. 42, no. 4, pp. 1–32, 2024

work page 2024

[38] [38]

Causal intervention for leveraging popularity bias in recommendation,

Y . Zhang, F. Feng, X. He, T. Wei, C. Song, G. Ling, and Y . Zhang, “Causal intervention for leveraging popularity bias in recommendation,” inProceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, 2021

work page 2021

[39] [39]

An algorithm for vector quantizer design,

Y . Linde, A. Buzo, and R. Gray, “An algorithm for vector quantizer design,”IEEE Transactions on communications, vol. 28, no. 1, pp. 84– 95, 1980

work page 1980

[40] [40]

Taming transformers for high- resolution image synthesis,

P. Esser, R. Rombach, and B. Ommer, “Taming transformers for high- resolution image synthesis,” inProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition, 2021, pp. 12 873– 12 883

work page 2021

[41] [41]

Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering,

R. He and J. McAuley, “Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering,” inProceedings of the International Conference on World Wide Web, 2016

work page 2016

[42] [42]

Sinkhorn distances: lightspeed computation of optimal transport,

M. Cuturi, “Sinkhorn distances: lightspeed computation of optimal transport,” inProceedings of the International Conference on Neural Information Processing Systems, vol. 2, 2013, pp. 2292–2300. PREPRINT MANUSCRIPT 12

work page 2013

[43] [43]

Qwen2 Technical Report

A. Yang, B. Yang, B. Hui, B. Zheng, B. Yu, C. Zhou, C. Li, C. Li, D. Liu, F. Huanget al., “Qwen2 technical report,”arXiv preprint arXiv:2407.10671, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[44] [44]

Qwen2.5: A party of foundation models,

Q. Team, “Qwen2.5: A party of foundation models,” September 2024. [Online]. Available: https://qwenlm.github.io/blog/qwen2.5/

work page 2024

[45] [45]

Qwen3 Technical Report

A. Yang, A. Li, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Gao, C. Huang, C. Lvet al., “Qwen3 technical report,”arXiv preprint arXiv:2505.09388, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[46] [46]

Adam: A method for stochastic optimization,

D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” inProceedings of the International Conference on Learning Represen- tations, 2015

work page 2015

[47] [47]

Openonerec technical report.arXiv preprint arXiv:2512.24762, 2025

G. Zhou, H. Bao, J. Huang, J. Deng, J. Zhang, J. She, K. Cai, L. Ren, L. Ren, Q. Luoet al., “Openonerec technical report,”arXiv preprint arXiv:2512.24762, 2025

work page arXiv 2025

[48] [48]

Sprec: Self- play to debias llm-based recommendation,

C. Gao, R. Chen, S. Yuan, K. Huang, Y . Yu, and X. He, “Sprec: Self- play to debias llm-based recommendation,” inProceedings of the ACM on Web Conference, 2025. PREPRINT MANUSCRIPT 13 APPENDIXA NOTATION The notations and corresponding descriptions are summarized in Table IV. TABLE IV SUMMARY OFMATHEMATICAL ANDMODELNOTATIONS Symbol Description Symbol Descr...

work page 2025