The Bandit's Blind Spot: The Critical Role of User State Representation in Recommender Systems
Pith reviewed 2026-05-07 12:35 UTC · model grok-4.3
The pith
How user states are represented matters more for recommendation performance than which bandit algorithm is chosen.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that different embedding-based representations of user state, built from matrix factorization of interaction data, affect the performance of bandit algorithms in recommenders more than the algorithms themselves do. The experiments demonstrate greater gains from representation changes and show that the best representation depends on the dataset.
What carries the argument
Embedding-based representations of user state, created using matrix factorization models and various aggregation methods for history.
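The mechanism can be sketched concretely: given item embeddings from a trained MF model, a user state is an aggregation of the embeddings of the items in the user's history. A minimal NumPy sketch, with hypothetical strategy names and a made-up decay rate (the paper's exact aggregators may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical MF item-embedding matrix: n_items x d latent factors, as would
# come from ALS/BPR-style training (illustrative, not the paper's actual model).
n_items, d = 100, 8
item_factors = rng.normal(size=(n_items, d))

def user_state(history, item_factors, strategy="mean"):
    """Aggregate the embeddings of a user's interacted items into one state vector."""
    vecs = item_factors[history]                    # (len(history), d)
    if strategy == "mean":
        return vecs.mean(axis=0)
    if strategy == "last":
        return vecs[-1]
    if strategy == "decay":                         # recency-weighted mean; decay rate assumed
        w = 0.8 ** np.arange(len(history))[::-1]    # most recent item gets weight 1
        return (w[:, None] * vecs).sum(axis=0) / w.sum()
    raise ValueError(strategy)

history = [3, 17, 42, 7]                            # item ids, oldest first
state = user_state(history, item_factors, "decay")
print(state.shape)  # (8,)
```

The state vector then serves as the context passed to the CMAB policy at each decision step.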
Load-bearing premise
That the tested embedding methods and history-aggregation strategies are representative of common practice, and that results on the specific datasets evaluated will carry over to other recommendation problems.
What would settle it
Re-running the same comparisons on new datasets or with other common embedding techniques: if algorithm changes consistently produced larger improvements than state-representation changes, the claim would be overturned; if representation effects again dominated, it would be confirmed.
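Such a comparison reduces to a two-way grid of results. A toy sketch of how the two effect sizes could be separated (the numbers are illustrative, not from the paper):

```python
import numpy as np

# Synthetic results grid: rows are state representations, columns are bandit
# algorithms; entries are some reward metric. Values are made up for illustration.
results = np.array([
    [0.210, 0.215, 0.212],   # mean-aggregated MF state
    [0.190, 0.196, 0.193],   # last-item state
    [0.230, 0.236, 0.233],   # recency-weighted state
])

# Effect of changing the representation, holding each algorithm fixed:
repr_effect = (results.max(axis=0) - results.min(axis=0)).mean()
# Effect of changing the algorithm, holding each representation fixed:
algo_effect = (results.max(axis=1) - results.min(axis=1)).mean()

print(f"representation effect: {repr_effect:.3f}")
print(f"algorithm effect:      {algo_effect:.3f}")
# The paper's claim corresponds to repr_effect > algo_effect on real data.
```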
Original abstract
With the increasing availability of online information, recommender systems have become an important tool for many web-based systems. Due to the continuous aspect of recommendation environments, these systems increasingly rely on contextual multi-armed bandits (CMAB) to deliver personalized and real-time suggestions. A critical yet underexplored component in these systems is the representation of user state, which typically encapsulates the user's interaction history and is deeply correlated with the model's decisions and learning. In this paper, we investigate the impact of different embedding-based state representations derived from matrix factorization models on the performance of traditional CMAB algorithms. Our large-scale experiments reveal that variations in state representation can lead to improvements greater than those achieved by changing the bandit algorithm itself. Furthermore, no single embedding or aggregation strategy consistently dominates across datasets, underscoring the need for domain-specific evaluation. These results expose a substantial gap in the literature and emphasize that advancing bandit-based recommender systems requires a holistic approach that prioritizes embedding quality and state construction alongside algorithmic innovation. The source code for our experiments is publicly available on https://github.com/UFSCar-LaSID/bandits_blind_spot.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that in contextual multi-armed bandit (CMAB) recommender systems, variations in embedding-based user state representations (derived from matrix factorization models) produce larger performance gains than changes to the underlying bandit algorithm. Large-scale experiments across multiple datasets show that no single embedding or aggregation strategy dominates, highlighting the need for domain-specific evaluation and a more holistic focus on state construction alongside algorithmic advances. Source code is released publicly.
Significance. If the empirical comparisons hold, the result would be significant for the CMAB recommender literature by shifting attention from algorithm selection to state-representation design, a component the authors argue has been underexplored. The public code release strengthens reproducibility and allows direct verification of the reported deltas between state and algorithm effects.
major comments (3)
- [Experiments] Experiments section: the central claim that state-representation variations yield larger improvements than algorithm swaps is not accompanied by reported statistical tests, number of independent runs, confidence intervals, or error bars on the performance deltas. Without these, it is impossible to assess whether the observed ranking of effects is robust or could be explained by variance in the large-scale runs.
- [Methodology] Methodology and Experiments: all tested state representations are constructed from matrix-factorization embeddings; the paper does not include non-MF baselines (e.g., sequence models or graph embeddings) or ablation on embedding dimensionality and training data. This limits the strength of the generalization that state representation is broadly more important than algorithm choice.
- [Results] Results: the claim of “improvements greater than those achieved by changing the bandit algorithm itself” requires explicit quantification (e.g., average relative lift and standard deviation across datasets and algorithms). The current presentation leaves unclear whether the effect size is consistent or driven by particular dataset–algorithm pairs.
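For concreteness, the kind of reporting the first comment asks for might look like this: per-seed lifts, mean ± standard deviation, and a paired t statistic on the per-seed differences. All values below are synthetic, not the paper's results:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative per-seed lifts across 5 independent runs (synthetic numbers):
lift_repr = rng.normal(loc=0.040, scale=0.005, size=5)  # best-vs-worst representation gain
lift_algo = rng.normal(loc=0.006, scale=0.005, size=5)  # best-vs-worst algorithm gain

# Paired comparison: difference of the two effects within each seed.
diff = lift_repr - lift_algo
mean, sd = diff.mean(), diff.std(ddof=1)
t_stat = mean / (sd / np.sqrt(len(diff)))  # paired t statistic on per-seed deltas

print(f"repr lift: {lift_repr.mean():.4f} ± {lift_repr.std(ddof=1):.4f}")
print(f"algo lift: {lift_algo.mean():.4f} ± {lift_algo.std(ddof=1):.4f}")
print(f"paired t on difference: {t_stat:.2f}")
```

Reporting exactly these quantities (runs, spread, paired test) would let readers judge whether the effect ranking survives run-to-run variance.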
minor comments (2)
- [Abstract] The abstract and introduction use “large-scale experiments” without defining scale (number of users, interactions, or time steps). Adding these figures would improve clarity.
- [Methodology] Notation for aggregation strategies (e.g., mean, attention, etc.) should be introduced with a small table or explicit equations in the methodology section to avoid ambiguity when comparing results.
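One way the requested notation could be written out, with illustrative symbols rather than the paper's own:

```latex
% Illustrative notation (not the paper's): H_u = (i_1,\dots,i_T) is user u's
% interaction history and \mathbf{q}_i \in \mathbb{R}^d the MF embedding of item i.
\mathbf{s}_u^{\mathrm{mean}} = \frac{1}{T}\sum_{t=1}^{T} \mathbf{q}_{i_t}
\qquad
\mathbf{s}_u^{\mathrm{decay}} =
  \frac{\sum_{t=1}^{T}\gamma^{T-t}\,\mathbf{q}_{i_t}}
       {\sum_{t=1}^{T}\gamma^{T-t}}, \quad \gamma\in(0,1)
\qquad
\mathbf{s}_u^{\mathrm{attn}} = \sum_{t=1}^{T}\alpha_t\,\mathbf{q}_{i_t},\quad
\alpha_t = \frac{\exp\!\big(\mathbf{w}^{\top}\mathbf{q}_{i_t}\big)}
                {\sum_{t'=1}^{T}\exp\!\big(\mathbf{w}^{\top}\mathbf{q}_{i_{t'}}\big)}
```

A small table mapping each symbol to its strategy name in the results would remove the ambiguity the comment points to.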
Simulated Author's Rebuttal
We thank the referee for the thoughtful and constructive comments. We address each major point below and indicate the revisions we will make to strengthen the manuscript.
Point-by-point responses
Referee: Experiments section: the central claim that state-representation variations yield larger improvements than algorithm swaps is not accompanied by reported statistical tests, number of independent runs, confidence intervals, or error bars on the performance deltas. Without these, it is impossible to assess whether the observed ranking of effects is robust or could be explained by variance in the large-scale runs.
Authors: We agree that the absence of statistical tests and variability measures weakens the presentation of the central claim. Our experiments were run with 5 independent random seeds per configuration, but these details and the resulting standard deviations were not reported. In the revision we will add error bars (standard deviation), explicitly state the number of runs, and include paired statistical tests comparing the magnitude of state-representation effects versus algorithm effects. These additions will be placed in the Experiments and Results sections.
revision: yes
Referee: Methodology and Experiments: all tested state representations are constructed from matrix-factorization embeddings; the paper does not include non-MF baselines (e.g., sequence models or graph embeddings) or ablation on embedding dimensionality and training data. This limits the strength of the generalization that state representation is broadly more important than algorithm choice.
Authors: The manuscript deliberately restricts its scope to embedding-based user states derived from matrix factorization, which remains the dominant practical approach for initializing CMAB recommenders. We therefore did not evaluate sequence or graph embeddings. We acknowledge that this choice limits broader generalization claims. In the revised version we will (i) clarify the intended scope in the introduction and abstract, (ii) add a dedicated limitations paragraph, and (iii) suggest future work on non-MF representations. No new experiments will be added at this stage.
revision: partial
Referee: Results: the claim of “improvements greater than those achieved by changing the bandit algorithm itself” requires explicit quantification (e.g., average relative lift and standard deviation across datasets and algorithms). The current presentation leaves unclear whether the effect size is consistent or driven by particular dataset–algorithm pairs.
Authors: We will revise the Results section to provide the requested quantification. New summary tables will report, for each dataset, the average relative improvement (and standard deviation) obtained by varying state representations versus varying the bandit algorithm. We will also highlight any dataset–algorithm pairs that drive the overall trend. These changes will make the effect sizes transparent and allow readers to judge consistency.
revision: yes
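The promised per-dataset quantification is mechanical to compute. A sketch with synthetic grids; note the second dataset is constructed to show how a particular dataset can invert the overall trend, which is exactly what the referee asks to be made visible:

```python
import numpy as np

# Illustrative per-dataset results: reward for each (representation, algorithm)
# pair. Purely synthetic; real tables would come from the paper's released code.
datasets = {
    "ds_a": np.array([[0.21, 0.22], [0.25, 0.26]]),
    "ds_b": np.array([[0.10, 0.13], [0.11, 0.12]]),
}

for name, grid in datasets.items():
    # Average relative lift (%) of best vs worst representation, per algorithm:
    repr_lift = (grid.max(axis=0) / grid.min(axis=0) - 1).mean() * 100
    # Average relative lift (%) of best vs worst algorithm, per representation:
    algo_lift = (grid.max(axis=1) / grid.min(axis=1) - 1).mean() * 100
    print(f"{name}: repr lift {repr_lift:.1f}% vs algo lift {algo_lift:.1f}%")
```

In this toy example the representation effect dominates on `ds_a` while the algorithm effect dominates on `ds_b`, so a single pooled average would hide the dataset dependence.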
Circularity Check
No circularity: empirical comparisons of state representations vs. algorithms rest on direct experimental outcomes
full rationale
The paper reports results from large-scale experiments that directly measure performance deltas when varying embedding-based state representations (derived from matrix factorization) versus swapping CMAB algorithms across multiple datasets. No derivation chain, first-principles prediction, or fitted parameter is claimed; the central finding that state-representation changes can exceed algorithm changes is presented as an observed empirical pattern, not a mathematical reduction. Self-citations are absent from the load-bearing claims, and the work is self-contained against the reported benchmarks without any step that renames or re-derives its own inputs.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Matrix factorization models produce useful embeddings of user interaction history for state representation.
- Domain assumption: Contextual multi-armed bandits are an appropriate model for sequential personalized recommendation.
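The first assumption can be made concrete with a toy SGD matrix factorization; the dimensions, learning rate, and regularization below are arbitrary illustrative choices, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal SGD matrix factorization on an implicit-style interaction matrix:
# a sketch of the kind of model the axiom assumes, not a production recipe.
n_users, n_items, d = 20, 30, 4
R = (rng.random((n_users, n_items)) < 0.2).astype(float)  # observed interactions

P = 0.1 * rng.normal(size=(n_users, d))   # user latent factors
Q = 0.1 * rng.normal(size=(n_items, d))   # item latent factors
lr, reg = 0.05, 0.01

for _ in range(200):
    u = rng.integers(n_users)
    i = rng.integers(n_items)
    p_u = P[u].copy()                      # snapshot before the coupled updates
    err = R[u, i] - p_u @ Q[i]
    P[u] += lr * (err * Q[i] - reg * p_u)
    Q[i] += lr * (err * p_u - reg * Q[i])

loss = np.mean((R - P @ Q.T) ** 2)
print(f"reconstruction MSE: {loss:.3f}")
```

The rows of `Q` are the item embeddings that the state-construction step then aggregates over a user's history.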