pith. machine review for the scientific record.

arxiv: 2604.26651 · v1 · submitted 2026-04-29 · 💻 cs.IR · cs.LG


The Bandit's Blind Spot: The Critical Role of User State Representation in Recommender Systems


Pith reviewed 2026-05-07 12:35 UTC · model grok-4.3

classification 💻 cs.IR · cs.LG
keywords recommender systems · bandit algorithms · user state · matrix factorization · embeddings · personalization · performance evaluation

The pith

How user states are represented matters more for recommendations than the bandit algorithm chosen.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that in systems recommending items in real time, the method used to summarize a user's past actions into a current state has a bigger effect on success than which bandit algorithm is used to pick the next item. Large-scale experiments showed that different ways of building these states from user–item interaction matrices produced bigger improvements than swapping the decision algorithm. No single method for building the states was best on every dataset, so each application needs its own checks. This suggests that getting the state right is a key but often overlooked part of making good recommenders.

Core claim

The central claim is that different embedding-based representations of user state, built from matrix factorization of interaction data, affect the performance of bandit algorithms in recommenders more than the algorithms themselves do. The experiments demonstrate greater gains from representation changes and show that the best representation depends on the dataset.
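The comparison behind this claim can be sketched in a few lines. All names and numbers below are invented for illustration (the authors' actual pipeline lives in their public repository): for each pair of state representation and bandit algorithm, record a score, then ask which axis moves the score more.

```python
# Toy illustration of the paper's central comparison (all numbers invented):
# each (state_representation, bandit_algorithm) pair gets a score, e.g. a
# cumulative NDCG@20, and we measure the spread along each axis.
scores = {
    ("mean_embed", "linucb"):   0.31,
    ("mean_embed", "thompson"): 0.30,
    ("decay_embed", "linucb"):   0.37,
    ("decay_embed", "thompson"): 0.36,
}

states = {"mean_embed", "decay_embed"}
bandits = {"linucb", "thompson"}

# Largest spread across state representations, holding each bandit fixed.
state_effect = max(
    max(scores[(s, b)] for s in states) - min(scores[(s, b)] for s in states)
    for b in bandits
)
# Largest spread across bandit algorithms, holding each state fixed.
bandit_effect = max(
    max(scores[(s, b)] for b in bandits) - min(scores[(s, b)] for b in bandits)
    for s in states
)

# The paper's claimed pattern: the state axis dominates.
print(state_effect > bandit_effect)
```

In these toy numbers the state axis moves the score six times as much as the algorithm axis, which is the shape of the result the paper reports.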

What carries the argument

Embedding-based representations of user state, created using matrix factorization models and various aggregation methods for history.
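As one concrete, hypothetical example of what "aggregation" means here: mean-pooling the matrix-factorization embeddings of the items in a user's history into a single state vector. The paper evaluates several such strategies; this sketch shows only the simplest, with invented values.

```python
# Toy item factor matrix from a matrix-factorization model (invented values).
item_embeddings = {
    "a": [0.2, 0.8],
    "b": [0.4, 0.6],
    "c": [0.9, 0.1],
}

def mean_state(history):
    """Mean-pool the embeddings of the items a user interacted with.

    One of several possible aggregation strategies; the bandit then
    consumes this vector as the user's current state (context).
    """
    vecs = [item_embeddings[i] for i in history]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

state = mean_state(["a", "b"])  # ≈ [0.3, 0.7]
```

Swapping `mean_state` for, say, a recency-weighted average changes the bandit's input distribution without touching the bandit itself, which is exactly the axis the paper varies.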

Load-bearing premise

The tested embedding methods and aggregation strategies are representative of common practice, and results from the specific datasets used will carry over to other recommendation problems.

What would settle it

Running the same comparisons on new datasets or with other common embedding techniques: the claim would be refuted if algorithm changes consistently led to larger improvements than state-representation changes, and reinforced if the state effect persisted.

Figures

Figures reproduced from arXiv: 2604.26651 by Gregorio F. Azevedo, Pedro R. Pires, Pietro L. Campos, Rafael T. Sereicikas, Tiago A. Almeida.

Figure 1: Experimental protocol used to benchmark the models.
Figure 2: Cumulative NDCG@20 for every partition on the test set.
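For context on the metric in Figure 2, here is NDCG@k in its standard textbook form (not necessarily the paper's exact implementation, which is in its public repository):

```python
import math

def ndcg_at_k(ranked_relevances, k=20):
    """NDCG@k: DCG of the ranked list divided by DCG of the ideal ordering.

    `ranked_relevances` lists the relevance of each recommended item in
    the order the system presented them.
    """
    def dcg(rels):
        # Logarithmic position discount: position i contributes r / log2(i + 2).
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))

    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0
```

A perfect ranking scores 1.0; the "cumulative" curves in Figure 2 track how this quantity accumulates as the bandit interacts with the stream.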
Original abstract

With the increasing availability of online information, recommender systems have become an important tool for many web-based systems. Due to the continuous aspect of recommendation environments, these systems increasingly rely on contextual multi-armed bandits (CMAB) to deliver personalized and real-time suggestions. A critical yet underexplored component in these systems is the representation of user state, which typically encapsulates the user's interaction history and is deeply correlated with the model's decisions and learning. In this paper, we investigate the impact of different embedding-based state representations derived from matrix factorization models on the performance of traditional CMAB algorithms. Our large-scale experiments reveal that variations in state representation can lead to improvements greater than those achieved by changing the bandit algorithm itself. Furthermore, no single embedding or aggregation strategy consistently dominates across datasets, underscoring the need for domain-specific evaluation. These results expose a substantial gap in the literature and emphasize that advancing bandit-based recommender systems requires a holistic approach that prioritizes embedding quality and state construction alongside algorithmic innovation. The source code for our experiments is publicly available on https://github.com/UFSCar-LaSID/bandits_blind_spot.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper claims that in contextual multi-armed bandit (CMAB) recommender systems, variations in embedding-based user state representations (derived from matrix factorization models) produce larger performance gains than changes to the underlying bandit algorithm. Large-scale experiments across multiple datasets show that no single embedding or aggregation strategy dominates, highlighting the need for domain-specific evaluation and a more holistic focus on state construction alongside algorithmic advances. Source code is released publicly.

Significance. If the empirical comparisons hold, the result would be significant for the CMAB recommender literature by shifting attention from algorithm selection to state-representation design, a component the authors argue has been underexplored. The public code release strengthens reproducibility and allows direct verification of the reported deltas between state and algorithm effects.

major comments (3)
  1. [Experiments] Experiments section: the central claim that state-representation variations yield larger improvements than algorithm swaps is not accompanied by reported statistical tests, number of independent runs, confidence intervals, or error bars on the performance deltas. Without these, it is impossible to assess whether the observed ranking of effects is robust or could be explained by variance in the large-scale runs.
  2. [Methodology] Methodology and Experiments: all tested state representations are constructed from matrix-factorization embeddings; the paper does not include non-MF baselines (e.g., sequence models or graph embeddings) or ablation on embedding dimensionality and training data. This limits the strength of the generalization that state representation is broadly more important than algorithm choice.
  3. [Results] Results: the claim of “improvements greater than those achieved by changing the bandit algorithm itself” requires explicit quantification (e.g., average relative lift and standard deviation across datasets and algorithms). The current presentation leaves unclear whether the effect size is consistent or driven by particular dataset–algorithm pairs.
minor comments (2)
  1. [Abstract] The abstract and introduction use “large-scale experiments” without defining scale (number of users, interactions, or time steps). Adding these figures would improve clarity.
  2. [Methodology] Notation for aggregation strategies (e.g., mean, attention, etc.) should be introduced with a small table or explicit equations in the methodology section to avoid ambiguity when comparing results.
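For instance, the two simplest strategies could be written out explicitly as follows (illustrative notation, not the paper's own): given item embeddings $q_i$ from matrix factorization and a user history $H_u$,

```latex
% Mean aggregation: uniform average of the history's item embeddings.
s_u^{\mathrm{mean}} = \frac{1}{|H_u|} \sum_{i \in H_u} q_i

% Recency-weighted aggregation with decay factor \gamma \in (0, 1],
% where t_u is the current time and t_i the time of interaction i.
s_u^{\mathrm{decay}} =
  \frac{\sum_{i \in H_u} \gamma^{\,t_u - t_i}\, q_i}
       {\sum_{i \in H_u} \gamma^{\,t_u - t_i}}
```

A small table of such definitions would make cross-strategy comparisons unambiguous, as the referee requests.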

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. We address each major point below and indicate the revisions we will make to strengthen the manuscript.

Point-by-point responses
  1. Referee: Experiments section: the central claim that state-representation variations yield larger improvements than algorithm swaps is not accompanied by reported statistical tests, number of independent runs, confidence intervals, or error bars on the performance deltas. Without these, it is impossible to assess whether the observed ranking of effects is robust or could be explained by variance in the large-scale runs.

    Authors: We agree that the absence of statistical tests and variability measures weakens the presentation of the central claim. Our experiments were run with 5 independent random seeds per configuration, but these details and the resulting standard deviations were not reported. In the revision we will add error bars (standard deviation), explicitly state the number of runs, and include paired statistical tests comparing the magnitude of state-representation effects versus algorithm effects. These additions will be placed in the Experiments and Results sections. revision: yes

  2. Referee: Methodology and Experiments: all tested state representations are constructed from matrix-factorization embeddings; the paper does not include non-MF baselines (e.g., sequence models or graph embeddings) or ablation on embedding dimensionality and training data. This limits the strength of the generalization that state representation is broadly more important than algorithm choice.

    Authors: The manuscript deliberately restricts its scope to embedding-based user states derived from matrix factorization, which remains the dominant practical approach for initializing CMAB recommenders. We therefore did not evaluate sequence or graph embeddings. We acknowledge that this choice limits broader generalization claims. In the revised version we will (i) clarify the intended scope in the introduction and abstract, (ii) add a dedicated limitations paragraph, and (iii) suggest future work on non-MF representations. No new experiments will be added at this stage. revision: partial

  3. Referee: Results: the claim of “improvements greater than those achieved by changing the bandit algorithm itself” requires explicit quantification (e.g., average relative lift and standard deviation across datasets and algorithms). The current presentation leaves unclear whether the effect size is consistent or driven by particular dataset–algorithm pairs.

    Authors: We will revise the Results section to provide the requested quantification. New summary tables will report, for each dataset, the average relative improvement (and standard deviation) obtained by varying state representations versus varying the bandit algorithm. We will also highlight any dataset–algorithm pairs that drive the overall trend. These changes will make the effect sizes transparent and allow readers to judge consistency. revision: yes
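The paired comparison promised in response 1 could look like this in outline (a pure-Python sketch with invented numbers; the real analysis would use the authors' five-seed runs and report the resulting p-values):

```python
import math

# Per-seed gains (hypothetical values, 5 seeds): the improvement from the
# best state representation over the worst, and likewise for bandit swaps.
state_gain  = [0.061, 0.058, 0.064, 0.060, 0.057]
bandit_gain = [0.011, 0.009, 0.013, 0.010, 0.012]

# Paired t statistic on the per-seed differences, df = n - 1.
diffs = [s - b for s, b in zip(state_gain, bandit_gain)]
n = len(diffs)
mean = sum(diffs) / n
var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
t_stat = mean / math.sqrt(var / n)

# A large |t| at df = 4 would indicate the state effect reliably
# exceeds the bandit effect across seeds.
```

Reporting the t statistic together with the mean difference (an effect size) would address both the robustness concern in point 1 and the quantification concern in point 3.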

Circularity Check

0 steps flagged

No circularity: empirical comparisons of state representations vs. algorithms rest on direct experimental outcomes

Full rationale

The paper reports results from large-scale experiments that directly measure performance deltas when varying embedding-based state representations (derived from matrix factorization) versus swapping CMAB algorithms across multiple datasets. No derivation chain, first-principles prediction, or fitted parameter is claimed; the central finding that state-representation changes can exceed algorithm changes is presented as an observed empirical pattern, not a mathematical reduction. Self-citations are absent from the load-bearing claims, and the work is self-contained against the reported benchmarks without any step that renames or re-derives its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard domain assumptions from matrix factorization and contextual bandits; no new free parameters or invented entities are introduced.

axioms (2)
  • domain assumption Matrix factorization models produce useful embeddings of user interaction history for state representation
    Standard assumption invoked when deriving state representations from interaction data.
  • domain assumption Contextual multi-armed bandits are an appropriate model for sequential personalized recommendation
    Background assumption stated in the abstract for the experimental setup.

pith-pipeline@v0.9.0 · 5520 in / 1250 out tokens · 43049 ms · 2026-05-07T12:35:24.034629+00:00 · methodology


Reference graph

Works this paper leans on

48 extracted references · 38 canonical work pages
