Combating the Filter Bubble: Designing for Serendipity in a University Course Recommendation System
Pith reviewed 2026-05-25 10:29 UTC · model grok-4.3
The pith
A modified skip-gram model learns richer course embeddings to generate more novel recommendations than RNNs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that multifactor2vec, by learning embeddings for factors such as instructor in addition to primary course tokens during skip-gram training on enrollment sequences, improves the semantics of course representations. This leads to higher accuracy and recall on course similarity and analogy validation sets compared to standard skip-gram, with further gains from catalog text, and supports diversified recommendations that can increase novelty relative to RNN-based systems in a university production context.
What carries the argument
multifactor2vec: a modification of the skip-gram model that learns primary token (course) embeddings alongside embeddings for potentially conflated factors (e.g., instructor) from course enrollment sequences.
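To make the idea concrete, here is a minimal sketch of one way a skip-gram input could carry a factor embedding next to the course embedding. The abstract does not say how the two are combined, so summing them is an assumption made only for illustration, as are the course codes, instructor labels, embedding size, full-softmax loss, and learning rate.

```python
import math
import random

random.seed(0)

DIM = 4                                    # illustrative embedding size
courses = ["STAT101", "CS61A", "MATH54"]   # hypothetical course tokens
instructors = ["inst_a", "inst_b"]         # hypothetical factor values


def init(names):
    """Give every name a small random vector."""
    return {n: [random.uniform(-0.5, 0.5) for _ in range(DIM)] for n in names}


course_in = init(courses)      # primary token embeddings (used later for similarity)
factor_in = init(instructors)  # factor embeddings, meant to absorb instructor effects
course_out = init(courses)     # output (context) vectors


def hidden_vector(course, instructor):
    """ASSUMPTION: the input representation is course vector + factor vector."""
    return [c + f for c, f in zip(course_in[course], factor_in[instructor])]


def context_probs(course, instructor):
    """Softmax over all courses as possible context courses."""
    hidden = hidden_vector(course, instructor)
    scores = {c: sum(h * o for h, o in zip(hidden, course_out[c])) for c in courses}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def sgd_step(course, instructor, context, lr=0.1):
    """One gradient step on -log p(context | course, instructor); returns the pre-update loss."""
    probs = context_probs(course, instructor)
    hidden = hidden_vector(course, instructor)
    grad_hidden = [0.0] * DIM
    for c in courses:
        err = probs[c] - (1.0 if c == context else 0.0)
        for i in range(DIM):
            grad_hidden[i] += err * course_out[c][i]   # read before the update below
            course_out[c][i] -= lr * err * hidden[i]
    for i in range(DIM):
        # Under the summing assumption, the same gradient reaches both embeddings.
        course_in[course][i] -= lr * grad_hidden[i]
        factor_in[instructor][i] -= lr * grad_hidden[i]
    return -math.log(probs[context])
```

Calling `sgd_step("STAT101", "inst_a", "CS61A")` repeatedly lowers the loss for that pair. In a design like this, only `course_in` would be kept for similarity queries, with the factor vectors discarded or inspected separately.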
If this is right
- Improved accuracy and recall on course similarity and analogy validation sets over a standard skip-gram.
- Further improvements when course catalog description text is incorporated into the model.
- RNN recommendations exhibit a dramatic lack of novelty as rated by undergraduates in the user study.
- Serendipity requires navigating characteristic trade-offs among novelty, relevance, and other recommendation qualities.
- Recommendations can be diversified by computing similarity to a specified favorite course using the learned embeddings.
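The last point above can be sketched as a nearest-neighbour lookup. The example below ranks courses by cosine similarity to a favorite course. The course names and two-dimensional vectors are invented for illustration, and cosine similarity is an assumed choice of measure, since the abstract only says "similarity".

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similar_courses(favorite, embeddings, exclude=(), k=3):
    """Return up to k course names most similar to `favorite`, skipping excluded ones."""
    skip = set(exclude) | {favorite}
    scored = [
        (cosine(embeddings[favorite], vec), name)
        for name, vec in embeddings.items()
        if name not in skip
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Toy embeddings (hypothetical values, not learned from any data).
toy = {
    "COURSE_A": [1.0, 0.0],
    "COURSE_B": [0.9, 0.1],
    "COURSE_C": [0.0, 1.0],
    "COURSE_D": [0.7, 0.7],
}
ranked = similar_courses("COURSE_A", toy)
```

Passing already-taken courses through `exclude` is one simple way to keep the list from repeating what the student has seen.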
Where Pith is reading between the lines
- The multifactor approach could be applied to other sequential recommendation settings to reduce echo-chamber effects.
- Live deployment would need to test whether offline gains translate to students actually enrolling in more exploratory courses.
- Explicit constraints from degree requirements could be added to the similarity computation to ensure compliance.
- The model might be combined with progression-norm checks to balance exploration against graduation timelines.
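As a rough sketch of the constraint idea in the list above, similarity-ranked candidates could be filtered against a prerequisite map before being shown. The prerequisite data, the course names, and the rule "all prerequisites completed" are hypothetical; real degree requirements are considerably richer.

```python
def eligible(candidates, completed, prerequisites):
    """Keep candidates not yet taken whose listed prerequisites are all completed.

    `prerequisites` maps a course name to an iterable of required course names;
    courses missing from the map are treated as having no prerequisites.
    The input order of `candidates` (e.g. a similarity ranking) is preserved.
    """
    done = set(completed)
    return [
        course
        for course in candidates
        if course not in done and set(prerequisites.get(course, ())) <= done
    ]


# Hypothetical example data.
ranked_candidates = ["COURSE_B", "COURSE_C", "COURSE_D"]
prereqs = {"COURSE_B": ["COURSE_A"], "COURSE_C": ["COURSE_B"]}
allowed = eligible(ranked_candidates, completed=["COURSE_A"], prerequisites=prereqs)
```

Filtering after ranking keeps the embedding model untouched. Folding constraints into the similarity score itself would be a different design with different trade-offs.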
Load-bearing premise
Similarity to a student's specified favorite course, derived from multifactor embeddings, produces recommendations that are both relevant and serendipitous without violating university progression norms.
What would settle it
A user study in which undergraduates rate multifactor2vec recommendations on novelty and relevance, showing no meaningful increase in novelty compared to the RNN baseline.
Original abstract
Collaborative filtering based algorithms, including Recurrent Neural Networks (RNN), tend towards predicting a perpetuation of past observed behavior. In a recommendation context, this can lead to an overly narrow set of suggestions lacking in serendipity and inadvertently placing the user in what is known as a "filter bubble." In this paper, we grapple with the issue of the filter bubble in the context of a course recommendation system in production at a public university. Most universities in the United States encourage students to explore developing interests while simultaneously advising them to adhere to course taking norms which progress them towards graduation. These competing objectives, and the stakes involved for students, make this context a particularly meaningful one for investigating real-world recommendation strategies. We introduce a novel modification to the skip-gram model applied to nine years of historic course enrollment sequences to learn course vector representations used to diversify recommendations based on similarity to a student's specified favorite course. This model, which we call multifactor2vec, is intended to improve the semantics of the primary token embedding by also learning embeddings of potentially conflated factors of the token (e.g., instructor). Our offline testing found this model improved accuracy and recall on our course similarity and analogy validation sets over a standard skip-gram. Incorporating course catalog description text resulted in further improvements. We compare the performance of these models to the system's existing RNN-based recommendations with a user study of undergraduates (N = 70) rating six characteristics of their course recommendations. Results of the user study show a dramatic lack of novelty in RNN recommendations and depict the characteristic trade-offs that make serendipity difficult to achieve.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces multifactor2vec, a skip-gram variant that augments course embeddings with factor embeddings (e.g., instructor) learned from nine years of enrollment sequences, optionally combined with catalog text. It claims this yields improved accuracy and recall on course similarity and analogy validation sets relative to standard skip-gram, and that a user study (N=70) reveals the production RNN recommendations exhibit low novelty, highlighting trade-offs in achieving serendipity while respecting university progression norms.
Significance. If the embedding similarity approach can be shown to map to user-perceived serendipity without violating progression constraints, the work would offer a practical method for diversifying recommendations in educational settings. The real-world production context and inclusion of a user study are strengths that ground the problem in a high-stakes domain.
major comments (3)
- [Abstract] Abstract (user study paragraph): Results are reported exclusively for the existing RNN recommendations; no ratings, comparisons, or outputs from multifactor2vec are included in the N=70 study, leaving the central claim that the proposed model produces serendipitous recommendations without direct user evidence.
- [Abstract] Abstract (offline testing sentence): Accuracy and recall gains on the similarity and analogy validation sets are stated without error bars, validation-set sizes, statistical tests, or details on RNN baseline training, undermining assessment of whether the reported improvements are reliable or load-bearing for the model comparison.
- [Abstract] Abstract (validation sets): The similarity and analogy tasks are not shown to operationalize serendipity (unexpected yet useful) rather than co-occurrence statistics already present in enrollment sequences; this leaves untested whether multifactor2vec similarity yields norm-compliant, diversified recommendations.
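The error bars requested in the second comment could be produced in several ways. One common option, sketched here under stated assumptions, is a paired percentile bootstrap over validation items. The function name, the 0/1 per-item correctness encoding, and the resample count are illustrative choices and are not taken from the paper.

```python
import random


def bootstrap_diff_ci(correct_a, correct_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for accuracy(b) - accuracy(a).

    `correct_a` and `correct_b` hold 0/1 outcomes for the same validation items,
    in the same order, for two models being compared.
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("need two non-empty, equal-length outcome lists")
    rng = random.Random(seed)
    n = len(correct_a)
    diffs = []
    for _ in range(n_resamples):
        # Resample item indices with replacement, keeping the pairing intact.
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(correct_b[i] - correct_a[i] for i in idx) / n)
    diffs.sort()
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return diffs[lo_idx], diffs[hi_idx]


# Made-up outcomes for ten validation items (not results from the paper).
baseline = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
variant = [1, 1, 0, 1, 1, 1, 0, 1, 1, 0]
low, high = bootstrap_diff_ci(baseline, variant)
```

An interval that excludes zero would suggest the accuracy gap is unlikely to be a resampling artifact, though with validation sets this small the interval would be wide.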
minor comments (1)
- The manuscript would benefit from explicit dataset sizes for the validation sets and a description of how the RNN baseline was trained and tuned.
Simulated Author's Rebuttal
Thank you for the constructive feedback on our manuscript. We address each major comment point-by-point below, with proposed revisions to improve clarity and reporting.
- Referee: [Abstract] Abstract (user study paragraph): Results are reported exclusively for the existing RNN recommendations; no ratings, comparisons, or outputs from multifactor2vec are included in the N=70 study, leaving the central claim that the proposed model produces serendipitous recommendations without direct user evidence.
  Authors: The user study evaluates the production RNN system to demonstrate the filter bubble issue in a real deployment, as motivated in the introduction. Multifactor2vec is validated via offline similarity and analogy tasks, with the design intended to support diversified recommendations based on favorite-course similarity. We agree the abstract should not imply direct user evidence for the new model. We will revise the abstract to explicitly state the scope of the user study and the role of offline results.
  revision: yes
- Referee: [Abstract] Abstract (offline testing sentence): Accuracy and recall gains on the similarity and analogy validation sets are stated without error bars, validation-set sizes, statistical tests, or details on RNN baseline training, undermining assessment of whether the reported improvements are reliable or load-bearing for the model comparison.
  Authors: We acknowledge the abstract omits these details. The full manuscript describes the validation sets and training, but we will revise the abstract to report validation-set sizes and note consistent improvements, while ensuring the results section includes error bars, statistical tests, and RNN baseline details.
  revision: yes
- Referee: [Abstract] Abstract (validation sets): The similarity and analogy tasks are not shown to operationalize serendipity (unexpected yet useful) rather than co-occurrence statistics already present in enrollment sequences; this leaves untested whether multifactor2vec similarity yields norm-compliant, diversified recommendations.
  Authors: These tasks validate that factor embeddings improve semantic capture beyond raw co-occurrence (e.g., disambiguating by instructor). The model is applied to generate recommendations similar to a specified favorite course while drawing from observed enrollment patterns that respect progression norms. The user study separately quantifies novelty. We will add discussion clarifying these tasks as proxies and their relation to serendipity within university constraints.
  revision: partial
Circularity Check
No circularity detected; derivation is self-contained
The paper trains multifactor2vec (a skip-gram variant) on nine years of historic enrollment sequences, evaluates accuracy/recall gains on separate course similarity and analogy validation sets, and compares against an RNN baseline via a user study (N=70). No equations, fitted parameters renamed as predictions, self-citations, or ansatzes are present in the provided text that reduce any claim to its inputs by construction. The chain from sequences to embeddings to similarity-based recommendations is standard and externally benchmarked, so the central claims remain independent of the training data.