
arxiv: 1907.01591 · v1 · pith:YGPSDYY5 · submitted 2019-07-02 · cs.IR · cs.CY

Combating the Filter Bubble: Designing for Serendipity in a University Course Recommendation System

Pith reviewed 2026-05-25 10:29 UTC · model grok-4.3

classification: cs.IR · cs.CY
keywords: course recommendation · filter bubble · serendipity · skip-gram · embeddings · recurrent neural networks · collaborative filtering · university courses

The pith

A modified skip-gram model learns richer course embeddings to generate more novel recommendations than RNNs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces multifactor2vec to address the filter bubble in university course recommendations, where standard methods like RNNs reinforce past enrollment patterns and limit exploration. It modifies the skip-gram model to learn, from nine years of historic course enrollment sequences, embeddings of courses alongside potentially conflated factors such as instructor, and then recommends courses by their similarity to a student's specified favorite course. Offline tests showed gains in accuracy and recall on course similarity and analogy tasks, with added catalog description text yielding further gains. A user study of 70 undergraduates found that RNN recommendations lacked novelty and highlighted the trade-offs involved in achieving serendipity while respecting degree norms.

Core claim

The central claim is that multifactor2vec, by learning embeddings for factors such as instructor in addition to primary course tokens during skip-gram training on enrollment sequences, improves the semantics of course representations. This leads to higher accuracy and recall on course similarity and analogy validation sets compared to standard skip-gram, with further gains from catalog text, and supports diversified recommendations that can increase novelty relative to RNN-based systems in a university production context.
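The analogy validation set can be pictured with the vector-offset convention common to word2vec-style evaluations: for "a is to b as c is to ?", rank courses by cosine similarity to b − a + c, excluding the three query courses. The sketch below is a minimal illustration under that assumption; the paper's exact scoring protocol, and how it defines "accuracy" versus "recall", are not described in this summary, so `solve_analogy`, `analogy_scores` and the top-1 / recall@k split are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def solve_analogy(emb: Dict[str, Vector], a: str, b: str, c: str, k: int = 1) -> List[str]:
    """Return the k courses nearest to emb[b] - emb[a] + emb[c], excluding a, b and c."""
    target = [vb - va + vc for va, vb, vc in zip(emb[a], emb[b], emb[c])]
    scored = [(cosine(target, vec), name) for name, vec in emb.items() if name not in (a, b, c)]
    # Highest similarity first; ties broken alphabetically for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [name for _, name in scored[:k]]


def analogy_scores(
    emb: Dict[str, Vector], questions: Sequence[Tuple[str, str, str, str]], k: int = 10
) -> Tuple[float, float]:
    """questions holds (a, b, c, expected); returns (top-1 accuracy, recall@k)."""
    top1 = hits = 0
    for a, b, c, expected in questions:
        ranked = solve_analogy(emb, a, b, c, k=max(k, 1))
        top1 += int(ranked[:1] == [expected])
        hits += int(expected in ranked)
    n = len(questions)
    return top1 / n, hits / n
```

The course codes in any usage are placeholders; with real embeddings the validation questions would come from curated course pairs, which this summary does not list.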

What carries the argument

multifactor2vec: a modification of the skip-gram model that learns primary token (course) embeddings while also learning embeddings for potentially conflated factors (e.g., instructor) from course enrollment sequences.
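To make "from course enrollment sequences" concrete, the sketch below turns one student's enrollment history into skip-gram training examples: each center course, together with its factor tokens (such as the instructor of that offering), is paired with every course within a context window. It assumes the history is a flat chronological list and that factors are attached per enrollment; the paper's actual sequence ordering, window size and factor encoding are not given here, so `multifactor_training_pairs`, `window=2` and the `instr:` tokens are hypothetical.

```python
from typing import Iterator, Sequence, Tuple

# One enrollment: (course id, factor tokens for that offering, e.g. its instructor).
Enrollment = Tuple[str, Tuple[str, ...]]


def multifactor_training_pairs(
    sequence: Sequence[Enrollment], window: int = 2
) -> Iterator[Tuple[str, Tuple[str, ...], str]]:
    """Yield (center course, center factors, context course) skip-gram examples.

    The center course and its factors form the model input; each course within
    `window` positions of it is a context target the model learns to predict.
    """
    for i, (center, center_factors) in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, center_factors, sequence[j][0]
```

Because the factors travel with the center course rather than replacing it, the same course taught by different instructors yields different inputs, which is the disambiguation the model is meant to exploit.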

If this is right

  • Improved accuracy and recall on course similarity and analogy validation sets over a standard skip-gram.
  • Further improvements when course catalog description text is incorporated into the model.
  • RNN recommendations exhibit a dramatic lack of novelty as rated by undergraduates in the user study.
  • Serendipity requires navigating characteristic trade-offs among novelty, relevance, and other recommendation qualities.
  • Recommendations can be diversified by computing similarity to a specified favorite course using the learned embeddings.
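The last point above, recommending by similarity to a favorite course, can be sketched as a cosine-similarity ranking over the learned embeddings. The figures label some variants "div", but this summary does not say how the paper diversifies, so the optional filter below, which skips courses from the favorite course's own department, is an assumption; `recommend_similar` and the `department` mapping are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend_similar(
    emb: Dict[str, Sequence[float]],
    favorite: str,
    department: Dict[str, str],
    k: int = 5,
    diversify: bool = False,
) -> List[str]:
    """Rank courses by cosine similarity to the student's favorite course.

    With diversify=True, courses in the favorite course's department are skipped,
    one simple way to push suggestions outside the student's usual area.
    """
    fav = emb[favorite]
    home = department.get(favorite)
    scored = []
    for course, vec in emb.items():
        if course == favorite:
            continue
        if diversify and department.get(course) == home:
            continue
        scored.append((cosine(fav, vec), course))
    # Most similar first; ties broken alphabetically.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [course for _, course in scored[:k]]
```

Without the filter, the nearest neighbors of a course tend to be its own department's sequels, which is one way a filter bubble shows up in embedding space.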

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The multifactor approach could be applied to other sequential recommendation settings to reduce echo-chamber effects.
  • Live deployment would need to test whether offline gains translate to students actually enrolling in more exploratory courses.
  • Explicit constraints from degree requirements could be added to the similarity computation to ensure compliance.
  • The model might be combined with progression-norm checks to balance exploration against graduation timelines.
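The two extensions above (degree constraints and progression checks) are Pith's suggestions, not the paper's. One hypothetical way to combine them with a similarity ranking is a re-ranking step: hard-filter courses a student cannot or should not take, then add a small bonus to courses that count toward the degree. The names `constrained_rerank`, `eligible`, `counts_toward_degree` and the `bonus` weight are all illustrative assumptions.

```python
from typing import Dict, List, Set


def constrained_rerank(
    similarity: Dict[str, float],
    eligible: Set[str],
    counts_toward_degree: Set[str],
    bonus: float = 0.1,
    k: int = 5,
) -> List[str]:
    """Drop ineligible courses, then boost those that count toward the degree.

    `similarity` holds precomputed similarity scores to the favorite course;
    `bonus` trades exploration (raw similarity) against graduation progress.
    """
    scored = [
        (sim + (bonus if course in counts_toward_degree else 0.0), course)
        for course, sim in similarity.items()
        if course in eligible
    ]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [course for _, course in scored[:k]]
```

Setting `bonus` to 0 recovers the pure similarity ranking over eligible courses; a large bonus effectively reduces the list to requirement-satisfying courses, which would reintroduce the narrowness the paper is trying to avoid.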

Load-bearing premise

Similarity to a student's specified favorite course, derived from multifactor embeddings, produces recommendations that are both relevant and serendipitous without violating university progression norms.

What would settle it

A user study in which undergraduates rate multifactor2vec recommendations on novelty and relevance, showing no meaningful increase in novelty compared to the RNN baseline.
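Such a comparison could be summarized as the share of recommendations rated novel under each system. The sketch below assumes a 1 to 5 rating scale, treats ratings of 4 or higher as "novel", and computes a pooled two-proportion z statistic; the actual scale and analysis in the paper are not stated here. Because the same students rated both systems, a paired test (for example McNemar's) would arguably fit better, so the independent-samples z here is a simplifying assumption. `novelty_comparison` is a hypothetical name.

```python
import math
from typing import Sequence, Tuple


def novelty_comparison(
    ratings_a: Sequence[int], ratings_b: Sequence[int], threshold: int = 4
) -> Tuple[float, float, float]:
    """Share of ratings >= threshold for each system, plus a pooled two-proportion z."""
    n_a, n_b = len(ratings_a), len(ratings_b)
    x_a = sum(r >= threshold for r in ratings_a)
    x_b = sum(r >= threshold for r in ratings_b)
    p_a, p_b = x_a / n_a, x_b / n_b
    pooled = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    # If every rating falls on one side of the threshold, there is no variance to test.
    z = (p_a - p_b) / se if se > 0 else 0.0
    return p_a, p_b, z
```

A z near zero for multifactor2vec versus the RNN baseline would correspond to the "no meaningful increase in novelty" outcome described above.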

Figures

Figures reproduced from arXiv: 1907.01591 by Weijie Jiang, Zachary A. Pardos.

Figure 1. Multi-factor course2vec model. The probability p(c_{i+j} | c_i, f_{i1}, f_{i2}, ..., f_{ih}) of observing a neighboring course c_{i+j} given the current course c_i and its features f_{i1}, f_{i2}, ..., f_{ih} can also be defined via the softmax function:

$$p(c_{i+j} \mid c_i, f_{i1}, \dots, f_{ih}) = \frac{\exp\left(a_i^{\top} v'_{i+j}\right)}{\sum_{k=1}^{n} \exp\left(a_i^{\top} v'_k\right)} \qquad (2)$$

$$a_i = v_i + \sum_{j=1}^{h} W_{n_j \times v}\, f_{ij} \qquad (3)$$

where a_c is the vector sum of the input course vector representation v_c and all the features v…
Figure 2. Novelty rating proportions for BOW (div).
Figure 3. Novelty rating proportions for RNN (non-div).
Figure 5. BOW (div) vs. Equivalency (non-div) comparison.
Figure 6. The “Explore” interface.

Section 8, Discussion (excerpt): Surfacing courses that are of interest but not known before means expanding a student’s knowledge and understanding of the University’s offerings. As students are exposed to courses that veer further from their home department and nexus of interest and understanding, recommendations become less familiar with descriptions that…
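Equations (2) and (3) in the Figure 1 caption can be read as a small forward pass: build the input vector a_i by adding each factor's embedding to the course vector, then take a softmax of a_i · v'_k over all n courses. The sketch below interprets W_{n_j × v} f_{ij} as a row lookup, since multiplying an embedding table by a one-hot vector f_{ij} just selects the row for that factor's category; that reading, and the names `input_vector` and `context_probs`, are assumptions rather than the authors' code.

```python
import math
from typing import List, Sequence


def input_vector(
    v_course: Sequence[float],
    factor_tables: Sequence[Sequence[Sequence[float]]],
    factor_ids: Sequence[int],
) -> List[float]:
    """Eq. (3): a_i = v_i + sum_j W_j f_ij, with one-hot f_ij as a row lookup.

    factor_tables[j] is the (n_j x v) embedding table for factor j and
    factor_ids[j] is the category index of that factor for this enrollment.
    """
    a = list(v_course)
    for table, idx in zip(factor_tables, factor_ids):
        a = [x + w for x, w in zip(a, table[idx])]
    return a


def context_probs(a: Sequence[float], output_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Eq. (2): softmax of a_i . v'_k over all n courses, max-shifted for stability."""
    logits = [sum(x * y for x, y in zip(a, v)) for v in output_vectors]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

The full softmax costs O(n) per training example; skip-gram implementations commonly approximate it with negative sampling or hierarchical softmax, though whether this paper does so is not stated in this summary.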
Original abstract

Collaborative filtering based algorithms, including Recurrent Neural Networks (RNN), tend towards predicting a perpetuation of past observed behavior. In a recommendation context, this can lead to an overly narrow set of suggestions lacking in serendipity and inadvertently placing the user in what is known as a "filter bubble." In this paper, we grapple with the issue of the filter bubble in the context of a course recommendation system in production at a public university. Most universities in the United States encourage students to explore developing interests while simultaneously advising them to adhere to course taking norms which progress them towards graduation. These competing objectives, and the stakes involved for students, make this context a particularly meaningful one for investigating real-world recommendation strategies. We introduce a novel modification to the skip-gram model applied to nine years of historic course enrollment sequences to learn course vector representations used to diversify recommendations based on similarity to a student's specified favorite course. This model, which we call multifactor2vec, is intended to improve the semantics of the primary token embedding by also learning embeddings of potentially conflated factors of the token (e.g., instructor). Our offline testing found this model improved accuracy and recall on our course similarity and analogy validation sets over a standard skip-gram. Incorporating course catalog description text resulted in further improvements. We compare the performance of these models to the system's existing RNN-based recommendations with a user study of undergraduates (N = 70) rating six characteristics of their course recommendations. Results of the user study show a dramatic lack of novelty in RNN recommendations and depict the characteristic trade-offs that make serendipity difficult to achieve.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper introduces multifactor2vec, a skip-gram variant that augments course embeddings with factor embeddings (e.g., instructor) learned from nine years of enrollment sequences, optionally combined with catalog text. It claims this yields improved accuracy and recall on course similarity and analogy validation sets relative to standard skip-gram, and that a user study (N=70) reveals the production RNN recommendations exhibit low novelty, highlighting trade-offs in achieving serendipity while respecting university progression norms.

Significance. If the embedding similarity approach can be shown to map to user-perceived serendipity without violating progression constraints, the work would offer a practical method for diversifying recommendations in educational settings. The real-world production context and inclusion of a user study are strengths that ground the problem in a high-stakes domain.

major comments (3)
  1. Abstract (user study paragraph): Results are reported exclusively for the existing RNN recommendations; no ratings, comparisons, or outputs from multifactor2vec are included in the N=70 study, leaving the central claim that the proposed model produces serendipitous recommendations without direct user evidence.
  2. Abstract (offline testing sentence): Accuracy and recall gains on the similarity and analogy validation sets are stated without error bars, validation-set sizes, statistical tests, or details on RNN baseline training, undermining assessment of whether the reported improvements are reliable or load-bearing for the model comparison.
  3. Abstract (validation sets): The similarity and analogy tasks are not shown to operationalize serendipity (unexpected yet useful) rather than co-occurrence statistics already present in enrollment sequences; this leaves untested whether multifactor2vec similarity yields norm-compliant, diversified recommendations.
minor comments (1)
  1. The manuscript would benefit from explicit dataset sizes for the validation sets and a description of how the RNN baseline was trained and tuned.
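One way to attach the error bars the referee asks for is a percentile bootstrap over per-question outcomes: record 1 if a validation question was answered correctly (or the target appeared in the top k) and 0 otherwise, resample with replacement, and read off the spread of the resampled means. This is a generic sketch, not an analysis from the paper; the function name, 2,000 resamples and the 95% level are assumptions.

```python
import random
from typing import Sequence, Tuple


def bootstrap_ci(
    hits: Sequence[int], n_boot: int = 2000, alpha: float = 0.05, seed: int = 0
) -> Tuple[float, float, float]:
    """Point estimate and percentile bootstrap interval for a mean of 0/1 outcomes."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    n = len(hits)
    means = sorted(sum(rng.choices(hits, k=n)) / n for _ in range(n_boot))
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return sum(hits) / n, lo, hi
```

Comparing two models on the same questions would more naturally bootstrap the per-question difference in outcomes, so that the interval reflects the paired design.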

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the constructive feedback on our manuscript. We address each major comment point-by-point below, with proposed revisions to improve clarity and reporting.

  1. Referee: Abstract (user study paragraph): Results are reported exclusively for the existing RNN recommendations; no ratings, comparisons, or outputs from multifactor2vec are included in the N=70 study, leaving the central claim that the proposed model produces serendipitous recommendations without direct user evidence.

    Authors: The user study evaluates the production RNN system to demonstrate the filter bubble issue in a real deployment, as motivated in the introduction. Multifactor2vec is validated via offline similarity and analogy tasks, with the design intended to support diversified recommendations based on favorite-course similarity. We agree the abstract should not imply direct user evidence for the new model. We will revise the abstract to explicitly state the scope of the user study and the role of offline results. revision: yes

  2. Referee: Abstract (offline testing sentence): Accuracy and recall gains on the similarity and analogy validation sets are stated without error bars, validation-set sizes, statistical tests, or details on RNN baseline training, undermining assessment of whether the reported improvements are reliable or load-bearing for the model comparison.

    Authors: We acknowledge the abstract omits these details. The full manuscript describes the validation sets and training, but we will revise the abstract to report validation-set sizes and note consistent improvements, while ensuring the results section includes error bars, statistical tests, and RNN baseline details. revision: yes

  3. Referee: Abstract (validation sets): The similarity and analogy tasks are not shown to operationalize serendipity (unexpected yet useful) rather than co-occurrence statistics already present in enrollment sequences; this leaves untested whether multifactor2vec similarity yields norm-compliant, diversified recommendations.

    Authors: These tasks validate that factor embeddings improve semantic capture beyond raw co-occurrence (e.g., disambiguating by instructor). The model is applied to generate recommendations similar to a specified favorite course while drawing from observed enrollment patterns that respect progression norms. The user study separately quantifies novelty. We will add discussion clarifying these tasks as proxies and their relation to serendipity within university constraints. revision: partial

Circularity Check

0 steps flagged

No circularity detected; derivation is self-contained


The paper trains multifactor2vec (a skip-gram variant) on nine years of historic enrollment sequences, evaluates accuracy and recall gains on separate course similarity and analogy validation sets, and compares against an RNN baseline via a user study (N=70). No equations, fitted parameters renamed as predictions, self-citations, or ansatzes are present in the provided text that reduce any claim to its inputs by construction. The chain from sequences to embeddings to similarity-based recommendations is standard and externally benchmarked, so the central claims are not circular with respect to the training data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the central modeling choice (joint embedding of courses and factors) is treated as an engineering extension rather than a new theoretical entity.

pith-pipeline@v0.9.0 · 5822 in / 1109 out tokens · 32914 ms · 2026-05-25T10:29:42.611660+00:00 · methodology

