
arXiv: 1907.11000 · v1 · pith:7EE7RREKnew · submitted 2019-07-25 · 💻 cs.IR · cs.HC · cs.LG

Personalised novel and explainable matrix factorisation

Pith reviewed 2026-05-24 16:01 UTC · model grok-4.3

classification 💻 cs.IR · cs.HC · cs.LG
keywords matrix factorization · recommendation systems · novelty · explainability · nDCG metric · personalized recommendations · user study

The pith

NEMF extends matrix factorization to trade accuracy against novelty and explainability, at only a minimal cost in accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents NEMF, a new matrix factorization model that incorporates criteria for novelty and explainability into the optimization process. It also introduces an nDCG-based metric to quantify how explainable recommended items are. The approach allows recommendation systems to provide more novel suggestions that come with explanations, while accuracy remains close to standard matrix factorization. This matters because current systems often focus only on accuracy, missing opportunities to help users explore and understand recommendations. Experimental results and a user study back the feasibility of this multi-criteria optimization.

Core claim

The authors claim that their NEMF model, by augmenting the matrix factorization objective with novelty and a new nDCG-based explainability metric, enables trading off performance on novelty and explainability against accuracy, achieving high accuracy alongside novel and explainable recommendations.

What carries the argument

NEMF, the novel and explainable matrix factorization model that jointly optimizes accuracy, novelty, and an nDCG-based explainability metric.
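
This page describes the NEMF objective only in words. As a rough sketch, and assuming the soft-constraint form used in earlier explainable matrix factorization work rather than the paper's own equation, a joint objective could take the shape below. The symbols $E_{ui}$ (explainability weight), $N_{ui}$ (novelty weight), $\lambda$, $\delta$ and $\beta$ are illustrative names chosen here, not notation taken from the paper.

```latex
\min_{P,Q} \sum_{(u,i) \in \mathcal{R}}
  \left( r_{ui} - p_u^{\top} q_i \right)^2
  + \frac{\beta}{2} \left( \lVert p_u \rVert^2 + \lVert q_i \rVert^2 \right)
  + \frac{\lambda}{2} E_{ui} \lVert p_u - q_i \rVert^2
  + \frac{\delta}{2} N_{ui} \lVert p_u - q_i \rVert^2
```

Under this assumed form, setting $\lambda = \delta = 0$ recovers ordinary regularized matrix factorization, and raising either weight pulls the user and item factors together for pairs that score high on the corresponding criterion.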

If this is right

  • Recommendation systems can produce novel items that advance user exploration.
  • Explanations can be provided for recommendations based on the explainability metric.
  • Accuracy is only minimally compromised when adding novelty and explainability objectives.
  • The nDCG metric can distinguish more explainable items from less explainable ones.
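
To make the nDCG-based measure concrete, here is a minimal sketch of standard nDCG applied to a ranked recommendation list. It assumes each recommended item carries a per-user explainability gain in [0, 1] (for example, the share of a user's neighbours who rated the item); that gain definition and the function names `dcg` and `ndcg` are illustrative assumptions, since the paper's exact formulation is not reproduced on this page.

```python
import math

def dcg(gains):
    """Discounted cumulative gain of a ranked list (rank 1 first)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))

def ndcg(gains, k=None):
    """nDCG@k: DCG of the list as ranked, divided by the DCG of the
    best possible ordering of the same gains."""
    top = gains[:k] if k is not None else gains
    ideal = sorted(gains, reverse=True)[:len(top)]
    ideal_dcg = dcg(ideal)
    return dcg(top) / ideal_dcg if ideal_dcg > 0 else 0.0

# Hypothetical explainability gains for one user's top-5 list (assumed values).
ranked_gains = [0.2, 0.9, 0.0, 0.5, 0.1]
print(ndcg(ranked_gains))         # list as recommended
print(ndcg(ranked_gains, k=3))    # only the first three positions
```

A list that places its most explainable items first scores closer to 1, which is the sense in which such a metric can separate more explainable rankings from less explainable ones.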

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This method could generalize to other collaborative filtering techniques.
  • Domain-specific adaptations of the explainability metric might improve performance in different recommendation contexts.
  • Future work could explore dynamic adjustment of the trade-off parameters based on user preferences.

Load-bearing premise

The nDCG-based explainability metric correctly measures and ranks items by their explainability to users independently of accuracy, allowing joint optimization in matrix factorization without biasing the learned factors.

What would settle it

An experiment or user study showing that items with high nDCG explainability scores are not perceived as more explainable by users, or that optimizing for novelty and explainability causes a large drop in accuracy metrics.

Figures

Figures reproduced from arXiv: 1907.11000 by Ludovik Coba, Markus Zanker, Panagiotis Symeonidis.

Figure 1: User style explanation example of the online retailer Amazon.
Figure 2: An explanation interface that also uses the explainability power of nearest neigh…
Figure 3: A second explanation interface that uses the explainability power of nearest neigh…
Figure 4: Preference weights of the multinomial logit model.
Figure 5: Relevant preference and standard deviation of explanation styles
Figure 6: Sensitivity of NMF to changes of the novelty regularization term for (a) the ML100K…
Figure 7: Sensitivity of EMF to changes of the explainability regularisation term for the (a)…
Figure 8: Sensitivity of the NEMF in terms of nDCG to changes of novelty and explainability…
Original abstract

Recommendation systems personalise suggestions to individuals to help them in their decision making and exploration tasks. In the ideal case, these recommendations, besides of being accurate, should also be novel and explainable. However, up to now most platforms fail to provide both, novel recommendations that advance users' exploration along with explanations to make their reasoning more transparent to them. For instance, a well-known recommendation algorithm, such as matrix factorisation (MF), optimises only the accuracy criterion, while disregarding other quality criteria such as the explainability or the novelty, of recommended items. In this paper, to the best of our knowledge, we propose a new model, denoted as NEMF, that allows to trade-off the MF performance with respect to the criteria of novelty and explainability, while only minimally compromising on accuracy. In addition, we recommend a new explainability metric based on nDCG, which distinguishes a more explainable item from a less explainable item. An initial user study indicates how users perceive the different attributes of these "user" style explanations and our extensive experimental results demonstrate that we attain high accuracy by recommending also novel and explainable items.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes NEMF, a matrix factorization extension that jointly optimizes accuracy with novelty and a new nDCG-based explainability metric. It claims this enables trading off the three criteria such that novel and explainable items can be recommended while only minimally compromising accuracy, backed by experiments and an initial user study on explanation perception.

Significance. If the central claims hold after verification, the work would offer a concrete multi-objective MF formulation and a novel proxy metric for explainability, potentially useful for platforms seeking transparent and exploratory recommendations. The user study component provides direct evidence on user perception, which is a strength.

major comments (2)
  1. [Abstract and model definition] The nDCG-based explainability metric (introduced to distinguish more from less explainable items) is integrated into the NEMF objective alongside novelty; however, no quantitative check is described to confirm it ranks items independently of accuracy signals such as popularity or reconstruction error. If the metric correlates with these, the reported 'minimal compromise on accuracy' may be an artifact of the joint optimization rather than a controlled trade-off.
  2. [Abstract] The abstract asserts experimental results attaining high accuracy with novel/explainable items and an initial user study, but supplies no dataset details, baseline comparisons, error bars, or description of how the novelty-explainability trade-off weights are chosen or validated. This prevents assessment of whether the multi-objective fit avoids post-hoc selection bias.
minor comments (2)
  1. Notation for the trade-off parameters and the precise form of the NEMF loss function should be made explicit with equations to allow reproduction.
  2. The user study description would benefit from more detail on participant numbers, task design, and statistical analysis to strengthen the perception claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below.

Point-by-point responses
  1. Referee: [Abstract and model definition] The nDCG-based explainability metric (introduced to distinguish more from less explainable items) is integrated into the NEMF objective alongside novelty; however, no quantitative check is described to confirm it ranks items independently of accuracy signals such as popularity or reconstruction error. If the metric correlates with these, the reported 'minimal compromise on accuracy' may be an artifact of the joint optimization rather than a controlled trade-off.

    Authors: We acknowledge that the manuscript does not include an explicit quantitative verification of the explainability metric's independence from accuracy signals. The metric is defined via nDCG on user-specific explanation lists derived from interaction history, which is conceptually distinct from global popularity or reconstruction error. To address the concern, we will add a correlation analysis (Spearman rank) between explainability scores, item popularity, and MF reconstruction errors in the revised version. revision: yes

  2. Referee: [Abstract] The abstract asserts experimental results attaining high accuracy with novel/explainable items and an initial user study, but supplies no dataset details, baseline comparisons, error bars, or description of how the novelty-explainability trade-off weights are chosen or validated. This prevents assessment of whether the multi-objective fit avoids post-hoc selection bias.

    Authors: The abstract is space-constrained and therefore omits these specifics, which are provided in the body: datasets in Section 4.1, baselines and comparisons in Sections 4.2 and 5, error bars (standard deviation over 5 runs) in all result figures, and weight selection via grid search on validation sets (detailed in Section 5.1) prior to test evaluation to avoid post-hoc bias. We will expand the abstract with a short clause on datasets and validation if permitted. revision: partial
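
The correlation analysis promised in the first response can be sketched as follows. This is a minimal standard-library version of Spearman's rank correlation (Pearson correlation computed on average ranks). The per-item score lists are made-up placeholders, and the function names are hypothetical, not part of the paper or of any library.

```python
def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0

def spearman(x, y):
    """Spearman rank correlation between two equally long score lists."""
    return pearson(average_ranks(x), average_ranks(y))

# Placeholder per-item scores (assumed values, for illustration only).
explainability = [0.8, 0.1, 0.5, 0.9, 0.3]
popularity = [120, 15, 60, 200, 40]
recon_error = [0.4, 1.1, 0.7, 0.3, 0.9]

print("explainability vs popularity:", spearman(explainability, popularity))
print("explainability vs error:", spearman(explainability, recon_error))
```

A coefficient near zero would support the claim that the explainability score carries information beyond popularity or reconstruction error; a coefficient near plus or minus one would support the referee's concern.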

Circularity Check

0 steps flagged

No circularity: derivation relies on external optimization and user study rather than self-definition

Full rationale

The paper introduces NEMF as a multi-objective extension of matrix factorization and a new nDCG-based explainability metric. The central claim of trading off novelty/explainability against accuracy is presented as the outcome of joint optimization and experimental validation, not as a quantity defined in terms of itself. No equations reduce a reported prediction to a fitted parameter by construction, no uniqueness theorem is imported from self-citation, and the metric is motivated by a user study rather than renamed from prior results. The derivation chain therefore remains self-contained against the stated objectives.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on the domain assumption that matrix factorization admits a multi-objective formulation whose trade-off parameters can be set without destroying the latent-factor semantics, plus the ad-hoc assumption that an nDCG-derived score measures explainability in a way that aligns with user perception.

free parameters (1)
  • novelty-explainability trade-off weights
    The model is described as allowing a trade-off, implying tunable weights that are fitted or chosen to achieve the reported balance between accuracy, novelty, and explainability.
axioms (2)
  • [domain assumption] Matrix factorization objectives can be extended to include novelty and explainability terms while preserving the core low-rank approximation property.
    Invoked when the paper states that NEMF trades off MF performance with novelty and explainability.
  • [ad hoc to paper] nDCG can serve as a valid proxy for item explainability that distinguishes more from less explainable items.
    The paper introduces this as a new metric without citing prior validation for the explainability use case.
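
To illustrate how the trade-off weights enter as free parameters, here is a small sketch that trains a matrix factorization by SGD with two extra weighted terms and sweeps both weights over a grid, scoring each setting on held-out ratings. Everything in it is an assumption made for illustration: the soft-constraint form of the extra terms, the names `lam_exp` and `lam_nov`, the toy ratings, and the `expl` and `nov` weight matrices. None of it is the authors' implementation.

```python
import random

def predict(P, Q, u, i):
    return sum(a * b for a, b in zip(P[u], Q[i]))

def rmse(P, Q, ratings):
    se = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return (se / len(ratings)) ** 0.5

def train_nemf_sketch(ratings, n_users, n_items, expl, nov,
                      lam_exp=0.0, lam_nov=0.0, beta=0.02,
                      k=4, lr=0.01, epochs=200, seed=0):
    """SGD on an ASSUMED per-rating objective:
    (r - p.q)^2 + beta/2 (|p|^2 + |q|^2) + w/2 |p - q|^2,
    with w = lam_exp * expl[u][i] + lam_nov * nov[u][i]."""
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pu, qi = P[u], Q[i]
            err = r - predict(P, Q, u, i)
            w = lam_exp * expl[u][i] + lam_nov * nov[u][i]
            for f in range(k):
                puf, qif = pu[f], qi[f]
                # Gradient steps for the assumed objective above.
                pu[f] += lr * (2 * err * qif - beta * puf - w * (puf - qif))
                qi[f] += lr * (2 * err * puf - beta * qif + w * (puf - qif))
    return P, Q

# Toy data (hypothetical): (user, item, rating) triples.
train_set = [(0, 0, 5), (0, 1, 4), (0, 2, 1), (1, 0, 4), (1, 2, 2),
             (1, 3, 1), (2, 1, 5), (2, 2, 1), (2, 3, 2)]
valid_set = [(0, 3, 1), (1, 1, 4), (2, 0, 5)]
# Hypothetical per-pair explainability and novelty weights in [0, 1].
expl = [[0.9, 0.6, 0.1, 0.0], [0.7, 0.8, 0.2, 0.1], [0.8, 0.9, 0.0, 0.3]]
nov = [[0.1, 0.2, 0.7, 0.9], [0.2, 0.1, 0.6, 0.8], [0.1, 0.3, 0.9, 0.5]]

grid = [0.0, 0.05, 0.1]
results = {}
for lam_exp in grid:
    for lam_nov in grid:
        P, Q = train_nemf_sketch(train_set, 3, 4, expl, nov, lam_exp, lam_nov)
        results[(lam_exp, lam_nov)] = rmse(P, Q, valid_set)

# List settings from lowest to highest validation RMSE.
for weights, score in sorted(results.items(), key=lambda kv: kv[1]):
    print(weights, round(score, 3))
```

The sweep only reports validation RMSE; a selection procedure in the spirit of the paper would also score each setting on novelty and explainability before choosing the weights, and would keep the test set untouched until that choice is fixed.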

pith-pipeline@v0.9.0 · 5739 in / 1556 out tokens · 30349 ms · 2026-05-24T16:01:04.173042+00:00 · methodology

