Beyond Factual Correctness: Mitigating Preference-Inconsistent Explanations in Explainable Recommendation

Baisong Liu; Chengkai Wang

arxiv: 2603.03080 · v2 · submitted 2026-03-03 · 💻 cs.AI



Pith reviewed 2026-05-15 16:46 UTC · model grok-4.3

classification 💻 cs.AI

keywords: explainable recommendation · preference inconsistency · LLM explanations · multi-hop reasoning paths · select-then-generate · user-centric evaluation · factual grounding


The pith

PURE selects compact multi-hop paths aligned with user preferences to reduce preference-inconsistent explanations in LLM-based recommenders.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

LLM recommenders often generate fluent explanations that are factually correct yet cite attributes that clash with a user's past choices. The result is unconvincing reasoning that standard checks miss. The paper proposes PURE, a select-then-generate approach. It first picks a small set of multi-hop item-centric paths using intent, specificity, and diversity heuristics, so that the evidence stays both grounded and preference-aligned. These paths then feed into structure-aware prompting for the final output. Experiments on three datasets show fewer inconsistent explanations and hallucinations. Recommendation accuracy, explanation quality, and inference speed remain competitive.

Core claim

PURE intervenes at evidence selection rather than only at generation: it extracts a compact collection of multi-hop item-centric reasoning paths that satisfy factual grounding and alignment with latent user preference structure, chosen via heuristics for user intent, specificity, and diversity, then injects them through structure-aware prompting that preserves relational constraints. A new feature-level user-centric metric quantifies the preference inconsistency overlooked by factuality-only measures.

What carries the argument

The select-then-generate paradigm that chooses compact multi-hop item-centric reasoning paths guided by intent, specificity, and diversity heuristics before structure-aware LLM prompting.
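The summary names the three selection heuristics but not how they are scored or combined. What follows is therefore a minimal Python sketch under stated assumptions, not PURE's actual scorer:

- Each candidate path is reduced to the set of attributes it mentions.
- A hypothetical `intent_weights` map stands in for target-aware user intent.
- Specificity is approximated by inverse attribute frequency across items.
- Diversity rewards attributes that earlier picks have not yet covered.

The names `Path`, `path_score`, `select_paths`, `w_intent`, `w_spec`, and `w_div` are all illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """A hypothetical multi-hop item-centric path, e.g. item -> attribute -> item."""
    hops: tuple
    attributes: frozenset


def path_score(path, covered, intent_weights, attr_item_count,
               w_intent=1.0, w_spec=0.5, w_div=0.5):
    # Intent: how strongly the path's attributes match the user's inferred intent.
    intent = sum(intent_weights.get(a, 0.0) for a in path.attributes)
    # Specificity: attributes shared by many items are generic, so weight them down.
    spec = sum(1.0 / (1 + attr_item_count.get(a, 0)) for a in path.attributes)
    # Diversity: fraction of the path's attributes not already covered by earlier picks.
    novelty = len(path.attributes - covered) / max(len(path.attributes), 1)
    return w_intent * intent + w_spec * spec + w_div * novelty


def select_paths(candidates, intent_weights, attr_item_count, k=3):
    """Greedily pick up to k paths, re-scoring after each pick so diversity updates."""
    selected, covered, pool = [], set(), list(candidates)
    while pool and len(selected) < k:
        best = max(pool, key=lambda p: path_score(p, covered, intent_weights, attr_item_count))
        selected.append(best)
        covered |= best.attributes
        pool.remove(best)
    return selected
```

Re-scoring after each pick is what gives the diversity term its effect. A path whose attributes are already covered can lose to a lower-intent path that adds new evidence. Whether PURE combines its heuristics additively, as done here, is an assumption.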

If this is right

Explanations gain persuasiveness by matching historical preferences rather than only stating true facts.
The new metric exposes misalignment that factuality scores alone cannot detect.
Recommendation accuracy, explanation quality, and inference speed remain comparable to prior methods.
Factual hallucinations decline as a side effect of the tighter evidence selection.
Trustworthy explanations require joint satisfaction of factual correctness and preference alignment.
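The exact formula for the paper's feature-level metric is not given here. The following Python sketch illustrates only the general idea and is not the authors' definition. It rests on two assumptions:

- Each user has a per-feature preference polarity mined from past reviews, where a negative value means the feature is disliked.
- The features an explanation cites have already been extracted.

`feature_conflicts` and `inconsistency_rate` are hypothetical names.

```python
def feature_conflicts(cited_features, user_pref):
    """Return the cited features that the user's history marks as disliked.

    A factuality check would accept all of these if the item really has them;
    a user-centric check flags them because they still contradict the user.
    """
    return {f for f in cited_features if user_pref.get(f, 0.0) < 0.0}


def inconsistency_rate(explanations, prefs):
    """Share of explanations citing at least one feature the user dislikes.

    explanations: list of (user_id, set_of_cited_features)
    prefs: dict user_id -> dict feature -> polarity in [-1, 1]
    """
    if not explanations:
        return 0.0
    flagged = sum(
        1 for user, feats in explanations
        if feature_conflicts(feats, prefs.get(user, {}))
    )
    return flagged / len(explanations)
```

Features with no recorded polarity count as neutral in this sketch. A real metric would need an explicit policy for them.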

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Future systems could replace the three heuristics with learned scorers to handle larger knowledge graphs without manual tuning.
The same selection logic might apply to personalized search or dialogue agents where outputs must respect user history.
Preference alignment could become a standard second check alongside factuality in any LLM explanation pipeline.
Datasets with denser user-item graphs might show larger gains if the path selection covers more subtle preference signals.

Load-bearing premise

The three heuristics alone can select compact multi-hop paths that stay both factually correct and aligned with user preferences, without losing important evidence or introducing new inconsistencies.

What would settle it

A controlled test on a dataset with explicit preference graphs, in which users rate explanations built from PURE's chosen paths as more inconsistent with their history than explanations from a random or factuality-only baseline.

Figures

Figures reproduced from arXiv: 2603.03080 by Baisong Liu, Chengkai Wang.

**Figure 2.** Overview of PURE. It contains five components: Structure-Enhanced Semantic Indexing, Target-Aware User Intent, …

**Figure 3.** Pairwise human evaluation on explainability be…

**Figure 5.** Efficiency Analysis of PURE.

5.6 Human Alignment of P-EHR (RQ5). To validate the robustness of P-EHR, we conducted a blind human evaluation comparing PURE with G-Refer on 450 pairs. Annotators were presented with a summary of the user’s past reviews and the target item, and were asked to identify which of two anonymized explanations better aligned with user preferences (or indicate a tie). We recruited 15 g…

**Figure 4.** Parameter sensitivity analysis of PURE.
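Section 5.6 describes a blind pairwise comparison of PURE and G-Refer over 450 pairs, with ties allowed. This excerpt does not say which statistical test was applied. One standard way to analyse such judgments is to drop ties and run a two-sided exact sign test on the remaining wins and losses. The Python sketch below takes that approach as an assumption, not as the authors' procedure.

```python
from math import comb


def sign_test(wins, losses):
    """Two-sided exact sign test; ties must be removed before calling."""
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def summarize(judgments):
    """judgments: iterable of "A", "B", or "tie", where "A" is the first system."""
    judgments = list(judgments)
    wins = judgments.count("A")
    losses = judgments.count("B")
    ties = judgments.count("tie")
    decided = wins + losses
    return {
        "wins": wins,
        "losses": losses,
        "ties": ties,
        "win_rate_excl_ties": wins / decided if decided else 0.0,
        "p_value": sign_test(wins, losses),
    }
```

Excluding ties is the usual convention for the sign test. Reporting the tie count alongside the p-value keeps that choice visible.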

Original abstract

LLM-based explainable recommenders can produce fluent explanations that are factually correct, yet still justify items using attributes that conflict with a user's historical preferences. Such preference-inconsistent explanations yield logically valid but unconvincing reasoning and are largely missed by standard hallucination or faithfulness metrics. We formalize this failure mode and propose PURE, a preference-aware reasoning framework following a select-then-generate paradigm. Instead of only improving generation, PURE intervenes in evidence selection: it selects a compact set of multi-hop item-centric reasoning paths that are both factually grounded and aligned with user preference structure, guided by user intent, specificity, and diversity to suppress generic, weakly personalized evidence. The selected evidence is then injected into LLM generation via structure-aware prompting that preserves relational constraints. To measure preference inconsistency, we introduce a feature-level, user-centric evaluation metric that reveals misalignment overlooked by factuality-based measures. Experiments on three real-world datasets show that PURE consistently reduces preference-inconsistent explanations and factual hallucinations while maintaining competitive recommendation accuracy, explanation quality, and inference efficiency. These results highlight that trustworthy explanations require not only factual correctness but also justification aligned with user preferences.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper usefully calls out preference inconsistency as a separate issue from factuality in LLM explanations and offers a selection-based fix, though the heuristic grounding needs more validation.


The paper's main point is that LLM recommenders often generate explanations that are factually correct but still rely on attributes or paths that don't align with what the user has preferred in the past. This makes them less trustworthy even when they pass standard hallucination checks. PURE addresses this by using a set of heuristics to select the right evidence before the generation step. What stands out as new is the combination of the select-then-generate paradigm with guidance from user intent, specificity, and diversity to choose compact multi-hop item-centric paths. The authors also define a feature-level metric that measures inconsistency from the user's perspective rather than factual accuracy alone. The experiments claim this leads to fewer inconsistent explanations and hallucinations on three real-world datasets, without major losses in accuracy or speed.

The approach has some merit in targeting a practical issue that current methods overlook. However, the reliance on heuristics for selection raises questions, because the heuristics do not include a direct model of the user's preference structure from history. The selected paths could still justify items in ways that conflict with latent preferences, or could miss key evidence. The abstract and results also say too little about statistical significance, baseline comparisons, or whether the new metric matches human judgments. That weakens the case that the improvements are robust.

This work is aimed at researchers and practitioners building explainable recommendation systems with LLMs. A reader focused on user trust and explanation quality would benefit from seeing how the authors formalize and measure this inconsistency. The paper engages honestly with the limitations of existing metrics and offers a concrete intervention. It therefore deserves to go through peer review, so others can assess the details of the implementation and experiments.

Referee Report

2 major / 1 minor

Summary. The paper proposes PURE, a select-then-generate framework for LLM-based explainable recommendation that intervenes at the evidence-selection stage by choosing compact multi-hop item-centric reasoning paths guided by user-intent, specificity, and diversity heuristics. These paths are intended to be simultaneously factually grounded and aligned with latent user preferences; the selected evidence is then injected via structure-aware prompting. The authors introduce a feature-level, user-centric metric to quantify preference inconsistency (distinct from standard factuality or hallucination measures) and report that PURE reduces both preference-inconsistent explanations and factual hallucinations on three real-world datasets while preserving recommendation accuracy, explanation quality, and inference efficiency.

Significance. If the empirical claims are substantiated, the work identifies a practically important failure mode—factually valid yet preference-misaligned explanations—that is missed by existing metrics and offers a lightweight, heuristic-driven intervention that does not require retraining the underlying recommender or LLM. The new metric and the explicit separation of selection from generation could influence evaluation practices in explainable recommendation and encourage future systems to treat preference consistency as a first-class requirement alongside factual correctness.

major comments (2)

[Experimental Evaluation] Experimental Evaluation: the abstract states that PURE 'consistently reduces preference-inconsistent explanations' across three datasets, yet provides no information on the precise baselines, statistical significance tests, effect sizes, or ablation results isolating the contribution of the intent/specificity/diversity heuristics versus simpler selection strategies. Without these details the central empirical claim cannot be verified.
[Methodology] Methodology (select-then-generate paradigm): the path-selection heuristics are described as operating on user intent, specificity, and diversity, but no explicit mechanism (e.g., a learned user-preference embedding, consistency loss, or historical-interaction constraint) is given that ties the selected multi-hop paths to the user's latent preference structure. Consequently, it remains possible for the chosen paths to justify items via attributes that conflict with observed user history, undermining the claim that the framework reliably enforces preference alignment.

minor comments (1)

[Abstract / Methodology] The abstract refers to 'structure-aware prompting that preserves relational constraints' without specifying the prompting template or how relational structure is encoded; a concrete example in the main text would improve reproducibility.
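As one illustration of what the referee is asking for, here is a hypothetical Python sketch of structure-aware prompting. Each selected path is kept as an ordered chain of (head, relation, tail) triples. A path is rejected if its hops do not connect. Valid paths are rendered with explicit relation labels, so the LLM sees relational structure rather than a bag of attributes. The template wording and the function names `serialize_path` and `build_prompt` are assumptions, not PURE's actual prompt.

```python
def serialize_path(triples):
    """Render a chain of (head, relation, tail) triples, keeping relation labels.

    Raises ValueError if consecutive hops do not connect, so a broken path
    never reaches the prompt.
    """
    if not triples:
        raise ValueError("empty path")
    for (_, _, tail), (head, _, _) in zip(triples, triples[1:]):
        if tail != head:
            raise ValueError(f"hop mismatch: {tail!r} != {head!r}")
    parts = [triples[0][0]]
    parts.extend(f"-[{rel}]-> {tail}" for _, rel, tail in triples)
    return " ".join(parts)


def build_prompt(user_summary, target_item, paths):
    """Assemble a prompt that numbers each evidence path so it can be cited."""
    lines = [
        f"User preference summary: {user_summary}",
        f"Target item: {target_item}",
        "Evidence paths (use only these facts and relations):",
    ]
    for i, path in enumerate(paths, start=1):
        lines.append(f"[{i}] {serialize_path(path)}")
    lines.append(
        "Explain why the target item suits this user, citing evidence paths by "
        "number. Do not mention attributes that are absent from the paths."
    )
    return "\n".join(lines)
```

Numbering the paths makes it possible to check afterwards which evidence each explanation actually cites.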

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our work. The comments highlight important areas for strengthening the experimental reporting and methodological clarity. We address each major comment below and outline the revisions we will make to the manuscript.


Referee: [Experimental Evaluation] Experimental Evaluation: the abstract states that PURE 'consistently reduces preference-inconsistent explanations' across three datasets, yet provides no information on the precise baselines, statistical significance tests, effect sizes, or ablation results isolating the contribution of the intent/specificity/diversity heuristics versus simpler selection strategies. Without these details the central empirical claim cannot be verified.

Authors: We agree that the experimental section requires more granular reporting to allow full verification of the claims. The full manuscript already compares PURE against standard LLM-based recommenders and prior explainable methods on three datasets, but we will expand the revision to explicitly list all baselines, report statistical significance via paired t-tests with p-values, include effect sizes (e.g., Cohen's d for the reduction in preference inconsistency), and add ablation tables that isolate the individual and combined contributions of the user-intent, specificity, and diversity heuristics against simpler alternatives such as random path selection or attribute-frequency baselines. These additions will be placed in the experimental evaluation section and will not alter the core results. revision: yes
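The promised paired t-tests and Cohen's d can be computed with a minimal standard-library Python sketch. It assumes inconsistency scores are available for a baseline and for PURE on the same users or explanations. The sketch computes two quantities:

- The paired t statistic.
- The paired effect size d_z, defined as the mean difference divided by the standard deviation of the differences. This is one of several Cohen's d variants for paired designs.

A p-value additionally requires the t distribution.

```python
import math
import statistics


def paired_t_and_dz(baseline, treatment):
    """Return (t statistic, Cohen's d_z, degrees of freedom) for paired samples.

    Differences are baseline - treatment, so positive values mean the
    treatment lowered the inconsistency score.
    """
    if len(baseline) != len(treatment):
        raise ValueError("samples must be paired (equal length)")
    if len(baseline) < 2:
        raise ValueError("need at least two pairs")
    diffs = [b - t for b, t in zip(baseline, treatment)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    n = len(diffs)
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    d_z = mean_diff / sd_diff
    return t_stat, d_z, n - 1
```

With SciPy installed, `scipy.stats.ttest_rel(baseline, treatment)` computes the same statistic and also returns a p-value. Note that t equals d_z multiplied by the square root of n. A large t from many users can therefore coexist with a small effect, which is why reporting both is useful.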
Referee: [Methodology] Methodology (select-then-generate paradigm): the path-selection heuristics are described as operating on user intent, specificity, and diversity, but no explicit mechanism (e.g., a learned user-preference embedding, consistency loss, or historical-interaction constraint) is given that ties the selected multi-hop paths to the user's latent preference structure. Consequently, it remains possible for the chosen paths to justify items via attributes that conflict with observed user history, undermining the claim that the framework reliably enforces preference alignment.

Authors: The selection heuristics are explicitly derived from each user's historical interaction data. User intent is inferred from the most frequent attributes appearing in the user's past purchases. Specificity favors attributes that appear in the target item's description but are rare in the user's history, keeping them only when they align with observed patterns. Diversity ensures coverage across distinct preference dimensions. No learned embedding or auxiliary loss is used, because the framework is designed to remain training-free and lightweight. We acknowledge that the current description could be clearer on the historical grounding. In the revision we will add pseudocode for the selection procedure, concrete examples from the datasets showing how conflicting attributes are excluded, and a short discussion of why the heuristic approach suffices for alignment without explicit optimization. This will strengthen rather than change the methodology. revision: partial
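The historical grounding described in this response can be made concrete with a small Python sketch. It is a hypothetical reading, not the authors' code, and it makes three assumptions:

- Interaction history carries 1-5 ratings. The response mentions only past purchases.
- Attributes of highly rated items count as intent.
- Attributes seen only in low-rated items count as conflicting.

Candidate paths that mention any conflicting attribute are then dropped. `LIKE`, `DISLIKE`, and the function names are illustrative.

```python
from collections import Counter

# Hypothetical rating thresholds on a 1-5 scale; the rebuttal does not specify any.
LIKE, DISLIKE = 4, 2


def intent_attributes(history, top_n=3):
    """Most frequent attributes among highly rated past items.

    history: list of (set_of_attributes, rating) pairs for one user.
    """
    counts = Counter(a for attrs, rating in history if rating >= LIKE for a in attrs)
    return [a for a, _ in counts.most_common(top_n)]


def conflicting_attributes(history):
    """Attributes seen in low-rated items and never in highly rated ones."""
    liked = {a for attrs, rating in history if rating >= LIKE for a in attrs}
    disliked = {a for attrs, rating in history if rating <= DISLIKE for a in attrs}
    return disliked - liked


def filter_paths(paths, history):
    """Drop candidate paths (given as attribute sets) that mention a conflicting attribute."""
    bad = conflicting_attributes(history)
    return [p for p in paths if not (p & bad)]
```

An attribute that appears in both liked and disliked items is not treated as a conflict in this sketch. That tie-breaking choice is itself an assumption, and it is exactly the kind of case the referee's concern about latent preferences targets.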

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper introduces the PURE framework as a select-then-generate paradigm that intervenes on evidence selection using intent, specificity, and diversity heuristics before LLM generation. No equations, derivations, or fitted parameters are described that reduce the claimed reductions in preference inconsistency to quantities defined by the same inputs or by self-citation chains. The central claims rest on experimental results across three datasets, not on any self-referential construction, so they can be checked against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, the approach rests on the existence of multi-hop paths in an item-attribute graph and the assumption that user preference structure can be approximated from historical interactions; no explicit free parameters, axioms, or invented entities are stated.




[1] [1]

Nitay Calderon, Liat Ein Dor, and Roi Reichart. 2025. Multi-domain explainabil- ity of preferences. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. 14553–14586

work page 2025

[2] [2]

Jinpeng Chen, Jianxiang He, Huan Li, Senzhang Wang, Yuan Cao, Kaimin Wei, Zhenye Yang, and Ye Ji. 2025. Hierarchical Intent-guided Optimization with Pluggable LLM-Driven Semantics for Session-based Recommendation. In Pro- ceedings of the 48th International ACM SIGIR Conference on Research and Develop- ment in Information Retrieval (Padua, Italy) (SIGIR ’2...

work page doi:10.1145/3726302.3729994 2025

[3] [3]

Junyi Chen, Mengjia Wu, Qian Liu, and Yi Zhang. 2026. Explainable prediction of knowledge recombination: A synergized method with heterogeneous hyper- graph learning and large language models. Information Processing & Manage- ment 63, 1 (2026), 104336

work page 2026

[4] [4]

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. 2024. The llama 3 herd of models. arXiv e-prints (2024), arXiv–2407

work page 2024

[5] [5]

Wenqi Fan, Yujuan Ding, Liangbo Ning, Shijie Wang, Hengyun Li, Dawei Yin, Tat-Seng Chua, and Qing Li. 2024. A Survey on RAG Meeting LLMs: Towards Retrieval-Augmented Large Language Models. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (Barcelona, Spain) (KDD ’24) . Association for Computing Machinery, New York, NY...

work page 2024

[6] [6]

doi:10.1145/3637528.3671470

work page doi:10.1145/3637528.3671470

[7] [8]

Shijie Geng, Zuohui Fu, Juntao Tan, Yingqiang Ge, Gerard De Melo, and Yongfeng Zhang. 2022. Path language modeling over knowledge graphsfor ex- plainable recommendation. In Proceedings of the ACM Web Conference 2022. 946– 955

work page 2022

[8] [9]

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. 2022. Lora: Low-rank adaptation of large language models. ICLR 1, 2 (2022), 3

work page 2022

[9] [10]

Kalervo Järvelin and Jaana Kekäläinen. 2002. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems (TOIS) 20, 4 (2002), 422–446

work page 2002

[10] [11]

Sara Kemper, Justin Cui, Kai Dicarlantonio, Kathy Lin, Danjie Tang, Anton Ko- rikov, and Scott Sanner. 2024. Retrieval-augmented conversational recommen- dation with prompt-based semi-structured natural language state tracking. In Proceedings of the 47th International ACM SIGIR Conference on Research and De- velopment in Information Retrieval . 2786–2790

work page 2024

[11] [12]

Jieyong Kim, Hyunseo Kim, Hyunjin Cho, SeongKu Kang, Buru Chang, Jinyoung Yeo, and Dongha Lee. 2025. Review-driven Personalized Preference Reasoning with Large Language Models for Recommendation. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (Padua, Italy) (SIGIR ’25). Association for Co...

work page doi:10.1145/3726302.3730055 2025

[12] [13]

Yuxuan Lei, Jianxun Lian, Jing Yao, Xu Huang, Defu Lian, and Xing Xie. 2023. RecExplainer: Aligning Large Language Models for Recommendation Model In- terpretability. ArXiv abs/2311.10947 (2023). https://api.semanticscholar.org/ CorpusID:265294974

work page arXiv 2023

[13] [14]

Guanrong Li, Haolin Yang, Xinyu Liu, Zhen Wu, and Xinyu Dai. 2025. Counterfactual Language Reasoning for Explainable Recommendation Systems. arXiv:2503.08051 [cs.AI] https://arxiv.org/abs/2503.08051

work page arXiv 2025

[14] [15]

Lei Li, Li Chen, and Yongfeng Zhang. 2020. Towards Controllable Explanation Generation for Recommender Systems via Neural Template. In WWW Demo

work page 2020

[15] [16]

Lei Li, Yongfeng Zhang, and Li Chen. 2020. Generate Neural Template Explana- tions for Recommendation. In CIKM

work page 2020

[16] [17]

Lei Li, Yongfeng Zhang, and Li Chen. 2023. Personalized Prompt Learning for Explainable Recommendation. ACM Trans. Inf. Syst. 41, 4, Article 103 (March 2023), 26 pages. doi:10.1145/3580488

work page doi:10.1145/3580488 2023

[17] [18]

Xinze Li, Yushi Bai, Bowen Jin, Fengbin Zhu, Liangming Pan, and Yixin Cao

work page

[18] [19]

ISBN 9798400715921

Long Context vs. RAG: Strategies for Processing Long Documents in LLMs. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (Padua, Italy) (SIGIR ’25) . Association for Computing Machinery, New York, NY, USA, 4110–4113. doi:10.1145/3726302. 3731690

Yuhan Li, Xinni Zhang, Linhao Luo, Heng Chang, Yuxiang Ren, Irwin King, and Jia Li. 2025. G-refer: Graph retrieval-augmented large language model for explainable recommendation. In Proceedings of the ACM on Web Conference 2025. 240–251.

Zelong Li, Yan Liang, Ming Wang, Sungro Yoon, Jiaying Shi, Xin Shen, Xiang He, Chenwei Zhang, Wenyi Wu, Hanbo Wang, et al. 2024. Explainable and coherent complement recommendation based on large language models. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management. 4678–4685.

Ziyu Li, Zhijie Tan, Suhuan Wu, Weiping Li, and Tong Mo. 2026. STLLM-Rec: enhancing explainable recommendation via self-training LLMs. World Wide Web 29, 1 (2026), 11.

Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out. 74–81.

Zhuang Liu, Yunpu Ma, Matthias Schubert, Yuanxin Ouyang, Wenge Rong, and Zhang Xiong. 2023. Multimodal contrastive transformer for explainable recommendation. IEEE Transactions on Computational Social Systems 11, 2 (2023), 2632–2643.

Zunlong Liu, Yang Xu, Gao Cong, Lei Zhu, Qinjun Qiu, and Huaxiang Zhang. 2025. ARTS: A General and Efficient Multi-Task Self-Prompt Framework for Explainable Sequential Recommendation. ACM Trans. Inf. Syst. 43, 3, Article 73 (March 2025), 30 pages. doi:10.1145/3717833

Yucong Luo, Mingyue Cheng, Hao Zhang, Junyu Lu, and Enhong Chen. 2024. Unlocking the potential of large language models for explainable recommendations. In International Conference on Database Systems for Advanced Applications. Springer, 286–303.

Chuangtao Ma, Yongrui Chen, Tianxing Wu, Arijit Khan, and Haofen Wang. Unifying Large Language Models and Knowledge Graphs for Question Answering: Recent Advances and Opportunities. In EDBT. 1174–1177.

Sicheng Pan, Dongsheng Li, Hansu Gu, Tun Lu, Xufang Luo, and Ning Gu. 2022. Accurate and explainable recommendation via review rationalization. In Proceedings of the ACM Web Conference 2022. 3092–3101.

Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics. 311–318.

S.M.F. Sani, Asal Meskin, Mohammad Amanlou, and Hamid R. Rabiee. 2025. FIRE: Faithful Interpretable Recommendation Explanations. ArXiv abs/2508.05225 (2025). https://api.semanticscholar.org/CorpusID:280546117

Teng Shi, Jun Xu, Xiao Zhang, Xiaoxue Zang, Kai Zheng, Yang Song, and Han Li. 2025. Retrieval Augmented Generation with Collaborative Filtering for Personalized Text Generation. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval (Padua, Italy) (SIGIR ’25). Association for Computing Machine... doi:10.1145/3726302.3730075

Yiqun Sun, Qiang Huang, Yixuan Tang, Anthony KH Tung, and Jun Yu. 2024. A general framework for producing interpretable semantic text embeddings. arXiv preprint arXiv:2410.03435 (2024).

Yan-Martin Tamm, Rinchin Damdinov, and Alexey Vasilev. 2021. Quality metrics in recommender systems: Do we calculate metrics consistently?. In Proceedings of the 15th ACM conference on recommender systems. 708–713.

Kai Wang, Weizhou Shen, Yunyi Yang, Xiaojun Quan, and Rui Wang. 2020. Relational graph attention network for aspect-based sentiment analysis. arXiv preprint arXiv:2004.12362 (2020).

Shijie Wang, Wenqi Fan, Yue Feng, Lin Shanru, Xinyu Ma, Shuaiqiang Wang, and Dawei Yin. 2025. Knowledge Graph Retrieval-Augmented Generation for LLM-based Recommendation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher ... doi:10.18653/v1/2025.acl-long.1317

Shijie Wang, Wenqi Fan, Yue Feng, Lin Shanru, Xinyu Ma, Shuaiqiang Wang, and Dawei Yin. 2025. Knowledge graph retrieval-augmented generation for llm-based recommendation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 27152–27168.

Cedric Waterschoot, Nava Tintarev, and Francesco Barile. 2025. Consistent Explainers or Unreliable Narrators? Understanding LLM-generated Group Recommendations. In Proceedings of the Nineteenth ACM Conference on Recommender Systems. 539–544.

Ching-Wen Yang, Zhi-Quan Feng, Ying-Jia Lin, Che Wei Chen, Kun-da Wu, Hao Xu, Yao Jui-Feng, and Hung-Yu Kao. 2025. Maple: Enhancing review generation with multi-aspect prompt learning in explainable recommendation. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 31803–31821.

Mengyuan Yang, Mengying Zhu, Yan Wang, Linxun Chen, Yilei Zhao, Xiuyuan Wang, Bing Han, Xiaolin Zheng, and Jianwei Yin. 2024. Fine-tuning large language model based explainable recommendation with explainable quality reward. In Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Appli...

Mengyuan Yang, Mengying Zhu, Yan Wang, Linxun Chen, Yilei Zhao, Xiuyuan Wang, Bing Han, Xiaolin Zheng, and Jianwei Yin. 2024. Fine-tuning large language model based explainable recommendation with explainable quality reward. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38. 9250–9259.

Seongjun Yun, Minbyul Jeong, Raehyun Kim, Jaewoo Kang, and Hyunwoo J Kim. 2019. Graph transformer networks. Advances in neural information processing systems 32 (2019).

Yuting Zhang, Ying Sun, Fuzhen Zhuang, Yongchun Zhu, Zhulin An, and Yongjun Xu. 2023. Triple dual learning for opinion-based explainable recommendation. ACM Transactions on Information Systems 42, 3 (2023), 1–27.

Wayne Xin Zhao, Gaole He, Kunlin Yang, Hong-Jian Dou, Jin Huang, Siqi Ouyang, and Ji-Rong Wen. 2019. KB4Rec: A Data Set for Linking Knowledge Bases with Recommender Systems. Data Intelligence 1, 2 (2019), 121–136. doi:10.1162/dint_a_00008
