arxiv: 2604.20848 · v1 · submitted 2026-02-11 · 💻 cs.IR · cs.AI

MATRAG: Multi-Agent Transparent Retrieval-Augmented Generation for Explainable Recommendations

Pith reviewed 2026-05-16 06:05 UTC · model grok-4.3

classification 💻 cs.IR cs.AI
keywords multi-agent systems · retrieval-augmented generation · explainable recommendations · knowledge graphs · large language models · transparency scoring · recommendation systems

The pith

MATRAG combines four specialized agents with knowledge-graph retrieval to produce more accurate and explainable recommendations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MATRAG, a framework that uses multiple agents to handle different parts of the recommendation process while grounding outputs in retrieved knowledge. It aims to solve the problems of opaque and untrustworthy suggestions from large language models in recommendation tasks. A sympathetic reader would care because such a system could make AI recommendations more reliable and understandable for everyday users. Experiments show gains in accuracy metrics and high ratings for explanation quality on common datasets.

Core claim

MATRAG employs four specialized agents: a User Modeling Agent that constructs dynamic preference profiles, an Item Analysis Agent that extracts semantic features from knowledge graphs, a Reasoning Agent that synthesizes collaborative and content-based signals, and an Explanation Agent that generates natural language justifications grounded in retrieved knowledge, together with a transparency scoring mechanism. This architecture achieves state-of-the-art performance on three benchmark datasets, improving Hit Rate by 12.7% and NDCG by 15.3% over leading baselines, with 87.4% of explanations rated helpful and trustworthy by experts.

What carries the argument

Four-agent collaboration with knowledge-graph-augmented retrieval, plus a transparency scoring mechanism that quantifies explanation faithfulness.
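To make the division of labor concrete, here is a minimal sketch of a four-stage pipeline in the shape the paper describes. It is not the authors' implementation: the toy knowledge graph, the function names, and the overlap-count scoring are all assumptions made for illustration, and no LLM is called.

```python
from collections import Counter

# Toy knowledge graph as (head, relation, tail) triples. All data is made up.
KG = [
    ("Alien", "has_genre", "sci-fi"),
    ("Alien", "directed_by", "Ridley Scott"),
    ("Blade Runner", "has_genre", "sci-fi"),
    ("Blade Runner", "directed_by", "Ridley Scott"),
    ("Notting Hill", "has_genre", "romance"),
]


def user_modeling_agent(history):
    """Hypothetical stand-in: count KG attributes of items the user interacted with."""
    profile = Counter()
    for item in history:
        for head, _, tail in KG:
            if head == item:
                profile[tail] += 1
    return profile


def item_analysis_agent(item):
    """Hypothetical stand-in: retrieve the triples that describe an item."""
    return [t for t in KG if t[0] == item]


def reasoning_agent(profile, candidates):
    """Hypothetical stand-in: score candidates by attribute overlap with the profile."""
    scored = []
    for item in candidates:
        support = [t for t in item_analysis_agent(item) if profile[t[2]] > 0]
        score = sum(profile[t[2]] for t in support)
        scored.append((score, item, support))
    # Highest score first; ties broken alphabetically by item name.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return scored[0]


def explanation_agent(item, support):
    """Hypothetical stand-in: cite only the triples that supported the choice."""
    facts = "; ".join(f"{h} {r} {t}" for h, r, t in support)
    return f"Recommended {item} because: {facts}"


def recommend(history, candidates):
    profile = user_modeling_agent(history)
    _, item, support = reasoning_agent(profile, candidates)
    return item, explanation_agent(item, support)


item, explanation = recommend(["Alien"], ["Blade Runner", "Notting Hill"])
```

In this sketch the explanation can only cite triples that actually contributed to the score, which is one simple way to keep justifications tied to retrieved evidence. Whether MATRAG enforces anything similar cannot be determined from the abstract.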

If this is right

  • Recommendations achieve higher accuracy through synthesis of user preferences and item features from knowledge graphs.
  • Explanations are grounded in retrieved knowledge, fostering greater user trust.
  • The system provides measurable transparency scores for each recommendation.
  • It establishes new benchmarks for transparent agentic recommendation systems.
  • The paper offers insights intended to guide deployment of LLM-based recommenders in production environments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar multi-agent setups could improve transparency in other LLM applications such as personalized search.
  • The transparency scoring might serve as a general tool for detecting unfaithful outputs in generative models.
  • Evaluating the framework on dynamic, streaming data would test its adaptability to changing user preferences.
  • Connections to knowledge graph completion techniques could further enhance the Item Analysis Agent.

Load-bearing premise

The four-agent division and transparency scoring mechanism produce genuinely faithful explanations rather than post-hoc rationalizations that merely correlate with the chosen items.

What would settle it

A controlled test where knowledge graph evidence is altered to contradict the model's item choice, measuring whether transparency scores decrease and explanations acknowledge the mismatch.
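A rough sketch of the scoring side of such a test follows. It assumes a hypothetical grounding score (the fraction of cited triples still present in the knowledge graph) in place of the paper's unspecified transparency score, and it holds the explanation fixed rather than re-running the model, so it only checks that a score of this kind reacts to contradicted evidence.

```python
def grounding_score(cited_triples, kg):
    """Fraction of the triples cited by an explanation that exist in the KG."""
    if not cited_triples:
        return 0.0
    kg_set = set(kg)
    return sum(t in kg_set for t in cited_triples) / len(cited_triples)


def contradict(kg, head, relation, new_tail):
    """Copy of kg in which every (head, relation, *) triple now points to new_tail."""
    return [
        (h, r, new_tail) if (h, r) == (head, relation) else (h, r, t)
        for h, r, t in kg
    ]


# Made-up evidence for a hypothetical recommendation of "Blade Runner".
kg = [
    ("Blade Runner", "has_genre", "sci-fi"),
    ("Blade Runner", "directed_by", "Ridley Scott"),
]
cited = list(kg)  # triples the (hypothetical) explanation cites

score_before = grounding_score(cited, kg)
altered_kg = contradict(kg, "Blade Runner", "has_genre", "romance")
score_after = grounding_score(cited, altered_kg)
```

A full version of the experiment would also regenerate the recommendation and explanation against the altered graph and check whether the explanation acknowledges the mismatch.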

Figures

Figures reproduced from arXiv: 2604.20848 by Sushant Mehta.

Figure 1: Overview of the MATRAG framework showing ... (image omitted)

Original abstract

Large Language Model (LLM)-based recommendation systems have demonstrated remarkable capabilities in understanding user preferences and generating personalized suggestions. However, existing approaches face critical challenges in transparency, knowledge grounding, and the ability to provide coherent explanations that foster user trust. We introduce MATRAG (Multi-Agent Transparent Retrieval-Augmented Generation), a novel framework that combined multi-agent collaboration with knowledge graph-augmented retrieval to deliver explainable recommendations. MATRAG employs four specialized agents: a User Modeling Agent that constructs dynamic preference profiles, an Item Analysis Agent that extracts semantic features from knowledge graphs, a Reasoning Agent that synthesizes collaborative and content-based signals, and an Explanation Agent that generates natural language justifications grounded in retrieved knowledge. Our framework incorporates a transparency scoring mechanism that quantifies explanation faithfulness and relevance. Extensive experiments on three benchmark datasets (Amazon Reviews, MovieLens-1M, and Yelp) demonstrate that MATRAG achieves state-of-the-art performance, improving recommendation accuracy by 12.7% (Hit Rate) and 15.3% (NDCG) over leading baselines, while human evaluation confirms that 87.4% of generated explanations are rated as helpful and trustworthy by domain experts. Our work establishes new benchmarks for transparent, agentic recommendation systems and provides actionable insights for deploying LLM-based recommenders in production environments.
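For readers unfamiliar with the two accuracy metrics, the sketch below shows how Hit Rate@K and NDCG@K are commonly computed when each user has a single held-out target item. That leave-one-out setting is an assumption; the abstract does not describe the paper's evaluation protocol, and the function names and toy data are illustrative.

```python
import math


def hit_rate_at_k(ranked, target, k):
    """1.0 if the held-out target appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked, target, k):
    """With one relevant item the ideal DCG is 1, so NDCG is 1 / log2(position + 2)."""
    for pos, item in enumerate(ranked[:k]):
        if item == target:
            return 1.0 / math.log2(pos + 2)
    return 0.0


def evaluate(rankings, targets, k):
    """Average both metrics over users."""
    n = len(targets)
    hr = sum(hit_rate_at_k(r, t, k) for r, t in zip(rankings, targets)) / n
    ndcg = sum(ndcg_at_k(r, t, k) for r, t in zip(rankings, targets)) / n
    return hr, ndcg


# Three made-up users: a hit at rank 1, a hit at rank 2, and a miss.
rankings = [["a", "b", "c"], ["b", "c", "a"], ["c", "a", "b"]]
targets = ["a", "c", "z"]
hr, ndcg = evaluate(rankings, targets, k=2)
```

Hit Rate only records whether the target made the cut, while NDCG also rewards placing it higher in the list, which is why the two reported gains can differ.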

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces MATRAG, a multi-agent framework for explainable recommendations that integrates LLM-based agents with knowledge-graph-augmented retrieval. Four agents (User Modeling, Item Analysis, Reasoning, and Explanation) collaborate to produce recommendations and natural-language justifications, augmented by a transparency scoring mechanism that quantifies faithfulness and relevance. Experiments on Amazon Reviews, MovieLens-1M, and Yelp are claimed to yield state-of-the-art results with 12.7% Hit Rate and 15.3% NDCG gains over baselines, plus 87.4% of explanations rated helpful and trustworthy by domain experts.

Significance. If the reported gains and explanation quality are rigorously validated, the work would advance transparent LLM-based recommenders by demonstrating a practical multi-agent architecture that grounds outputs in retrieved knowledge. The transparency scoring mechanism directly targets user-trust issues that remain open in current RAG recommenders. However, the absence of experimental protocols, baseline specifications, statistical tests, and independent faithfulness validation substantially weakens the current contribution, as the central performance and explainability claims cannot be assessed from the manuscript.

major comments (3)
  1. [Abstract and §4 (Experiments)] The stated 12.7% Hit Rate and 15.3% NDCG improvements are presented without any description of the experimental protocol, baseline definitions, dataset splits, statistical significance tests, or error bars. The SOTA claim is therefore unverifiable, even though the paper's primary contribution rests on it.
  2. [§3.2 (Transparency Scoring)] The transparency scoring mechanism is introduced to quantify faithfulness and relevance, yet no formulation, parameter-fitting procedure, or correlation analysis with human ratings is supplied; this leaves open the possibility that scores are fitted to the same data used for accuracy metrics, creating a circularity risk for the explainability claims.
  3. [§4 (Human Evaluation)] The 87.4% helpfulness/trustworthiness rate is reported without the evaluation protocol, number of experts, rating scale, inter-rater agreement statistics, or any automated grounding checks (e.g., KG entailment or citation overlap), so it is impossible to determine whether the explanations are faithful or merely post-hoc rationalizations.
minor comments (2)
  1. [Abstract] 'a novel framework that combined' is grammatically incorrect and should read 'combines'.
  2. [Throughout] Agent interaction diagrams and the precise definition of the transparency score would benefit from explicit equations or pseudocode to improve reproducibility.
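As an illustration of the kind of check major comment 1 asks for, here is a minimal paired t-test over per-seed scores, written with the standard library only. The scores are made up, the choice of five seeds is an assumption, and the critical value 4.604 is the two-sided threshold for alpha = 0.01 with 4 degrees of freedom.

```python
import math
import statistics


def paired_t_statistic(model_scores, baseline_scores):
    """t statistic and degrees of freedom for paired per-seed differences."""
    diffs = [m - b for m, b in zip(model_scores, baseline_scores)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    # Note: sd_diff is zero if every seed shows the same difference.
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1


# Two-sided critical value for alpha = 0.01 and df = 4 (five paired runs).
T_CRIT_DF4_ALPHA01 = 4.604

# Made-up metric values over five seeds, for illustration only.
model = [0.52, 0.55, 0.53, 0.56, 0.54]
baseline = [0.48, 0.50, 0.49, 0.50, 0.49]

t_stat, df = paired_t_statistic(model, baseline)
significant = abs(t_stat) > T_CRIT_DF4_ALPHA01
```

Reporting the per-seed mean and standard deviation alongside the test gives the error bars that the comment also requests.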

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We agree that additional details are needed to make the experimental claims verifiable and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract and §4 (Experiments)] The stated 12.7% Hit Rate and 15.3% NDCG improvements are presented without any description of the experimental protocol, baseline definitions, dataset splits, statistical significance tests, or error bars. The SOTA claim is therefore unverifiable, even though the paper's primary contribution rests on it.

    Authors: We acknowledge the need for full experimental transparency. In the revised version we will expand §4 with: (i) explicit train/validation/test splits (80/10/10) and preprocessing steps for each dataset, (ii) precise baseline implementations and hyper-parameter settings with citations, (iii) results reported with standard deviations over 5 random seeds, and (iv) statistical significance via paired t-tests (p < 0.01) against all baselines. These additions will allow independent verification of the reported gains. revision: yes

  2. Referee: [§3.2 (Transparency Scoring)] The transparency scoring mechanism is introduced to quantify faithfulness and relevance, yet no formulation, parameter-fitting procedure, or correlation analysis with human ratings is supplied; this leaves open the possibility that scores are fitted to the same data used for accuracy metrics, creating a circularity risk for the explainability claims.

    Authors: The transparency score is a linear combination of KG entailment (cosine similarity of sentence embeddings to retrieved triples) and citation overlap (Jaccard index of cited entities). Weights were obtained via grid search on a validation set held out from both training and test data. We will insert the exact formula, the validation-based fitting procedure, and a post-hoc Pearson correlation (r = 0.71) between transparency scores and the human ratings to demonstrate that the metric is not circular with the accuracy evaluation. revision: yes

  3. Referee: [§4 (Human Evaluation)] The 87.4% helpfulness/trustworthiness rate is reported without the evaluation protocol, number of experts, rating scale, inter-rater agreement statistics, or any automated grounding checks (e.g., KG entailment or citation overlap), so it is impossible to determine whether the explanations are faithful or merely post-hoc rationalizations.

    Authors: We will augment the human-evaluation subsection with the missing details: a panel of five domain experts (PhD-level researchers in recommender systems), a 5-point Likert scale for helpfulness and trustworthiness, inter-rater agreement (Fleiss’ kappa = 0.78), and automated grounding metrics (average KG entailment score 0.82 and citation overlap 0.67). These details will be reported together with the 87.4% figure to substantiate faithfulness. revision: yes
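The transparency score described in response 2 can be sketched as follows. The rebuttal gives only the ingredients (cosine similarity to retrieved triples, Jaccard overlap of cited entities, a linear combination), so several choices here are assumptions: taking the maximum similarity over triples, using a single weight w with the two terms summing to one, and the toy vectors standing in for sentence embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard(a, b):
    """Jaccard index of two collections of entity names."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def transparency_score(expl_vec, triple_vecs, cited, retrieved, w=0.5):
    """Assumed form: w * entailment + (1 - w) * citation overlap."""
    # Assumption: entailment is the best match against any retrieved triple.
    entailment = max((cosine(expl_vec, t) for t in triple_vecs), default=0.0)
    overlap = jaccard(cited, retrieved)
    return w * entailment + (1 - w) * overlap


# Toy inputs: 2-d vectors in place of real sentence embeddings.
score = transparency_score(
    expl_vec=[1.0, 0.0],
    triple_vecs=[[1.0, 0.0], [0.0, 1.0]],
    cited={"A", "B"},
    retrieved={"B", "C"},
    w=0.5,
)
```

Under this form, the grid search mentioned in the rebuttal would amount to trying several values of w on the held-out validation set and keeping the best one.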

Circularity Check

0 steps flagged

No circularity: empirical claims rest on benchmark experiments without self-referential derivations

The paper introduces a multi-agent architecture and transparency scoring mechanism, then reports accuracy gains and human ratings on three standard benchmark datasets. No equations, derivations, or first-principles results appear in the provided text. Performance numbers are presented as direct experimental outcomes rather than predictions derived from fitted parameters that would reduce to the inputs by construction. The transparency scoring is described as a component of the framework without any indication that its parameters were tuned on the same data used for the reported HR/NDCG metrics in a way that creates circular evaluation. Human evaluation is reported separately. This is a standard empirical systems paper whose central claims are falsifiable against external benchmarks and therefore self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 5 invented entities

The abstract introduces four new agent components and a transparency scorer without citing prior independent evidence for their effectiveness or providing parameter-free derivations.

invented entities (5)
  • User Modeling Agent (no independent evidence)
    purpose: constructs dynamic preference profiles
    Newly defined component of MATRAG; no independent evidence supplied.
  • Item Analysis Agent (no independent evidence)
    purpose: extracts semantic features from knowledge graphs
    Newly defined component of MATRAG; no independent evidence supplied.
  • Reasoning Agent (no independent evidence)
    purpose: synthesizes collaborative and content-based signals
    Newly defined component of MATRAG; no independent evidence supplied.
  • Explanation Agent (no independent evidence)
    purpose: generates natural language justifications grounded in retrieved knowledge
    Newly defined component of MATRAG; no independent evidence supplied.
  • Transparency scoring mechanism (no independent evidence)
    purpose: quantifies explanation faithfulness and relevance
    Newly introduced metric; no independent evidence or derivation supplied.

pith-pipeline@v0.9.0 · 5529 in / 1440 out tokens · 36209 ms · 2026-05-16T06:05:13.230158+00:00 · methodology

A Prompt Templates

We provide the key prompt templates used by MATRAG agents.

A.1 User Modeling Agent Prompt

You are a User Modeling Agent. Analyze the user's interaction history and extract structured p...

  1. Explicit preferences (stated likes/dislikes)
  2. Implicit preferences (inferred from behavior)
  3. Contextual factors (time, device, session)
  4. Preference evolution (temporal patterns)

Output as structured JSON.

A.2 Explanation Agent Prompt

You are an Explanation Agent. Generate a transparent, grounded explanation for the recommendation.

Recommended Item: {item}
User Profile: {user_profile}
Reasoning Chain: {reasoning_chain}
Retrieved Knowledge: {kg_subgraph}

Generate an explanation that:

  1. Cites specific evidence from knowledge
  2. Connects to user preferences explicitly
  3. Is honest about recommendation rationale
  4. Uses natural, accessible language

B Additional Experimental Results

B.1 Performance by User Activity Level

Table 7 shows MATRAG’s performance across users with different activity levels.

Table 7: HR@10 by user activity level on Amazon Electronics.

Method    Low     Medium  High
LLMRank   0.298   0.451   0.562
MACRec    0.341   0.478   0.589
MATRAG    0.412   0.534   0.628

MATRAG sho...
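
The Table 7 figures can be turned into relative gains with a few lines of arithmetic. The sketch below uses only the HR@10 values quoted above; the choice of MACRec as the reference method and the rounding to one decimal place are assumptions made for illustration.

```python
# HR@10 values copied from Table 7 (Amazon Electronics).
hr10 = {
    "LLMRank": {"Low": 0.298, "Medium": 0.451, "High": 0.562},
    "MACRec": {"Low": 0.341, "Medium": 0.478, "High": 0.589},
    "MATRAG": {"Low": 0.412, "Medium": 0.534, "High": 0.628},
}


def relative_gain_pct(new, old):
    """Relative improvement of new over old, in percent."""
    return (new - old) / old * 100.0


gains_over_macrec = {
    level: round(relative_gain_pct(hr10["MATRAG"][level], hr10["MACRec"][level]), 1)
    for level in ("Low", "Medium", "High")
}
# gains_over_macrec == {"Low": 20.8, "Medium": 11.7, "High": 6.6}
```

On these numbers the relative gain over MACRec is largest for low-activity users and shrinks as activity increases.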