
arxiv: 2604.14223 · v1 · submitted 2026-04-14 · 💻 cs.IR · cs.AI

TRACE: A Conversational Framework for Sustainable Tourism Recommendation with Agentic Counterfactual Explanations

Pith reviewed 2026-05-10 14:36 UTC · model grok-4.3

classification 💻 cs.IR cs.AI
keywords sustainable tourism · conversational recommender systems · multi-agent LLM frameworks · counterfactual explanations · preference elicitation · environmental impact · user studies

The pith

TRACE uses AI agents and counterfactual explanations to nudge users toward sustainable tourism recommendations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Traditional travel recommenders often push popular destinations that harm the environment. This paper presents TRACE, a system that adds sustainability by having multiple AI agents work together in a conversation. One agent draws out the user's hidden preferences for green travel, another builds a detailed user profile, and others suggest options that consider both what the user wants and the environmental cost. The system asks smart follow-up questions and shows alternative scenarios to help users see greener paths. User tests show this leads to more sustainable decisions without reducing how well the suggestions match user needs or how quickly the system replies.

Core claim

TRACE is a multi-agent LLM-based framework for tourism recommendations that balances user relevance with environmental impact through an orchestrator-worker architecture, where agents elicit latent sustainability preferences, construct user personas, and generate agentic counterfactual explanations to promote reflection on lower-impact alternatives.

What carries the argument

The modular orchestrator-worker architecture consisting of specialized agents for sustainability preference elicitation, persona construction, recommendation balancing, and counterfactual explanation generation.
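The orchestrator-worker pattern described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation and not the Agent Development Kit API: every name here (`Session`, `Option`, `orchestrate`, the worker functions), the keyword-based elicitation stub, the linear relevance/emissions trade-off, and the catalog numbers are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Option:
    name: str
    relevance: float  # 0..1, higher means a better match (illustrative value)
    co2_kg: float     # estimated emissions (illustrative value, not real data)


@dataclass
class Session:
    utterance: str
    persona: dict = field(default_factory=dict)
    ranked: list = field(default_factory=list)
    explanation: str = ""


def elicit_preferences(s: Session) -> None:
    # Stand-in for an LLM elicitation agent: a simple keyword check.
    text = s.utterance.lower()
    s.persona["eco_weight"] = 0.6 if ("train" in text or "green" in text) else 0.3


def build_persona(s: Session) -> None:
    # Stand-in for persona construction: fill defaults and summarise.
    s.persona.setdefault("eco_weight", 0.3)
    s.persona["summary"] = f"eco_weight={s.persona['eco_weight']}"


def make_recommender(catalog: list) -> Callable[[Session], None]:
    max_co2 = max(o.co2_kg for o in catalog)

    def recommend(s: Session) -> None:
        w = s.persona["eco_weight"]

        # Assumed trade-off: weighted sum of relevance and normalised savings.
        def score(o: Option) -> float:
            return (1 - w) * o.relevance + w * (1 - o.co2_kg / max_co2)

        s.ranked = sorted(catalog, key=score, reverse=True)

    return recommend


def explain_counterfactual(s: Session) -> None:
    # Contrast the top recommendation with the lowest-emission alternative.
    top = s.ranked[0]
    greener = min(s.ranked, key=lambda o: o.co2_kg)
    if greener is top:
        s.explanation = f"{top.name} is already the lowest-emission option."
    else:
        saved = top.co2_kg - greener.co2_kg
        s.explanation = (
            f"Had you picked {greener.name} instead of {top.name}, "
            f"the estimate would drop by {saved:.0f} kg CO2."
        )


def orchestrate(utterance: str, workers: list) -> Session:
    # The orchestrator passes one shared session through each worker in turn.
    s = Session(utterance)
    for worker in workers:
        worker(s)
    return s


CATALOG = [
    Option("Destination A (flight)", 0.9, 400.0),
    Option("Destination B (rail)", 0.5, 60.0),
]
WORKERS = [
    elicit_preferences,
    build_persona,
    make_recommender(CATALOG),
    explain_counterfactual,
]
```

Calling `orchestrate("a beach city break", WORKERS)` returns a session whose `ranked` list and `explanation` can be inspected. Keeping each worker as a function over shared state makes it straightforward to swap a stub for a real LLM-backed agent, which is the kind of modularity the architecture claims.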

If this is right

  • Users receive interactive nudges that surface greener travel options without direct pressure.
  • Recommendation relevance holds up in the user studies, so the sustainability nudges do not come at the cost of quality.
  • Interactive responsiveness remains intact during the conversation.
  • Semantic analyses confirm that the explanations align with sustainable decision-making goals.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar agentic approaches could be adapted to encourage sustainable choices in other recommendation areas like dining or product purchases.
  • Long-term studies might show whether these nudges lead to lasting changes in travel behavior.
  • The framework's design supports adding more specialized agents for additional factors such as cultural impact or local economy benefits.

Load-bearing premise

LLM-based agents can reliably draw out accurate sustainability preferences and generate unbiased, non-hallucinated counterfactual explanations.

What would settle it

An experiment in which users do not choose more sustainable options when given the counterfactual explanations compared to a version without them, or where the generated explanations are rated as inaccurate by participants.
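One way such a comparison could be analysed is a two-proportion z-test on the share of users who pick a sustainable option in each condition. The sketch below is an assumption about how the test might be run, not something the paper reports; the function name is hypothetical and any counts passed to it would have to come from a real study.

```python
import math


def two_proportion_z_test(success_a: int, n_a: int, success_b: int, n_b: int):
    """Return (z, two-sided p-value) for H0: both groups share one rate.

    Group A could be users shown counterfactual explanations and group B
    users who were not; a "success" is choosing a sustainable option.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled rate is 0 or 1; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A z close to zero (or a large p-value) would mean the explanations did not measurably shift choices, which is the outcome that would count against the claim. The normal approximation is only reasonable for moderately large groups; with small samples an exact test would be the safer choice.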

Figures

Figures reproduced from arXiv: 2604.14223 by Adithi Satish, Ashmi Banerjee, Wolfgang Wörndl, Yashar Deldjoo.

Figure 1: System Architecture of the CRS Chatbot. This dia [...]
Figure 2: An example TRACE session. The session starts with [...]
Figure 3: Combined feedback distribution across rating cate [...]
Original abstract

Traditional conversational travel recommender systems primarily optimize for user relevance and convenience, often reinforcing popular, overcrowded destinations and carbon-intensive travel choices. To address this, we present TRACE (Tourism Recommendation with Agentic Counterfactual Explanations), a multi-agent, LLM-based framework that promotes sustainable tourism through interactive nudging. TRACE uses a modular orchestrator-worker architecture where specialized agents elicit latent sustainability preferences, construct structured user personas, and generate recommendations that balance relevance with environmental impact. A key innovation lies in its use of agentic counterfactual explanations and LLM-driven clarifying questions, which together surface greener alternatives and refine understanding of intent, fostering user reflection without coercion. User studies and semantic alignment analyses demonstrate that TRACE effectively supports sustainable decision-making while preserving recommendation quality and interactive responsiveness. TRACE is implemented on Google's Agent Development Kit, with full code, Docker setup, prompts, and a publicly available demo video to ensure reproducibility. A project summary, including all resources, prompts, and demo access, is available at https://ashmibanerjee.github.io/trace-chatbot.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents TRACE, a multi-agent LLM-based conversational framework for tourism recommendations that promotes sustainable choices via an orchestrator-worker architecture. Specialized agents elicit latent sustainability preferences, construct user personas, and generate recommendations balanced with environmental impact, using agentic counterfactual explanations and clarifying questions to encourage reflection without coercion. The central claim is that user studies and semantic alignment analyses demonstrate effective support for sustainable decision-making while preserving recommendation quality and interactive responsiveness. The system is implemented on Google's Agent Development Kit with full code, Docker setup, prompts, and a public demo for reproducibility.

Significance. If the user-study claims hold, this represents a meaningful engineering contribution to sustainable recommender systems in information retrieval by addressing carbon-intensive travel patterns through interactive nudging. The explicit provision of code, prompts, Docker configuration, and demo video is a clear strength that supports reproducibility and extension by the community.

major comments (2)
  1. [Abstract and Evaluation] Abstract and the user-studies section: the central claim that 'user studies and semantic alignment analyses demonstrate that TRACE effectively supports sustainable decision-making while preserving recommendation quality' is unsupported because no details are given on study design, sample size, metrics (e.g., reflection or alignment scores), statistical tests, or quantitative results. This information is required to evaluate whether the observed effects are genuine or artifacts of the LLM agents.
  2. [Framework (§3)] Framework description: the multi-agent architecture for preference elicitation, persona construction, and counterfactual generation contains no validation steps (e.g., expert annotation, consistency checks across prompts, or bias audits) against hallucinations or systematic biases. This assumption is load-bearing for the claim that the system surfaces genuine sustainability preferences rather than model defaults.
minor comments (1)
  1. [Discussion] The paper would benefit from a dedicated limitations subsection that explicitly discusses risks of LLM-induced bias in the agentic components.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below and describe the revisions we will make to improve the clarity and rigor of the manuscript.

Point-by-point responses
  1. Referee: [Abstract and Evaluation] Abstract and the user-studies section: the central claim that 'user studies and semantic alignment analyses demonstrate that TRACE effectively supports sustainable decision-making while preserving recommendation quality' is unsupported because no details are given on study design, sample size, metrics (e.g., reflection or alignment scores), statistical tests, or quantitative results. This information is required to evaluate whether the observed effects are genuine or artifacts of the LLM agents.

    Authors: We agree that the current version of the manuscript does not provide sufficient methodological detail on the user studies and semantic alignment analyses to fully substantiate the claims in the abstract. In the revised manuscript we will expand the evaluation section with a complete description of the study design, participant sample size and recruitment, the specific metrics (including reflection scores, semantic alignment scores, recommendation quality, and responsiveness), the statistical tests applied, and the quantitative results with supporting tables or figures. We will also update the abstract to reference these additions where appropriate. revision: yes

  2. Referee: [Framework (§3)] Framework description: the multi-agent architecture for preference elicitation, persona construction, and counterfactual generation contains no validation steps (e.g., expert annotation, consistency checks across prompts, or bias audits) against hallucinations or systematic biases. This assumption is load-bearing for the claim that the system surfaces genuine sustainability preferences rather than model defaults.

    Authors: We acknowledge that the framework description would be strengthened by explicit validation procedures for the agentic components. While the public code, prompts, and demo already enable community inspection, we will revise Section 3 to add a dedicated validation subsection. This will include prompt consistency checks across multiple runs, expert annotation of sample outputs for hallucination and bias detection, and any systematic audits performed during development. These additions will directly address concerns about whether the system elicits genuine preferences. revision: yes
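One of the checks named in the response above, prompt consistency across multiple runs, could be approximated as follows. This is a rough sketch under assumptions: token-set Jaccard overlap is a stand-in for whatever similarity measure the authors might actually use, and both function names are hypothetical.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two outputs, from 0.0 to 1.0."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty outputs are trivially identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def consistency_score(outputs: list) -> float:
    """Mean pairwise Jaccard overlap across repeated runs of one prompt."""
    pairs = list(combinations(outputs, 2))
    if not pairs:
        raise ValueError("need at least two outputs to compare")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A low score for a given prompt would flag agent outputs that vary widely from run to run and deserve manual review. Lexical overlap misses paraphrases, so an embedding-based similarity would likely be a better fit for free-form explanations.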

Circularity Check

0 steps flagged

No circularity; engineering framework with independent user-study validation

Full rationale

The paper presents TRACE as a modular multi-agent LLM architecture for eliciting sustainability preferences and generating counterfactual explanations in tourism recommendations. Its central claims rest on the described system design plus reported user studies and semantic alignment analyses, none of which involve mathematical derivations, fitted parameters renamed as predictions, or load-bearing self-citations that reduce the result to its own inputs. No equations, uniqueness theorems, or ansatzes are invoked; the contribution is self-contained as an implemented framework with reproducibility artifacts.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest primarily on domain assumptions about LLM agent reliability in preference elicitation and explanation generation, with no free parameters or invented entities specified in the abstract.

axioms (1)
  • Domain assumption: LLM agents can accurately elicit and model latent user sustainability preferences through interactive dialogue without significant bias or error.
    Invoked implicitly in the description of specialized agents for preference elicitation and persona construction.


