Recognition: no theorem link
Exploring Plan Space through Conversation: An Agentic Framework for LLM-Mediated Explanations in Planning
Pith reviewed 2026-05-15 17:50 UTC · model grok-4.3
The pith
A multi-agent LLM architecture generates interactive explanations for AI plans that adapt to user questions and context.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors present a multi-agent Large Language Model architecture that is agnostic to the explanation framework and enables user- and context-dependent interactive explanations. They instantiate the framework for goal-conflict explanations and evaluate it through a user study that compares the LLM-mediated dialogue against a baseline template-based interface.
What carries the argument
A multi-agent LLM architecture that mediates between the user, the planner, and the explanation logic to produce context-aware responses.
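A minimal sketch of how that mediation might be wired, with every agent stubbed; the class names, methods, and trace format below are illustrative assumptions, not the paper's published API:

```python
# A minimal sketch of the mediation loop; all agent names, methods, and the
# trace format are illustrative assumptions, not the paper's published API.

class PlannerInterface:
    """Wraps the symbolic planner; returns a trace fragment for a question."""
    def query(self, question: str) -> str:
        return "goal G3 conflicts with goals G1 and G2 under resource R"

class UserModel:
    """Tracks the dialogue so far, keeping answers user- and context-dependent."""
    def __init__(self):
        self.history: list[str] = []
    def context_for(self, question: str) -> list[str]:
        self.history.append(question)
        return self.history[-5:]  # a recent-dialogue window

class ExplanationAgent:
    """LLM-backed in the paper; stubbed here with a template for illustration."""
    def explain(self, question: str, trace: str, context: list[str]) -> str:
        return f"Because the planner reports: {trace}."

class DialogueManager:
    """Mediates between the user, the planner, and the explanation logic."""
    def __init__(self):
        self.planner = PlannerInterface()
        self.user_model = UserModel()
        self.explainer = ExplanationAgent()
    def handle(self, question: str) -> str:
        trace = self.planner.query(question)             # ground in planner output
        context = self.user_model.context_for(question)  # adapt to the dialogue
        return self.explainer.explain(question, trace, context)

print(DialogueManager().handle("Why can't I satisfy goal G3?"))
```

The load-bearing design choice is the routing: every answer is grounded in planner output and in the dialogue history, which is what makes responses user- and context-dependent.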
If this is right
- The same agent structure can be reused for other explanation types such as plan repair or preference elicitation without rebuilding the core system (sketched in code after this list).
- Users receive explanations that evolve with their questions rather than fixed templates, supporting iterative refinement of plans.
- The framework separates the explanation layer from the planner itself, allowing existing planners to gain conversational interfaces.
- Human preferences and expertise can be incorporated more directly into the planning loop through dialogue.
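To make the first implication concrete, here is a hedged sketch of the swap, assuming the explanation module sits behind a single interface; all names are hypothetical, not the authors' code:

```python
# Hypothetical illustration of the reuse claim: the dialogue machinery stays
# fixed while the explanation module is swapped between frameworks.
from abc import ABC, abstractmethod

class Explainer(ABC):
    @abstractmethod
    def explain(self, question: str, planner_trace: str) -> str: ...

class GoalConflictExplainer(Explainer):
    """The instantiation the paper evaluates: goal-conflict explanations."""
    def explain(self, question: str, planner_trace: str) -> str:
        return f"Goal conflict found in trace: {planner_trace}"

class PlanRepairExplainer(Explainer):
    """A drop-in alternative for a different explanation framework."""
    def explain(self, question: str, planner_trace: str) -> str:
        return f"Repair suggestion derived from trace: {planner_trace}"

def dialogue_turn(explainer: Explainer, question: str, trace: str) -> str:
    # Planner interface, user model, and dialogue manager are unchanged;
    # only the explainer differs between explanation frameworks.
    return explainer.explain(question, trace)

print(dialogue_turn(GoalConflictExplainer(), "Why not G1 and G3?", "G1 excludes G3"))
print(dialogue_turn(PlanRepairExplainer(), "How do I fix it?", "G1 excludes G3"))
```

Only the Explainer subclass changes between frameworks; this modularity is what the agnosticism claim rests on.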
Where Pith is reading between the lines
- The approach could be extended to domains like robotics or scheduling where plan revisions must respond to changing human constraints.
- Adding lightweight verification steps between agents might address reliability concerns while preserving the conversational flow.
- Longer-term use might reveal patterns in how users discover new preferences through interaction that static interfaces miss.
Load-bearing premise
LLM agents can reliably produce accurate, non-hallucinated explanations that correctly reflect the underlying planner's logic and user intent without additional verification steps.
What would settle it
A controlled test set of plans with known goal conflicts, where LLM explanations are checked against ground-truth planner traces and user-intent logs; the load-bearing premise fails if the explanations deviate from the traces in more than a small fraction of cases.
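One way such a test could be scored, assuming each test case carries a ground-truth conflict set recovered from the planner trace; the field names and the crude matching heuristic are hypothetical:

```python
# Hypothetical scoring harness for the test described above: compare the goal
# conflicts an LLM explanation mentions against ground truth from the trace.

def conflicts_mentioned(explanation: str, goal_ids: set[str]) -> set[str]:
    """Crude entity match: which known goal identifiers the explanation names."""
    return {g for g in goal_ids if g in explanation}

def deviation_rate(cases: list[dict]) -> float:
    """Fraction of cases whose explanation misses or invents a conflict."""
    bad = sum(
        1 for case in cases
        if conflicts_mentioned(case["explanation"], case["all_goals"])
        != case["true_conflicts"]
    )
    return bad / len(cases)

cases = [
    {"all_goals": {"G1", "G2", "G3"},
     "true_conflicts": {"G1", "G3"},
     "explanation": "G1 and G3 cannot both hold given the truck capacity."},
]
print(f"deviation rate: {deviation_rate(cases):.2f}")  # 0.00 on this toy case
```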
Original abstract
When automating plan generation for a real-world sequential decision problem, the goal is often not to replace the human planner, but to facilitate an iterative reasoning and elicitation process, where the human's role is to guide the AI planner according to their preferences and expertise. In this context, explanations that respond to users' questions are crucial to improve their understanding of potential solutions and increase their trust in the system. To enable natural interaction with such a system, we present a multi-agent Large Language Model (LLM) architecture that is agnostic to the explanation framework and enables user- and context-dependent interactive explanations. We also describe an instantiation of this framework for goal-conflict explanations, which we use to conduct a user study comparing the LLM-powered interaction with a baseline template-based explanation interface.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a multi-agent LLM architecture for generating interactive, user- and context-dependent explanations in automated planning. It claims the architecture is agnostic to any specific explanation framework, instantiates it for goal-conflict explanations, and evaluates it via a user study against a template-based baseline interface.
Significance. If the empirical support holds, the work could advance human-AI collaboration in planning by enabling natural-language, adaptive explanations that incorporate user preferences, potentially increasing trust and iterative refinement in real-world sequential decision systems. The multi-agent design offers flexibility that rigid templates lack.
major comments (3)
- [User Study] User Study section: The manuscript references a user study comparing LLM-powered interaction to a template-based baseline but provides no quantitative results, methodology details (e.g., participant count, tasks, measures), statistical analysis, or error rates. This leaves the central claim of effective interactive explanations without load-bearing evidence.
- [Framework Architecture] Framework Architecture section: The assertion that the multi-agent architecture is agnostic to the explanation framework is stated but demonstrated only through a single instantiation for goal-conflict explanations. No additional cases or general argument is given to establish agnosticism.
- [Explanation Generation] Explanation Generation: No verification step, planner-trace alignment check, or hallucination filter is described to ensure LLM outputs faithfully encode the underlying planner's logic rather than plausible approximations. This directly risks undermining the reliability of user- and context-dependent explanations.
minor comments (2)
- [Abstract] Abstract: Key quantitative or qualitative findings from the user study should be summarized to give readers a concrete sense of outcomes.
- [Framework Architecture] Notation: Clarify how agent roles and message passing are defined to improve reproducibility of the multi-agent setup.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to strengthen the presentation of the user study, clarify the framework's generality, and improve safeguards for explanation fidelity.
Point-by-point responses
- Referee: [User Study] User Study section: The manuscript references a user study comparing LLM-powered interaction to a template-based baseline but provides no quantitative results, methodology details (e.g., participant count, tasks, measures), statistical analysis, or error rates. This leaves the central claim of effective interactive explanations without load-bearing evidence.
  Authors: We agree that the User Study section requires substantially more detail to support the central claims. We will revise this section to include the full methodology (participant recruitment, count, demographics, tasks, experimental design, and measures), quantitative results with statistical analysis, and any observed error rates or qualitative findings. This will provide the necessary load-bearing evidence. Revision: yes.
- Referee: [Framework Architecture] Framework Architecture section: The assertion that the multi-agent architecture is agnostic to the explanation framework is stated but demonstrated only through a single instantiation for goal-conflict explanations. No additional cases or general argument is given to establish agnosticism.
  Authors: The architecture separates concerns across agents (planner interface, explanation generator, user modeler, and dialogue manager) so that only the explanation generator needs to be swapped for a different framework. While the paper focuses on goal-conflict explanations, we will add an explicit general argument in the Framework Architecture section describing how the modular interfaces support other explanation types (e.g., plan repair or preference elicitation) without altering the overall structure. Revision: partial.
- Referee: [Explanation Generation] Explanation Generation: No verification step, planner-trace alignment check, or hallucination filter is described to ensure LLM outputs faithfully encode the underlying planner's logic rather than plausible approximations. This directly risks undermining the reliability of user- and context-dependent explanations.
  Authors: We acknowledge this limitation. The current implementation grounds the explanation agent via prompts that include the raw planner trace, but no explicit verification or hallucination filter is described. We will revise the Explanation Generation section to document the prompt-based grounding techniques used and will add a lightweight alignment check (e.g., keyword or entity matching against the trace) to mitigate the risk of plausible but unfaithful outputs. Revision: yes.
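A minimal sketch of the kind of entity-matching check this response proposes, assuming goal and action identifiers (e.g., G1, A3) can be recognized lexically in both the trace and the explanation; purely illustrative, not the paper's implementation:

```python
# Sketch of the proposed lightweight alignment check: flag explanations that
# mention goals or actions absent from the raw planner trace. Illustrative only.
import re

IDENT = re.compile(r"\b[GA]\d+\b")  # assumed identifier scheme, e.g. G1, A3

def aligned(explanation: str, planner_trace: str) -> bool:
    """Reject explanations that reference entities the trace never produced."""
    return set(IDENT.findall(explanation)) <= set(IDENT.findall(planner_trace))

trace = "G1 blocked by A3; G2 satisfied"
print(aligned("G1 fails because action A3 consumes the resource.", trace))  # True
print(aligned("G4 is the culprit.", trace))  # False: G4 never appears in trace
```

Such a filter catches fabricated entities but not mischaracterized relations between real ones, so it complements rather than replaces the ground-truth evaluation the referee requests.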
Circularity Check
No circularity: independent architectural proposal
Full rationale
The paper presents a descriptive multi-agent LLM framework for interactive planning explanations, instantiated via a user study on goal-conflict cases. No equations, fitted parameters, predictions, or derivations appear in the provided text. The agnosticism claim and interactivity features are introduced as design choices rather than results derived from prior steps or self-citations. The architecture does not reduce to its own inputs by construction, and the user study serves as external evaluation rather than a self-referential loop. This is a standard non-circular framework paper.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: LLMs can generate accurate and context-appropriate explanations for planning problems without systematic errors or hallucinations.
Reference graph
Works this paper leans on
- [1] Achinstein, P.: The Nature of Explanation. Oxford University Press (1980)
- [2] Alshomary, M., Lange, F., Booshehri, M., Sengupta, M., Cimiano, P., Wachsmuth, H.: Modeling the quality of dialogical explanations. In: Calzolari, N., Kan, M.Y., Hoste, V., Lenci, A., Sakti, S., Xue, N. (eds.) Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). pp. 1152...
- [3] Bromberger, S.: An approach to explanation. In: Butler, R. (ed.) Analytical Philosophy. Oxford University Press (1962)
- [4]
- [5] Chakraborti, T., Sreedharan, S., Zhang, Y., Kambhampati, S.: Plan explanations as model reconciliation: Moving beyond explanation as soliloquy. In: IJCAI (2017)
- [6] Chen, H., Constante-Flores, G.E., Mantri, K.S.I., Kompalli, S.M., Ahluwalia, A.S., Li, C.: OptiChat: Bridging optimization models and practitioners with large language models (2025), https://arxiv.org/abs/2501.08406
- [7] Chen, W., Su, Y., Zuo, J., Yang, C., Yuan, C., Chan, C.M., Yu, H., Lu, Y., Hung, Y.H., Qian, C., Qin, Y., Cong, X., Xie, R., Liu, Z., Sun, M., Zhou, J.: AgentVerse: Facilitating multi-agent collaboration and exploring emergent behaviors. In: ICLR 2024, pp. 20094–20136 (May 2024), https://proceedings.iclr.cc/paper_f...
- [8] Corrêa, A.B., Pereira, A.G., Seipp, J.: The 2025 Planning Performance of Frontier Large Language Models (Nov 2025). https://doi.org/10.48550/arXiv.2511.09378
- [9]
- [10] Dazeley, R., Vamplew, P., Foale, C., Young, C., Aryal, S., Cruz, F.: Levels of explainable artificial intelligence for human-aligned conversational explanations. AIJ (2021)
- [11] De Giacomo, G., De Masellis, R., Montali, M.: Reasoning on LTL on finite traces: Insensitivity to infiniteness. In: AAAI (2014)
- [12] Domshlak, C., Mirkis, V.: Deterministic oversubscription planning as heuristic search: Abstractions and reformulations. JAIR (2015)
- [13] Eifler, R., Brandao, M., Coles, A., Frank, J., Hoffmann, J.: Evaluating plan-property dependencies: A web-based platform and user study. In: ICAPS (2022)
- [14] Eifler, R., Cashmore, M., Hoffmann, J., Magazzeni, D., Steinmetz, M.: A new approach to plan-space explanation: Analyzing plan-property dependencies in oversubscription planning. In: AAAI (2020)
- [15] Eifler, R., Steinmetz, M., Torralba, A., Hoffmann, J.: Plan-space explanation via plan-property dependencies: Faster algorithms & more powerful properties. In: IJCAI (2020)
- [16]
- [17] Fichtel, L., Spliethöver, M., Hüllermeier, E., Jimenez, P., Klowait, N., Kopp, S., Ngonga Ngomo, A.C., Robrecht, A., Scharlau, I., Terfloth, L., Vollmer, A.L., Wachsmuth, H.: Investigating co-constructive behavior of large language models in explanation dialogues. In: Béchet, F., Lefèvre, F., Asher, N., Kim, S., Merlin, T. (eds.) Proceedings of the 26th A... (2025)
- [18] Fuggitti, F., Chakraborti, T.: NL2LTL – a Python package for converting natural language (NL) instructions to linear temporal logic (LTL) formulas. In: ICAPS (2023)
- [19] Gamba, E., Bogaerts, B., Guns, T.: Efficiently explaining CSPs with unsatisfiable subset optimization. JAIR (2023)
- [20] Guo, T., Chen, X., Wang, Y., Chang, R., Pei, S., Chawla, N.V., Wiest, O., Zhang, X.: Large language model based multi-agents: A survey of progress and challenges. arXiv:2402.01680 (2024)
- [21] Helmert, M.: The Fast Downward planning system. JAIR 26, 191–246 (2006)
- [22]
- [23] Kambhampati, S., Valmeekam, K., Guan, L., Verma, M., Stechly, K., Bhambri, S., Saldyt, L., Murthy, A.: Position: LLMs can't plan, but can help planning in LLM-modulo frameworks. In: ICML (2024)
- [24] Krarup, B., Coles, A.J., Long, D., Smith, D.E.: Explaining plan quality differences. In: ICAPS (2024)
- [25] Krarup, B., Krivic, S., Magazzeni, D., Long, D., Cashmore, M., Smith, D.E.: Contrastive explanations of plans through model restrictions. JAIR (2021)
- [26] Lakkaraju, H., Slack, D., Chen, Y., Tan, C., Singh, S.: Rethinking explainability as a dialogue: A practitioner's perspective. In: NeurIPS Workshop on Human Centered AI (2022)
- [27]
- [28] Likert, R.: A technique for the measurement of attitudes. Archives of Psychology (1932)
- [29] Liu, J.X., Yang, Z., Schornstein, B., Liang, S., Idrees, I., Tellex, S., Shah, A.: Lang2LTL: Translating natural language commands to temporal specification with large language models. In: Workshop on Language and Robotics at CoRL (2022)
- [30]
- [31] Nguyen, Q.N., Sidorova, A., Torres, R.: User interactions with chatbot interfaces vs. menu-based interfaces: An empirical study. Computers in Human Behavior 128, 107093 (Mar 2022). https://doi.org/10.1016/j.chb.2021.107093
- [32] Nguyen, V.B., Schlötterer, J., Seifert, C.: From black boxes to conversations: Incorporating XAI in a conversational agent. In: XAI (2023)
- [33] Peng, B., Narayanan, S., Papadimitriou, C.: On limitations of the transformer architecture. COLM (2024)
- [34] Povéda, G., Strahl, A., Hall, M., Boumazouza, R., Quintana-Amate, S., Alvarez, N., Bleukx, I., Tsouros, D., Verhaeghe, H., Guns, T.: Trustworthy and explainable decision-making for workforce allocation. In: Workshop on Progress Towards the Holy Grail at CP (2024)
- [35] Riedl, C.: Emergent Coordination in Multi-Agent Language Models (Oct 2025). https://doi.org/10.48550/arXiv.2510.05174
- [36] Searle, J.R.: Expression and meaning: Studies in the theory of speech acts. Cambridge University Press (1979)
- [37] Shen, H., Huang, C., Wu, T., Huang, T.K.: ConvXAI: Delivering heterogeneous AI explanations via conversations to support human-AI scientific writing. CoRR (2023)
- [38] Slack, D., Krishna, S., Lakkaraju, H., Singh, S.: Explaining machine learning models with interactive natural language conversations using TalkToModel. Nature Machine Intelligence (2023)
- [39]
- [40] Smith, D.E.: Choosing objectives in over-subscription planning. In: ICAPS (2004)
- [41] Sreedharan, S., Chakraborti, T., Kambhampati, S.: Foundations of explanations as model reconciliation. AIJ (2021)
- [42] Sreedharan, S., Srivastava, S., Smith, D.E., Kambhampati, S.: Why can't you do that HAL? Explaining unsolvability of planning tasks. In: IJCAI (2019)
- [43] State, L., Ruggieri, S., Turini, F.: Reason to explain: Interactive contrastive explanations (ReasonX). In: XAI (2023)
- [44] Valmeekam, K., Marquez, M., Sreedharan, S., Kambhampati, S.: On the planning abilities of large language models – a critical investigation. NeurIPS (2023)
- [45]
- [46] Zhang, T., Yang, X.J., Li, B.: May I ask a follow-up question? Understanding the benefits of conversations in neural network explainability. International Journal of Human–Computer Interaction (2024)