GLARE: A Natural Language Interface for Querying Global Explanations
Pith reviewed 2026-06-26 17:53 UTC · model grok-4.3
The pith
An LLM mediator translates natural language questions into SQL queries over local explanation data to enable interactive access to global explanations of image classifiers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper claims that the system's core LLM acts as a mediator, translating natural language questions into structured SQL queries over local explanation data. This enables flexible aggregation without exposing users to low-level representations. For each query, the interface returns a natural language response augmented with statistics, the local explanations that support it, and an intent-aligned visualization. The authors report that evaluation on intent interpretation, query mapping accuracy, generalization to novel queries and datasets, and robustness to linguistic errors shows that LLM-mediated querying substantially improves the accessibility and usability of global explanations for human-centered XAI.
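The summary names robustness to linguistic errors as an evaluation dimension but does not say how the noisy questions were produced. A minimal sketch of one common approach, assuming typo-style noise from adjacent-letter swaps; the function `add_typos` and its `rate` and `seed` parameters are hypothetical and not taken from the paper.

```python
import random


def add_typos(question: str, rate: float = 0.1, seed: int = 0) -> str:
    """Return `question` with some words corrupted by one adjacent-letter swap.

    Each word longer than three characters is corrupted with probability `rate`.
    A fixed `seed` makes the noisy variant reproducible across evaluation runs.
    """
    rng = random.Random(seed)
    words = question.split()
    for i, word in enumerate(words):
        if len(word) > 3 and rng.random() < rate:
            j = rng.randrange(len(word) - 1)  # swap position j with j + 1
            chars = list(word)
            chars[j], chars[j + 1] = chars[j + 1], chars[j]
            words[i] = "".join(chars)
    return " ".join(words)


clean = "Which concepts most often drive kitchen predictions?"
for seed in range(3):
    # Each noisy variant would be sent through the same mediator as `clean`,
    # and the two answers compared.
    print(add_typos(clean, rate=0.5, seed=seed))
```

Counting how often the noisy variant yields the same SQL result as the clean question gives a simple robustness rate. Grammatical errors and paraphrases, which real evaluations may also use, are not covered by this sketch.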
What carries the argument
The LLM mediator that translates natural language questions into structured SQL queries over local explanation data.
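A minimal sketch of the mediator pattern, assuming a single SQLite table with one row per (image, concept) local explanation. The table name `local_explanations`, its columns, the prompt wording, and the `fake_llm` stand-in are all hypothetical; the paper's actual schema and prompts are not described here. A real deployment would replace `fake_llm` with a call to an instruction-tuned model.

```python
import sqlite3

# Hypothetical schema: one row per (image, concept) pair from a local explainer.
SCHEMA = """
CREATE TABLE local_explanations (
    image_id   TEXT,
    true_class TEXT,
    pred_class TEXT,
    concept    TEXT,
    importance REAL
);
"""


def build_prompt(question: str) -> str:
    """Give the LLM the schema and the user's question, asking for one SELECT."""
    return (
        "Translate the question into a single SQLite SELECT statement "
        "over this schema.\n"
        f"{SCHEMA}\nQuestion: {question}\nSQL:"
    )


def fake_llm(prompt: str) -> str:
    """Stand-in for the LLM call: returns a fixed translation for the demo question."""
    return (
        "SELECT concept, COUNT(DISTINCT image_id) AS n_images "
        "FROM local_explanations WHERE pred_class = 'kitchen' "
        "GROUP BY concept ORDER BY n_images DESC, concept"
    )


def answer(conn: sqlite3.Connection, question: str, llm=fake_llm):
    """Translate `question` to SQL with `llm`, then run it on the explanation table."""
    sql = llm(build_prompt(question))
    return sql, conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO local_explanations VALUES (?, ?, ?, ?, ?)",
    [
        ("img1", "kitchen", "kitchen", "stove", 0.9),
        ("img1", "kitchen", "kitchen", "sink", 0.4),
        ("img2", "kitchen", "kitchen", "stove", 0.8),
        ("img3", "bathroom", "kitchen", "sink", 0.7),
        ("img4", "kitchen", "kitchen", "stove", 0.6),
    ],
)
sql, rows = answer(conn, "Which concepts most often drive 'kitchen' predictions?")
print(rows)  # [('stove', 3), ('sink', 2)]
```

In this design the LLM only writes the query, so the numbers in the final answer come from the database rather than from free-form text generation, and the user never has to see the table or the SQL.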
If this is right
- Flexible aggregation of explanation data becomes possible without users handling low-level representations.
- Each query produces statistics-augmented natural language responses along with intent-aligned visualizations.
- The system supports both local explanations and global ones through the same interface.
- Performance holds on novel queries and new datasets, and to some degree under linguistic errors in the input.
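A "statistics-augmented natural language response" can be pictured as a template that wraps a query result in counts and percentages. A minimal sketch, assuming the query returns `(concept, n_images)` rows and the caller knows how many images the question covers; the function `summarize` and its phrasing are hypothetical, and the paper may instead have the LLM write the sentence from the computed statistics.

```python
def summarize(rows, total_images, target_class):
    """Render (concept, n_images) rows as one sentence with counts and percentages."""
    if not rows or total_images == 0:
        return f"No explanation records were found for '{target_class}'."
    parts = [
        f"{concept} ({n}/{total_images} images, {100 * n / total_images:.0f}%)"
        for concept, n in rows
    ]
    return (
        f"For '{target_class}' predictions, the most frequent explanatory concepts are "
        + ", ".join(parts)
        + "."
    )


print(summarize([("stove", 3), ("sink", 2)], total_images=4, target_class="kitchen"))
# For 'kitchen' predictions, the most frequent explanatory concepts are stove (3/4 images, 75%), sink (2/4 images, 50%).
```

Computing the percentages in code rather than asking the model to do arithmetic keeps the stated statistics consistent with the query result.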
Where Pith is reading between the lines
- The same mediation pattern could let users query other structured AI outputs beyond explanations.
- Real-time user corrections on generated queries might reduce the impact of occasional translation mistakes.
- Conversational access might replace many specialized dashboards for exploring model behavior.
Load-bearing premise
The LLM can accurately and reliably translate arbitrary natural language questions into correct SQL queries over the explanation data without introducing errors or hallucinations that affect the output statistics and visualizations.
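The premise can only be partly protected mechanically. A minimal sketch of a pre-execution check, assuming an SQLite backend for illustration; `check_generated_sql` is a hypothetical helper, not something the paper describes. It rejects output that is not a single SELECT or that references tables or columns absent from the schema, a common form of text-to-SQL hallucination. It cannot detect a well-formed query that computes the wrong statistic, which is the failure this premise is most concerned with.

```python
import sqlite3


def check_generated_sql(conn: sqlite3.Connection, sql: str):
    """Return None if `sql` passes basic checks, otherwise a short reason string.

    The checks are deliberately crude: a single read-only SELECT that compiles
    against the live schema. They catch invented tables or columns, but not a
    query that is valid SQL yet answers the wrong question.
    """
    body = sql.strip().rstrip(";").strip()
    if ";" in body:  # crude: also rejects ';' inside string literals
        return "multiple statements"
    if not body.upper().startswith("SELECT"):
        return "not a SELECT query"
    try:
        # EXPLAIN compiles the statement to bytecode without running it, so
        # references to tables or columns that do not exist raise here.
        conn.execute("EXPLAIN " + body).fetchall()
    except sqlite3.Error as exc:
        return f"does not compile: {exc}"
    return None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE local_explanations "
    "(image_id TEXT, pred_class TEXT, concept TEXT, importance REAL)"
)
print(check_generated_sql(conn, "SELECT concept FROM local_explanations;"))  # None
print(check_generated_sql(conn, "SELECT saliency FROM local_explanations"))
print(check_generated_sql(conn, "DROP TABLE local_explanations"))  # not a SELECT query
```

A rejected query could be returned to the model together with the error message, or shown to the user for correction in the spirit of the real-time corrections mentioned earlier.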
What would settle it
A test set of natural language questions, each paired with a manually verified SQL translation and its expected aggregation output, against which the system's generated queries are checked for matching statistics and visualizations.
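The settling test can be made concrete as an execution-match check, similar in spirit to execution accuracy in text-to-SQL work: run each generated query and its verified gold query on the same explanation table and compare the results. A minimal sketch under that assumption; the table `t`, its rows, and both helper names are hypothetical, and the four cases are illustrative rather than drawn from the paper.

```python
import sqlite3
from collections import Counter


def execution_match(conn, generated_sql, gold_sql):
    """True if both queries return the same multiset of rows (row order ignored)."""
    try:
        got = conn.execute(generated_sql).fetchall()
    except sqlite3.Error:
        return False  # a generated query that fails to run counts as a miss
    want = conn.execute(gold_sql).fetchall()
    return Counter(got) == Counter(want)


def mapping_accuracy(conn, cases):
    """Fraction of (generated_sql, gold_sql) pairs whose results match."""
    hits = sum(execution_match(conn, gen, gold) for gen, gold in cases)
    return hits / len(cases)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (image_id TEXT, true_class TEXT, pred_class TEXT, "
    "concept TEXT, importance REAL)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [
        ("img1", "kitchen", "kitchen", "stove", 0.9),
        ("img1", "kitchen", "kitchen", "sink", 0.4),
        ("img2", "kitchen", "kitchen", "stove", 0.8),
        ("img3", "bathroom", "kitchen", "sink", 0.7),
    ],
)
cases = [
    # Same result, different SQL text: counts as correct.
    ("SELECT concept, COUNT(*) FROM t GROUP BY concept",
     "SELECT concept, COUNT(image_id) FROM t GROUP BY concept ORDER BY concept"),
    # Identical query.
    ("SELECT AVG(importance) FROM t WHERE concept = 'stove'",
     "SELECT AVG(importance) FROM t WHERE concept = 'stove'"),
    # Filters on the predicted label where the question asked about the true label.
    ("SELECT COUNT(*) FROM t WHERE pred_class = 'kitchen'",
     "SELECT COUNT(*) FROM t WHERE true_class = 'kitchen'"),
    # Hallucinated column: the query cannot run.
    ("SELECT saliency FROM t", "SELECT importance FROM t"),
]
print(mapping_accuracy(conn, cases))  # 0.5
```

Ignoring row order is wrong for ranking questions ("top three concepts"), where the order is part of the answer; a fuller harness would compare ordered results when the gold query has an ORDER BY, and would also check the rendered statistics and chart, which this sketch does not.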
Original abstract
While global explanations are crucial for understanding vision models across datasets, classes, and decision contexts, their complex and monolithic nature often hinders practical exploration. Because users typically seek targeted answers to specific questions rather than static artifacts, we present an LLM-based interactive interface that provides natural language access to global explanations for black-box image classifiers. The system's core LLM acts as a mediator, translating natural language questions into structured SQL queries over local explanation data. This enables flexible aggregation without exposing users to low-level representations. For each query, the interface outputs statistics-augmented natural language responses, supporting local explanations, and intent-aligned visualizations. We evaluate the system on intent interpretation, query mapping accuracy, generalization to novel queries and datasets, and robustness to linguistic errors. Our results demonstrate that LLM-mediated querying substantially improves the accessibility and usability of global explanations for human-centered XAI.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents GLARE, an LLM-based interactive natural language interface for querying global explanations of black-box image classifiers. The core LLM mediator translates user questions into SQL queries over local explanation data to enable flexible aggregation, statistics-augmented NL responses, local explanations, and intent-aligned visualizations. The system is evaluated on intent interpretation, query mapping accuracy, generalization to novel queries/datasets, and robustness to linguistic errors, with the abstract claiming that results demonstrate substantial improvements in accessibility and usability for human-centered XAI.
Significance. If the claimed evaluations are sound and quantitative, the work could meaningfully advance practical XAI by lowering the barrier to exploring complex global explanations through natural language, a useful step beyond static artifacts. The mediator pattern for structured querying over explanation data is a pragmatic contribution to interactive explanation interfaces.
major comments (2)
- [Abstract] Abstract: The statement that 'Our results demonstrate that LLM-mediated querying substantially improves the accessibility and usability' provides no quantitative metrics (e.g., accuracy percentages, error rates, baseline comparisons), methodology details, or result summaries, although such evidence is load-bearing for the central claim of improvement over existing approaches.
- [Abstract] Abstract (evaluation description): The evaluation is asserted to cover 'intent interpretation, query mapping accuracy, generalization to novel queries and datasets, and robustness to linguistic errors,' yet no specific results, tables, or analysis of failure cases (e.g., SQL translation hallucinations affecting statistics) are referenced, leaving the reliability of the LLM mediator unverified.
minor comments (1)
- [Abstract] The abstract would be strengthened by including at least one concrete quantitative finding (e.g., 'query mapping accuracy of X% on Y queries') to ground the improvement claims.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below and will revise the abstract accordingly to strengthen the presentation of our results.
Referee: [Abstract] Abstract: The statement that 'Our results demonstrate that LLM-mediated querying substantially improves the accessibility and usability' provides no quantitative metrics (e.g., accuracy percentages, error rates, baseline comparisons), methodology details, or result summaries, although such evidence is load-bearing for the central claim of improvement over existing approaches.
Authors: We agree that the abstract as currently written lacks the specific quantitative support needed to substantiate the central claim. In the revised manuscript, we will expand the abstract to include key evaluation metrics (e.g., query mapping accuracy of XX%, generalization success rates across novel queries and datasets, and robustness to linguistic variations) along with a brief note on the evaluation methodology. This will make the improvement claim evidence-based while respecting abstract length constraints. revision: yes
Referee: [Abstract] Abstract (evaluation description): The evaluation is asserted to cover 'intent interpretation, query mapping accuracy, generalization to novel queries and datasets, and robustness to linguistic errors,' yet no specific results, tables, or analysis of failure cases (e.g., SQL translation hallucinations affecting statistics) are referenced, leaving the reliability of the LLM mediator unverified.
Authors: We acknowledge the abstract does not currently reference specific results or failure-case analysis. We will revise it to summarize the primary quantitative outcomes for each evaluation dimension and explicitly note that detailed tables, error analyses (including LLM mediator hallucinations), and failure cases are presented in the main body (Sections 4 and 5). Due to abstract space limits we cannot embed full tables, but the revision will direct readers to the supporting evidence and improve verifiability of the mediator's reliability. revision: yes
Circularity Check
No significant circularity
The paper presents a system description for an LLM-mediated natural language interface to global explanations, with claims resting on empirical evaluations of intent interpretation, query mapping accuracy, generalization, and robustness. No mathematical derivations, fitted parameters, predictions, or self-referential claims are present that reduce to inputs by construction. The architecture and results do not invoke self-citations as load-bearing uniqueness theorems or smuggle ansatzes; the central mediator claim is supported by described evaluations rather than tautological definitions or renaming of known results.
Reference graph
Works this paper leans on
- [1] Azzolin, S., Longa, A., Barbiero, P., Lio, P., Passerini, A., et al.: Global explainability of GNNs via logic combination of learned concepts. In: 11th International Conference on Learning Representations (ICLR 2023). pp. 1–19. International Conference on Learning Representations (ICLR) (2023)
- [2] Baugh, K.G., Dickens, L., Russo, A.: Neural DNF-MT: A neuro-symbolic approach for learning interpretable and editable policies. arXiv preprint arXiv:2501.03888 (2025)
- [3] Bills, S., Cammarata, N., Mossing, D., Tillman, H., Gao, L., Goh, G., Sutskever, I., Leike, J., Wu, J., Saunders, W.: Language models can explain neurons in language models. https://openaipublic.blob.core.windows.net/neuron-explainer/paper/index.html (2023)
- [4] Chen, X., Mottaghi, R., Liu, X., Fidler, S., Urtasun, R., Yuille, A.: Detect what you can: Detecting and representing objects using holistic models and body parts. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 1971–1978 (2014)
- [5] Craven, M., Shavlik, J.: Extracting tree-structured representations of trained networks. Advances in neural information processing systems 8 (1995)
- [6] Darwiche, A., Ji, C.: On the computation of necessary and sufficient explanations. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 36, pp. 5582–5591 (2022)
- [7] Hohman, F., Head, A., Caruana, R., DeLine, R., Drucker, S.M.: Gamut: A design probe to understand how data scientists understand machine learning models. In: Proceedings of the 2019 CHI conference on human factors in computing systems. pp. 1–13 (2019)
- [8] Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Chen, W.: LoRA: Low-rank adaptation of large language models. arXiv abs/2106.09685 (2021), https://api.semanticscholar.org/CorpusID:235458009
- [9] Koh, P.W., Nguyen, T., Tang, Y.S., Mussmann, S., Pierson, E., Kim, B., Liang, P.: Concept bottleneck models. In: International conference on machine learning. pp. 5338–5348. PMLR (2020)
- [10] Lakkaraju, H., Bach, S.H., Leskovec, J.: Interpretable decision sets: A joint framework for description and prediction. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. pp. 1675–1684 (2016)
- [11] Liao, Q.V., Gruen, D., Miller, S.: Questioning the AI: Informing design practices for explainable AI user experiences. In: Proceedings of the 2020 CHI conference on human factors in computing systems. pp. 1–15 (2020)
- [12] Miller, T.: Explanation in artificial intelligence: Insights from the social sciences. Artificial intelligence 267, 1–38 (2019)
- [13] Qwen, Yang, A., Yang, B., Zhang, B., Hui, B., Zheng, B., et al.: Qwen2.5 Technical Report (Jan 2025). https://doi.org/10.48550/arXiv.2412.15115, http://arxiv.org/abs/2412.15115, arXiv:2412.15115 [cs]
- [14] Rivière, M., Pathak, S., Sessa, P.G., Hardin, C., Bhupatiraju, S., Hussenot, L., Mesnard, T., Shahriari, B., Ramé, A., Ferret, J., et al.: Gemma 2: Improving open language models at a practical size. CoRR (2024)
- [15] Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Hambro, E., Zettlemoyer, L., Cancedda, N., Scialom, T.: Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems 36, 68539–68551 (2023)
- [16] Scholak, T., Schucher, N., Bahdanau, D.: PICARD: Parsing incrementally for constrained auto-regressive decoding from language models. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. pp. 9895–9901 (2021)
- [17] Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: Visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE international conference on computer vision. pp. 618–626 (2017)
- [18] Singh, C., Inala, J.P., Galley, M., Caruana, R., Gao, J.: Rethinking interpretability in the era of large language models. CoRR (2024)
- [19] Slack, D., Krishna, S., Lakkaraju, H., Singh, S.: TalkToModel: Explaining machine learning models with interactive natural language conversations. arXiv preprint arXiv:2207.04154 (2022)
- [20] Vasu, B., Raffa, G., Tadepalli, P.: Local-to-global logical explanations for deep vision models. arXiv preprint arXiv:2601.13404 (2026)
- [21] Vasu, B.K., Tadepalli, P.: Global explanations for image classifiers (student abstract). In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 37, pp. 16352–16353 (2023)
- [22] Wexler, J., Pushkarna, M., Bolukbasi, T., Wattenberg, M., Viégas, F., Wilson, J.: The What-If Tool: Interactive probing of machine learning models. IEEE transactions on visualization and computer graphics 26(1), 56–65 (2019)
- [23] Yu, T., Zhang, R., Yang, K., Yasunaga, M., Wang, D., Li, Z., Ma, J., Li, I., Yao, Q., Roman, S., et al.: Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-SQL task. arXiv preprint arXiv:1809.08887 (2018)
- [24] Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., Torralba, A.: Scene parsing through ADE20K dataset. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 633–641 (2017)

B. Vasu, R. Mangannavar. Appendix A.1, Query type breakdown: Table 5 provides a per-query-type breakdown for Gemma 2 9B. The model achieves 100% accuracy on...