
arxiv: 2606.19735 · v1 · pith:LNLMXBZ7new · submitted 2026-06-18 · 💻 cs.AI · cs.CV

GLARE: A Natural Language Interface for Querying Global Explanations

Pith reviewed 2026-06-26 17:53 UTC · model grok-4.3

classification 💻 cs.AI cs.CV
keywords global explanations · natural language interface · LLM mediator · XAI · image classifiers · SQL queries · usability

The pith

An LLM mediator translates natural language questions into SQL queries over local explanation data to enable interactive access to global explanations of image classifiers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an interface that lets users pose questions in plain English about how black-box image classifiers behave across datasets and classes. A core LLM converts each question into a structured SQL query over stored local explanations; the query aggregates them, and the results are returned as tailored statistics and visualizations. This setup replaces static, monolithic explanation artifacts with on-demand, intent-specific outputs. Evaluation covers how well the system interprets user intent, maps questions to queries, generalizes to new queries and datasets, and tolerates linguistic errors. If successful, the approach makes global explanations practically usable for people who need targeted answers rather than fixed reports.

Core claim

The paper claims that the system's core LLM acts as a mediator, translating natural language questions into structured SQL queries over local explanation data. This enables flexible aggregation without exposing users to low-level representations. For each query, the interface outputs statistics-augmented natural language responses, supporting local explanations, and intent-aligned visualizations. Evaluation on intent interpretation, query mapping accuracy, generalization to novel queries and datasets, and robustness to linguistic errors shows that LLM-mediated querying substantially improves the accessibility and usability of global explanations for human-centered XAI.

What carries the argument

The LLM mediator that translates natural language questions into structured SQL queries over local explanation data.
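
The mediation pattern can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration, not taken from the paper: the `local_explanations` table and its columns, the `translate` stub that stands in for the LLM, and the single question shape it understands.

```python
import sqlite3

# Hypothetical schema: one row per (image, concept) pair from a local explanation.
# `scene` is the predicted class; `important` is 1 when the local explanation
# marks the concept as important for that prediction.
SCHEMA = (
    "CREATE TABLE local_explanations "
    "(image_id TEXT, scene TEXT, concept TEXT, important INTEGER)"
)

def translate(question):
    """Stand-in for the LLM mediator: map a question to (SQL, parameters).

    A real system would prompt a language model here. This stub handles one
    hard-coded question shape so the example stays runnable.
    """
    prefix = "which concepts matter most for "
    q = question.lower().rstrip("?")
    if q.startswith(prefix):
        scene = q[len(prefix):]
        sql = (
            "SELECT concept, COUNT(*) AS n FROM local_explanations "
            "WHERE scene = ? AND important = 1 "
            "GROUP BY concept ORDER BY n DESC, concept"
        )
        return sql, (scene,)
    raise ValueError("question not understood")

def answer(conn, question):
    """Run the translated query and wrap the statistics in a sentence."""
    sql, params = translate(question)
    rows = conn.execute(sql, params).fetchall()
    top = ", ".join(f"{concept} ({n})" for concept, n in rows)
    return f"Most frequent important concepts: {top}"

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO local_explanations VALUES (?, ?, ?, ?)",
    [
        ("img1", "living room", "sofa", 1),
        ("img1", "living room", "sculpture", 1),
        ("img2", "living room", "sofa", 1),
        ("img2", "living room", "lamp", 0),
        ("img3", "kitchen", "stove", 1),
    ],
)
print(answer(conn, "Which concepts matter most for living room?"))
```

The user only ever sees the question and the sentence that comes back; the SQL and the row-level representation stay behind the mediator, which is the property the paper's claim depends on.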

If this is right

  • Flexible aggregation of explanation data becomes possible without users handling low-level representations.
  • Each query produces statistics-augmented natural language responses along with intent-aligned visualizations.
  • The system supports both local explanations and global ones through the same interface.
  • Performance holds across novel queries, new datasets, and some degree of linguistic errors.
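
As a hedged illustration of what flexible aggregation over local explanations could look like, the sketch below answers the question quoted in the Figure 2 caption ("In living rooms, what objects appear with sculpture?") with a co-occurrence query. The table layout, column names, and toy rows are assumptions; the paper's actual schema is not given on this page.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per concept flagged in an image's local explanation.
conn.execute(
    "CREATE TABLE local_explanations "
    "(image_id TEXT, scene TEXT, concept TEXT, important INTEGER)"
)
conn.executemany(
    "INSERT INTO local_explanations VALUES (?, ?, ?, ?)",
    [
        ("img1", "living room", "sculpture", 1),
        ("img1", "living room", "sofa", 1),
        ("img2", "living room", "sculpture", 1),
        ("img2", "living room", "sofa", 1),
        ("img2", "living room", "lamp", 1),
        ("img3", "living room", "sofa", 1),
        ("img4", "bedroom", "sculpture", 1),
        ("img4", "bedroom", "bed", 1),
    ],
)

# Self-join: for every image where the anchor concept is important in the
# given scene, count the other important concepts in the same image.
CO_OCCURRENCE = """
SELECT b.concept, COUNT(DISTINCT b.image_id) AS n_images
FROM local_explanations AS a
JOIN local_explanations AS b
  ON a.image_id = b.image_id AND b.concept <> a.concept
WHERE a.scene = ? AND a.concept = ? AND a.important = 1 AND b.important = 1
GROUP BY b.concept
ORDER BY n_images DESC, b.concept
"""

rows = conn.execute(CO_OCCURRENCE, ("living room", "sculpture")).fetchall()

# Denominator for the statistic: images of the scene containing the anchor concept.
total = conn.execute(
    "SELECT COUNT(DISTINCT image_id) FROM local_explanations "
    "WHERE scene = ? AND concept = ? AND important = 1",
    ("living room", "sculpture"),
).fetchone()[0]

for concept, n in rows:
    print(f"{concept}: {n}/{total} images")
```

The counts returned here are the kind of statistic a natural language response could be augmented with, and the matching `image_id` values could select supporting evidence images.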

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same mediation pattern could let users query other structured AI outputs beyond explanations.
  • Real-time user corrections on generated queries might reduce the impact of occasional translation mistakes.
  • Conversational access might replace many specialized dashboards for exploring model behavior.

Load-bearing premise

The LLM can accurately and reliably translate arbitrary natural language questions into correct SQL queries over the explanation data without introducing errors or hallucinations that affect the output statistics and visualizations.
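
One way to reduce exposure to this premise is to validate generated SQL before it runs. The sketch below is an assumption about how such a guard might look in SQLite, not the paper's validator: it accepts a single `SELECT` statement, compiles it with `EXPLAIN` so that hallucinated tables or columns fail early, and sets `PRAGMA query_only` as a second line of defense against writes. The `validate` function and table name are hypothetical.

```python
import sqlite3

def validate(conn, sql, params=()):
    """Return (ok, message) for a generated query without executing it.

    Deliberately conservative: anything that is not a single plain SELECT is
    rejected, including CTEs and queries with a semicolon inside a literal.
    """
    stripped = sql.strip().rstrip(";").strip()
    if not stripped.lower().startswith("select"):
        return False, "only SELECT statements are allowed"
    if ";" in stripped:
        return False, "multiple statements are not allowed"
    try:
        # EXPLAIN compiles the statement against the schema, so unknown
        # tables or columns raise an error here instead of at run time.
        conn.execute("EXPLAIN " + stripped, params)
    except sqlite3.Error as exc:
        return False, str(exc)
    return True, "ok"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE local_explanations "
    "(image_id TEXT, scene TEXT, concept TEXT, important INTEGER)"
)
conn.execute("INSERT INTO local_explanations VALUES ('img1', 'kitchen', 'stove', 1)")
conn.commit()
# Defense in depth: refuse writes on this connection from here on.
conn.execute("PRAGMA query_only = ON")

print(validate(conn, "SELECT concept FROM local_explanations WHERE scene = ?", ("kitchen",)))
print(validate(conn, "SELECT colour FROM local_explanations"))
print(validate(conn, "DELETE FROM local_explanations"))
```

A guard of this kind catches malformed or unsafe queries only. A query that is valid SQL but answers a different question than the one asked still passes, so it does not remove the need to measure translation accuracy.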

What would settle it

A test set of natural language questions with manually verified correct SQL translations and expected aggregation outputs, checked to determine whether the system's generated queries produce matching statistics and visualizations.
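
The check described above can be sketched as an execution-match score: run the manually verified query and the generated query against the same database and compare the rows they return. The table, the toy rows, and the four query pairs below are invented for illustration, and `execution_accuracy` is a hypothetical helper rather than the paper's metric.

```python
import sqlite3
from collections import Counter

def run(conn, sql):
    """Return the result rows as a multiset, or None if the query fails."""
    try:
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None

def execution_accuracy(conn, cases):
    """Fraction of (gold_sql, predicted_sql) pairs that return the same rows."""
    hits = 0
    for gold_sql, predicted_sql in cases:
        gold = run(conn, gold_sql)
        pred = run(conn, predicted_sql)
        if gold is not None and gold == pred:
            hits += 1
    return hits / len(cases)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE local_explanations "
    "(image_id TEXT, scene TEXT, concept TEXT, important INTEGER)"
)
conn.executemany(
    "INSERT INTO local_explanations VALUES (?, ?, ?, ?)",
    [
        ("img1", "kitchen", "stove", 1),
        ("img2", "kitchen", "stove", 1),
        ("img2", "kitchen", "sink", 1),
        ("img3", "bedroom", "bed", 1),
    ],
)

cases = [
    # Same rows, different surface form: counted as correct.
    ("SELECT concept, COUNT(*) FROM local_explanations WHERE scene = 'kitchen' GROUP BY concept",
     "SELECT concept, COUNT(image_id) AS n FROM local_explanations WHERE scene = 'kitchen' GROUP BY concept ORDER BY n DESC"),
    # Equivalent query written with a subquery: counted as correct.
    ("SELECT COUNT(DISTINCT image_id) FROM local_explanations WHERE concept = 'stove'",
     "SELECT COUNT(*) FROM (SELECT DISTINCT image_id FROM local_explanations WHERE concept = 'stove')"),
    # Wrong filter value: valid SQL, wrong answer.
    ("SELECT concept FROM local_explanations WHERE scene = 'bedroom'",
     "SELECT concept FROM local_explanations WHERE scene = 'kitchen'"),
    # Hallucinated column: the query fails to compile.
    ("SELECT DISTINCT scene FROM local_explanations",
     "SELECT DISTINCT room FROM local_explanations"),
]
print(execution_accuracy(conn, cases))
```

Comparing rows as a multiset ignores ordering, which suits aggregate questions; for questions where rank matters (for example "top-3"), the comparison would need to be order-sensitive.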

Figures

Figures reproduced from arXiv: 2606.19735 by Bhavan Vasu, Rajesh Mangannavar.

Figure 1: End-to-end pipeline. Top: The upstream framework generates mDNF explanations from aggregation of local concept or logic based explanation. Bottom: Our system translates natural language questions into validated SQL queries executed against the explanation database, returning structured answers with supporting evidence images.
3.3 Query Templates. We define 24 query templates corresponding to common analyti…
Figure 2: Visual grounding evidence for “In living rooms, what objects appear with sculpture?”: top-3 evidence images showing the original image (Column 1), objects deemed important (with value 1) by local explanations (Column 2), and finally the masked image highlighting only the important objects.

Original abstract

While global explanations are crucial for understanding vision models across datasets, classes, and decision contexts, their complex and monolithic nature often hinders practical exploration. Because users typically seek targeted answers to specific questions rather than static artifacts, we present an LLM-based interactive interface that provides natural language access to global explanations for black-box image classifiers. The system's core LLM acts as a mediator, translating natural language questions into structured SQL queries over local explanation data. This enables flexible aggregation without exposing users to low-level representations. For each query, the interface outputs statistics-augmented natural language responses, supporting local explanations, and intent-aligned visualizations. We evaluate the system on intent interpretation, query mapping accuracy, generalization to novel queries and datasets, and robustness to linguistic errors. Our results demonstrate that LLM-mediated querying substantially improves the accessibility and usability of global explanations for human-centered XAI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents GLARE, an LLM-based interactive natural language interface for querying global explanations of black-box image classifiers. The core LLM mediator translates user questions into SQL queries over local explanation data to enable flexible aggregation, statistics-augmented NL responses, local explanations, and intent-aligned visualizations. The system is evaluated on intent interpretation, query mapping accuracy, generalization to novel queries/datasets, and robustness to linguistic errors, with the abstract claiming that results demonstrate substantial improvements in accessibility and usability for human-centered XAI.

Significance. If the claimed evaluations are sound and quantitative, the work could meaningfully advance practical XAI by lowering the barrier to exploring complex global explanations through natural language, a useful step beyond static artifacts. The mediator pattern for structured querying over explanation data is a pragmatic contribution to interactive explanation interfaces.

major comments (2)
  1. [Abstract] Abstract: The statement that 'Our results demonstrate that LLM-mediated querying substantially improves the accessibility and usability' provides no quantitative metrics (e.g., accuracy percentages, error rates, baseline comparisons), methodology details, or result summaries, which is load-bearing for the central claim of improvement over existing approaches.
  2. [Abstract] Abstract (evaluation description): The evaluation is asserted to cover 'intent interpretation, query mapping accuracy, generalization to novel queries and datasets, and robustness to linguistic errors,' yet no specific results, tables, or analysis of failure cases (e.g., SQL translation hallucinations affecting statistics) are referenced, leaving the reliability of the LLM mediator unverified.
minor comments (1)
  1. [Abstract] The abstract would be strengthened by including at least one concrete quantitative finding (e.g., 'query mapping accuracy of X% on Y queries') to ground the improvement claims.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below and will revise the abstract accordingly to strengthen the presentation of our results.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The statement that 'Our results demonstrate that LLM-mediated querying substantially improves the accessibility and usability' provides no quantitative metrics (e.g., accuracy percentages, error rates, baseline comparisons), methodology details, or result summaries, which is load-bearing for the central claim of improvement over existing approaches.

    Authors: We agree that the abstract as currently written lacks the specific quantitative support needed to substantiate the central claim. In the revised manuscript, we will expand the abstract to include key evaluation metrics (e.g., query mapping accuracy of XX%, generalization success rates across novel queries and datasets, and robustness to linguistic variations) along with a brief note on the evaluation methodology. This will make the improvement claim evidence-based while respecting abstract length constraints. revision: yes

  2. Referee: [Abstract] Abstract (evaluation description): The evaluation is asserted to cover 'intent interpretation, query mapping accuracy, generalization to novel queries and datasets, and robustness to linguistic errors,' yet no specific results, tables, or analysis of failure cases (e.g., SQL translation hallucinations affecting statistics) are referenced, leaving the reliability of the LLM mediator unverified.

    Authors: We acknowledge the abstract does not currently reference specific results or failure-case analysis. We will revise it to summarize the primary quantitative outcomes for each evaluation dimension and explicitly note that detailed tables, error analyses (including LLM mediator hallucinations), and failure cases are presented in the main body (Sections 4 and 5). Due to abstract space limits we cannot embed full tables, but the revision will direct readers to the supporting evidence and improve verifiability of the mediator's reliability. revision: yes

Circularity Check

0 steps flagged

No significant circularity

Full rationale

The paper presents a system description for an LLM-mediated natural language interface to global explanations, with claims resting on empirical evaluations of intent interpretation, query mapping accuracy, generalization, and robustness. No mathematical derivations, fitted parameters, predictions, or self-referential claims are present that reduce to inputs by construction. The architecture and results do not invoke self-citations as load-bearing uniqueness theorems or smuggle ansatzes; the central mediator claim is supported by described evaluations rather than tautological definitions or renaming of known results.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are identifiable from the abstract; the work is a system description relying on standard LLM and database technologies.

pith-pipeline@v0.9.1-grok · 5671 in / 974 out tokens · 24696 ms · 2026-06-26T17:53:07.831382+00:00 · methodology


Reference graph

Works this paper leans on

24 extracted references · 6 canonical work pages · 3 internal anchors

  1. Azzolin, S., Longa, A., Barbiero, P., Lio, P., Passerini, A., et al.: Global explainability of gnns via logic combination of learned concepts. In: 11th International Conference on Learning Representations (ICLR 2023). pp. 1–19. International Conference on Learning Representations (ICLR) (2023)
  2. Baugh, K.G., Dickens, L., Russo, A.: Neural dnf-mt: A neuro-symbolic approach for learning interpretable and editable policies. arXiv preprint arXiv:2501.03888 (2025)
  3. Bills, S., Cammarata, N., Mossing, D., Tillman, H., Gao, L., Goh, G., Sutskever, I., Leike, J., Wu, J., Saunders, W.: Language models can explain neurons in language models. https://openaipublic.blob.core.windows.net/neuron-explainer/paper/index.html (2023)
  4. Chen, X., Mottaghi, R., Liu, X., Fidler, S., Urtasun, R., Yuille, A.: Detect what you can: Detecting and representing objects using holistic models and body parts. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 1971–1978 (2014)
  5. Craven, M., Shavlik, J.: Extracting tree-structured representations of trained networks. Advances in neural information processing systems 8 (1995)
  6. Darwiche, A., Ji, C.: On the computation of necessary and sufficient explanations. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 36, pp. 5582–5591 (2022)
  7. Hohman, F., Head, A., Caruana, R., DeLine, R., Drucker, S.M.: Gamut: A design probe to understand how data scientists understand machine learning models. In: Proceedings of the 2019 CHI conference on human factors in computing systems. pp. 1–13 (2019)
  8. Hu, E.J., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Chen, W.: Lora: Low-rank adaptation of large language models. ArXiv abs/2106.09685 (2021), https://api.semanticscholar.org/CorpusID:235458009
  9. Koh, P.W., Nguyen, T., Tang, Y.S., Mussmann, S., Pierson, E., Kim, B., Liang, P.: Concept bottleneck models. In: International conference on machine learning. pp. 5338–5348. PMLR (2020)
  10. Lakkaraju, H., Bach, S.H., Leskovec, J.: Interpretable decision sets: A joint framework for description and prediction. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. pp. 1675–1684 (2016)
  11. Liao, Q.V., Gruen, D., Miller, S.: Questioning the ai: informing design practices for explainable ai user experiences. In: Proceedings of the 2020 CHI conference on human factors in computing systems. pp. 1–15 (2020)
  12. Miller, T.: Explanation in artificial intelligence: Insights from the social sciences. Artificial intelligence 267, 1–38 (2019)
  13. Qwen, Yang, A., Yang, B., Zhang, B., Hui, B., Zheng, B., et al.: Qwen2.5 Technical Report (Jan 2025). https://doi.org/10.48550/arXiv.2412.15115, http://arxiv.org/abs/2412.15115, arXiv:2412.15115 [cs]
  14. Rivière, M., Pathak, S., Sessa, P.G., Hardin, C., Bhupatiraju, S., Hussenot, L., Mesnard, T., Shahriari, B., Ramé, A., Ferret, J., et al.: Gemma 2: Improving open language models at a practical size. CoRR (2024)
  15. Schick, T., Dwivedi-Yu, J., Dessì, R., Raileanu, R., Lomeli, M., Hambro, E., Zettlemoyer, L., Cancedda, N., Scialom, T.: Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems 36, 68539–68551 (2023)
  16. Scholak, T., Schucher, N., Bahdanau, D.: Picard: Parsing incrementally for constrained auto-regressive decoding from language models. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. pp. 9895–9901 (2021)
  17. Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-cam: Visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE international conference on computer vision. pp. 618–626 (2017)
  18. Singh, C., Inala, J.P., Galley, M., Caruana, R., Gao, J.: Rethinking interpretability in the era of large language models. CoRR (2024)
  19. Slack, D., Krishna, S., Lakkaraju, H., Singh, S.: Talktomodel: Explaining machine learning models with interactive natural language conversations. arXiv preprint arXiv:2207.04154 (2022)
  20. Vasu, B., Raffa, G., Tadepalli, P.: Local-to-global logical explanations for deep vision models. arXiv preprint arXiv:2601.13404 (2026)
  21. Vasu, B.K., Tadepalli, P.: Global explanations for image classifiers (student abstract). In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 37, pp. 16352–16353 (2023)
  22. Wexler, J., Pushkarna, M., Bolukbasi, T., Wattenberg, M., Viégas, F., Wilson, J.: The what-if tool: Interactive probing of machine learning models. IEEE transactions on visualization and computer graphics 26(1), 56–65 (2019)
  23. Yu, T., Zhang, R., Yang, K., Yasunaga, M., Wang, D., Li, Z., Ma, J., Li, I., Yao, Q., Roman, S., et al.: Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task. arXiv preprint arXiv:1809.08887 (2018)
  24. Zhou, B., Zhao, H., Puig, X., Fidler, S., Barriuso, A., Torralba, A.: Scene parsing through ade20k dataset. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 633–641 (2017)

Appendix A.1, Query type breakdown: Table 5 provides a per-query-type breakdown for Gemma 2 9B. The model achieves 100% accuracy on...