Interactive Mars Image Content-Based Search with Interpretable Machine Learning

Bhavan Vasu; Emily Dunkel; Kevin Grimes; Kiri L. Wagstaff; Michael McAuley; Steven Lu

arxiv: 2402.16860 · v2 · submitted 2024-01-19 · 💻 cs.CV · cs.IR


Pith reviewed 2026-05-24 03:55 UTC · model grok-4.3

classification 💻 cs.CV cs.IR

keywords: interpretable machine learning · prototype-based classification · Mars image analysis · content-based image search · planetary data system · Curiosity rover · explainable AI


The pith

A prototype-based architecture lets users understand and validate the evidence used by a classifier on Mars rover images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates a prototype-based machine learning model that classifies images collected by the Mars Science Laboratory (MSL) Curiosity rover while supplying explanations for its decisions. The aim is to make content-based search interpretable across the large archives of planetary images hosted by the NASA Planetary Data System (PDS). Users can inspect the prototypes to see which image features drive each classification and to check whether those features align with scientific expectations. The authors also investigate how diverse and correct the selected evidence is across classes. The resulting system is slated to replace the current non-interpretable classifier on the PDS Image Atlas.

Core claim

The paper establishes that a prototype-based architecture enables users to understand and validate the evidence used by a classifier trained on MSL Curiosity rover images and supports investigation of the diversity and correctness of that evidence, with deployment planned on the PDS Image Atlas.

What carries the argument

Prototype-based architecture that supplies visual prototypes as human-inspectable evidence for each classification decision.
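
To make the idea of prototypes as evidence concrete, the sketch below follows the general pattern of prototype classifiers: each learned prototype is compared against every patch of an image's feature map, the best match becomes that prototype's similarity score, and class scores are built from those similarities. It is an illustration under assumptions, not the authors' implementation. The function names, the log-ratio similarity, and the unit-weight class sum are hypothetical choices made here, and the feature vectors are toy values rather than features from a real backbone.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototype_scores(patches, prototypes, eps=1e-4):
    """For each prototype, return (similarity, index of best-matching patch).

    Assumption: similarity is log((d + 1) / (d + eps)) of the smallest
    squared distance d, so a closer patch gives a larger score.
    """
    scores = []
    for proto in prototypes:
        dists = [sq_dist(p, proto) for p in patches]
        best = min(range(len(dists)), key=dists.__getitem__)
        d = dists[best]
        scores.append((math.log((d + 1.0) / (d + eps)), best))
    return scores


def classify(patches, prototypes, proto_class, num_classes):
    """Return (predicted class, class logits, ranked evidence).

    Assumption: each prototype votes only for its own class with weight 1;
    a trained model would typically learn these weights instead.
    """
    scores = prototype_scores(patches, prototypes)
    logits = [0.0] * num_classes
    for (sim, _), cls in zip(scores, proto_class):
        logits[cls] += sim
    predicted = max(range(num_classes), key=logits.__getitem__)
    # Evidence: (prototype index, similarity, patch index), strongest first.
    evidence = sorted(
        ((j, sim, patch) for j, (sim, patch) in enumerate(scores)),
        key=lambda item: item[1],
        reverse=True,
    )
    return predicted, logits, evidence


if __name__ == "__main__":
    # Toy feature map with two patches and two prototypes (hypothetical values).
    patches = [[0.0, 1.0], [1.0, 0.0]]
    prototypes = [[1.0, 0.0], [5.0, 5.0]]
    predicted, logits, evidence = classify(patches, prototypes, [0, 1], 2)
    print(predicted, evidence[0])
```

The `evidence` list is the part a user would inspect: prototypes ranked by how strongly they matched, together with the patch where each one matched.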

If this is right

The interpretable classifier can replace the existing non-interpretable system on the PDS Image Atlas.
Users gain the ability to validate evidence behind content-based searches of planetary images.
Measurements of evidence diversity and correctness become available for the Mars image classifier.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same prototype approach could be tested on image collections from other rover or orbiter missions to check whether interpretability transfers.
If the prototypes prove reliable, the method might reduce the need for post-hoc explanation techniques when deploying classifiers on scientific image archives.

Load-bearing premise

The prototypes generated by the model must accurately reflect the classifier's actual decision process and must be useful for scientists to check the classifications.

What would settle it

Domain experts reviewing the prototypes for a set of test images cannot correctly anticipate or justify the model's classifications on those images.

Figures

Figures reproduced from arXiv: 2402.16860 by Bhavan Vasu, Emily Dunkel, Kevin Grimes, Kiri L. Wagstaff, Michael McAuley, Steven Lu.

**Figure 2.** Figure showing representative examples from …

**Figure 3.** Explanation for two images from class Sun showing the difference between evidence when the image is misclassified as Night Sky (red, left) vs. when it is classified correctly as Sun (green, right) from a VGG19 backbone. The meaning of the columns is identical to …

**Figure 5.** Average number of In-class prototypes for top-3 …

**Figure 6.** An illustration of user experience being considered …

Original abstract

The NASA Planetary Data System (PDS) hosts millions of images of planets, moons, and other bodies collected throughout many missions. The ever-expanding nature of data and user engagement demands an interpretable content classification system to support scientific discovery and individual curiosity. In this paper, we leverage a prototype-based architecture to enable users to understand and validate the evidence used by a classifier trained on images from the Mars Science Laboratory (MSL) Curiosity rover mission. In addition to providing explanations, we investigate the diversity and correctness of evidence used by the content-based classifier. The work presented in this paper will be deployed on the PDS Image Atlas, replacing its non-interpretable counterpart.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

This applies established prototype methods to MSL images for a PDS deployment but reports no results or validation of the explanations.


This paper applies established prototype-based methods to MSL rover images for an explainable search tool on the PDS Image Atlas, but supplies no quantitative results or validation of the explanations. The work is a targeted engineering project that aims to replace the existing non-interpretable classifier with one where users can see the prototype evidence. The authors also say they investigate the diversity and correctness of that evidence. This could be helpful for users of the specific archive if it works as intended. The soft spots are clear from the abstract: no numbers on performance, no metrics for faithfulness or usefulness of the prototypes, and no baselines. The assumption that the prototypes will enable understanding and validation is not backed by any reported checks, which matches the stress-test note. Nothing suggests new methods or derivations; it's an application of known techniques. Readers interested in tools for planetary data or interpretable ML in science might get something from the deployment details if the full paper has them. It's too narrow for broader CV audiences. I wouldn't bring this to reading group. I wouldn't cite it. It deserves peer review if the full version has rigorous experiments to support the claims, since the use case is concrete and the goal of interpretability for science data is reasonable.

Referee Report

2 major / 0 minor

Summary. The paper presents a prototype-based interpretable machine learning architecture for content-based classification and search of images from the Mars Science Laboratory (MSL) Curiosity rover mission. It claims that this enables users to understand and validate the evidence used by the classifier, while also investigating the diversity and correctness of that evidence, with the system intended for deployment on the NASA PDS Image Atlas to replace a non-interpretable counterpart.

Significance. If the prototype explanations prove faithful to the model's decisions and useful for scientific validation on planetary imagery, the work could meaningfully improve trust and usability in large-scale NASA data archives, supporting both research and public engagement. The engineering focus on deployment is a practical strength, but the lack of any reported quantitative validation limits assessment of whether these benefits are realized.

major comments (2)

Abstract: The central claim that the prototype-based architecture enables users to 'understand and validate the evidence' and that the authors 'investigate the diversity and correctness of evidence' is unsupported because the manuscript supplies no quantitative results, error analysis, baseline comparisons, faithfulness metrics (e.g., prototype-logit correlation), or expert-rated usefulness scores. This absence is load-bearing for the stated goal of scientific validation.
Abstract: No concrete metrics, protocols, or evaluation criteria are defined for assessing 'diversity and correctness' of the evidence used by the classifier. Without such grounding, it is impossible to determine whether the investigation substantiates the claim that explanations are practically useful for Mars science.
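
The first comment names prototype-logit correlation as an example faithfulness metric without defining it. One plausible reading, sketched below, is the Pearson correlation, taken across a set of images, between each prototype's similarity score and the logit of the class that prototype is meant to support. The function names and the toy numbers are hypothetical, and this is not a protocol taken from the paper or the report.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def prototype_logit_correlation(similarities, class_logits):
    """Correlate each prototype's similarity with one class logit.

    similarities[i][j]: similarity of image i to prototype j.
    class_logits[i]:    logit of the class of interest for image i.
    Returns one correlation per prototype.
    """
    num_prototypes = len(similarities[0])
    return [
        pearson([row[j] for row in similarities], class_logits)
        for j in range(num_prototypes)
    ]


if __name__ == "__main__":
    # Three images, two prototypes (hypothetical values).
    sims = [[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]]
    logits = [2.0, 4.0, 6.0]
    print(prototype_logit_correlation(sims, logits))
```

Under this reading, a prototype whose similarity rises and falls with its class logit is at least consistent with being used as evidence for that class; a correlation near zero would be a reason to doubt that the displayed prototype matters to the decision.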

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and the recommendation for major revision. The comments focus on the need for quantitative grounding of claims in the abstract. We address each point below and note that while the manuscript emphasizes architectural design, deployment, and qualitative case studies on MSL imagery, we will make targeted revisions to improve clarity without altering the core contribution.


Referee: Abstract: The central claim that the prototype-based architecture enables users to 'understand and validate the evidence' and that the authors 'investigate the diversity and correctness of evidence' is unsupported because the manuscript supplies no quantitative results, error analysis, baseline comparisons, faithfulness metrics (e.g., prototype-logit correlation), or expert-rated usefulness scores. This absence is load-bearing for the stated goal of scientific validation.

Authors: We agree that quantitative metrics such as faithfulness scores or expert ratings would strengthen the claims. The manuscript supports the claims through explicit prototype visualizations and case studies in the results section, where users can directly inspect the evidence (e.g., prototypes matching terrain features or rover components). No error analysis or baseline comparisons appear because the contribution centers on interpretability and deployment rather than performance benchmarking. We will revise the abstract to qualify the investigation as qualitative and add a brief limitations paragraph discussing the absence of formal faithfulness metrics.

revision: partial

Referee: Abstract: No concrete metrics, protocols, or evaluation criteria are defined for assessing 'diversity and correctness' of the evidence used by the classifier. Without such grounding, it is impossible to determine whether the investigation substantiates the claim that explanations are practically useful for Mars science.

Authors: Diversity is illustrated by the range of distinct visual concepts captured across prototypes (different geological and artificial features), and correctness is shown by alignment with known MSL image content via the presented examples. No formal protocol or numeric criteria were defined because the work is exploratory and user-facing. We will add explicit working definitions and a short evaluation protocol description in the revised methods or results section to address this concern.

revision: partial
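
As a sketch of what such working definitions could look like, the code below computes two simple quantities from per-image prototype similarities: the average number of top-k prototypes that belong to the image's labeled class (a possible proxy for correctness), and the number of distinct prototypes that appear in any image's top-k (a possible proxy for diversity). These definitions, the choice of k = 3, and all names and values are assumptions for illustration; they are not the paper's or the rebuttal's protocol.

```python
def top_k(sims, k):
    """Indices of the k largest similarities, strongest first."""
    return sorted(range(len(sims)), key=lambda j: sims[j], reverse=True)[:k]


def mean_in_class_top_k(similarities, labels, proto_class, k=3):
    """Average count of top-k prototypes whose class equals the image label.

    similarities[i][j]: similarity of image i to prototype j.
    labels[i]:          labeled class of image i (assumed ground truth).
    proto_class[j]:     class that prototype j is assigned to.
    """
    counts = [
        sum(1 for j in top_k(sims, k) if proto_class[j] == label)
        for sims, label in zip(similarities, labels)
    ]
    return sum(counts) / len(counts)


def distinct_top_k_prototypes(similarities, k=3):
    """Number of different prototypes that appear in any image's top-k."""
    used = set()
    for sims in similarities:
        used.update(top_k(sims, k))
    return len(used)


if __name__ == "__main__":
    # Four prototypes: two per class; two images (hypothetical values).
    proto_class = [0, 0, 1, 1]
    similarities = [
        [0.9, 0.8, 0.1, 0.7],
        [0.2, 0.9, 0.8, 0.7],
    ]
    labels = [0, 1]
    print(mean_in_class_top_k(similarities, labels, proto_class))
    print(distinct_top_k_prototypes(similarities))
```

Counting against the labeled class rather than the predicted class is itself a design choice: the first asks whether the evidence is right, the second whether it is consistent with the model's own decision.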

Circularity Check

0 steps flagged

No circularity; application paper with no derivations or fitted predictions


The paper describes deployment of an existing prototype-based classifier on MSL images and reports an empirical investigation of diversity/correctness. No equations, first-principles derivations, or quantitative predictions are present. The central claim (explanations are faithful and useful) is an untested modeling assumption rather than a result derived from inputs by construction. Any self-citations to prototype methods are external to the present work and do not create a self-referential chain. This is the normal case of an engineering application paper whose validity rests on external validation, not internal reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no model equations, training details, or explicit assumptions are provided, so no free parameters, axioms, or invented entities can be identified.

pith-pipeline@v0.9.0 · 5650 in / 997 out tokens · 19997 ms · 2026-05-24T03:55:07.007303+00:00 · methodology

