Interactive Mars Image Content-Based Search with Interpretable Machine Learning
Pith reviewed 2026-05-24 03:55 UTC · model grok-4.3
The pith
A prototype-based architecture lets users understand and validate the evidence used by a classifier on Mars rover images.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The paper establishes that a prototype-based architecture enables users to understand and validate the evidence used by a classifier trained on MSL Curiosity rover images and supports investigation of the diversity and correctness of that evidence, with deployment planned on the PDS Image Atlas.
What carries the argument
Prototype-based architecture that supplies visual prototypes as human-inspectable evidence for each classification decision.
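As a rough illustration of how prototypes can serve as inspectable evidence, the sketch below assumes a ProtoPNet-style design: each prototype is compared with every image patch embedding, the best-matching patch gives that prototype's similarity score, and class logits are weighted sums of those scores. The abstract does not describe the architecture at this level, so the function names, the log-ratio similarity, and the toy inputs are assumptions made for illustration only.

```python
import math


def prototype_similarities(patches, prototypes, eps=1e-4):
    """For each prototype, return (best similarity, index of best-matching patch).

    Assumed similarity: log((d2 + 1) / (d2 + eps)), where d2 is the squared
    Euclidean distance between a patch embedding and the prototype vector.
    """
    results = []
    for proto in prototypes:
        best_sim, best_idx = -math.inf, -1
        for idx, patch in enumerate(patches):
            d2 = sum((a - b) ** 2 for a, b in zip(patch, proto))
            sim = math.log((d2 + 1.0) / (d2 + eps))
            if sim > best_sim:
                best_sim, best_idx = sim, idx
        results.append((best_sim, best_idx))
    return results


def classify(patches, prototypes, class_weights):
    """Return (predicted class, logits, evidence sorted by contribution).

    class_weights[c][p] is the (hypothetical) weight linking prototype p
    to class c. Each evidence item is (prototype index, contribution to
    the predicted class logit, index of the matched patch).
    """
    sims = prototype_similarities(patches, prototypes)
    logits = [
        sum(w * s for w, (s, _) in zip(row, sims)) for row in class_weights
    ]
    predicted = max(range(len(logits)), key=logits.__getitem__)
    evidence = [
        (p, class_weights[predicted][p] * sims[p][0], sims[p][1])
        for p in range(len(prototypes))
    ]
    evidence.sort(key=lambda item: item[1], reverse=True)
    return predicted, logits, evidence


if __name__ == "__main__":
    # Toy 2-D patch embeddings and prototypes (illustrative values only).
    patches = [[0.0, 0.0], [1.0, 1.0]]
    prototypes = [[0.0, 0.0], [5.0, 5.0]]
    class_weights = [[1.0, 0.0], [0.0, 1.0]]
    predicted, logits, evidence = classify(patches, prototypes, class_weights)
    print(predicted, evidence[0])
```

In this sketch, `evidence[0]` names the prototype that contributed most to the predicted class and the patch it matched, which is the item a user would inspect to validate the decision.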
If this is right
- The interpretable classifier can replace the existing non-interpretable system on the PDS Image Atlas.
- Users gain the ability to validate evidence behind content-based searches of planetary images.
- Measurements of evidence diversity and correctness become available for the Mars image classifier.
Where Pith is reading between the lines
- The same prototype approach could be tested on image collections from other rover or orbiter missions to check whether interpretability transfers.
- If the prototypes prove reliable, the method might reduce the need for post-hoc explanation techniques when deploying classifiers on scientific image archives.
Load-bearing premise
The prototypes generated by the model must accurately reflect the classifier's actual decision process and must be useful for scientists to check the classifications.
What would settle it
Domain experts reviewing the prototypes for a set of test images cannot correctly anticipate or justify the model's classifications on those images.
Original abstract
The NASA Planetary Data System (PDS) hosts millions of images of planets, moons, and other bodies collected throughout many missions. The ever-expanding nature of data and user engagement demands an interpretable content classification system to support scientific discovery and individual curiosity. In this paper, we leverage a prototype-based architecture to enable users to understand and validate the evidence used by a classifier trained on images from the Mars Science Laboratory (MSL) Curiosity rover mission. In addition to providing explanations, we investigate the diversity and correctness of evidence used by the content-based classifier. The work presented in this paper will be deployed on the PDS Image Atlas, replacing its non-interpretable counterpart.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a prototype-based interpretable machine learning architecture for content-based classification and search of images from the Mars Science Laboratory (MSL) Curiosity rover mission. It claims that this enables users to understand and validate the evidence used by the classifier, while also investigating the diversity and correctness of that evidence, with the system intended for deployment on the NASA PDS Image Atlas to replace a non-interpretable counterpart.
Significance. If the prototype explanations prove faithful to the model's decisions and useful for scientific validation on planetary imagery, the work could meaningfully improve trust and usability in large-scale NASA data archives, supporting both research and public engagement. The engineering focus on deployment is a practical strength, but the lack of any reported quantitative validation limits assessment of whether these benefits are realized.
major comments (2)
- Abstract: The central claim that the prototype-based architecture enables users to 'understand and validate the evidence' and that the authors 'investigate the diversity and correctness of evidence' is unsupported because the manuscript supplies no quantitative results, error analysis, baseline comparisons, faithfulness metrics (e.g., prototype-logit correlation), or expert-rated usefulness scores. This absence is load-bearing for the stated goal of scientific validation.
- Abstract: No concrete metrics, protocols, or evaluation criteria are defined for assessing 'diversity and correctness' of the evidence used by the classifier. Without such grounding, it is impossible to determine whether the investigation substantiates the claim that explanations are practically useful for Mars science.
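The report mentions prototype-logit correlation as one possible faithfulness metric but does not define it. A minimal sketch of one plausible reading follows: for each prototype, compute the Pearson correlation, across a set of images, between that prototype's similarity score and the logit of its associated class. The function names and the input layout are assumptions, not something taken from the paper or the report.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def prototype_logit_correlation(similarities, class_logits):
    """Correlate each prototype's similarity with a class logit across images.

    similarities[i][p]: similarity of image i to prototype p (assumed layout).
    class_logits[i]:    logit of the class of interest for image i.
    Returns one correlation per prototype.
    """
    n_prototypes = len(similarities[0])
    return [
        pearson([row[p] for row in similarities], class_logits)
        for p in range(n_prototypes)
    ]
```

Under this reading, a prototype whose similarity barely correlates with the logit it is supposed to support would be a candidate for closer inspection; the threshold for "barely" would still need to be set by the evaluators.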
Simulated Author's Rebuttal
We thank the referee for the detailed review and the recommendation for major revision. The comments focus on the need for quantitative grounding of claims in the abstract. We address each point below and note that while the manuscript emphasizes architectural design, deployment, and qualitative case studies on MSL imagery, we will make targeted revisions to improve clarity without altering the core contribution.
- Referee: Abstract: The central claim that the prototype-based architecture enables users to 'understand and validate the evidence' and that the authors 'investigate the diversity and correctness of evidence' is unsupported because the manuscript supplies no quantitative results, error analysis, baseline comparisons, faithfulness metrics (e.g., prototype-logit correlation), or expert-rated usefulness scores. This absence is load-bearing for the stated goal of scientific validation.
Authors: We agree that quantitative metrics such as faithfulness scores or expert ratings would strengthen the claims. The manuscript supports the claims through explicit prototype visualizations and case studies in the results section, where users can directly inspect the evidence (e.g., prototypes matching terrain features or rover components). No error analysis or baseline comparisons appear because the contribution centers on interpretability and deployment rather than performance benchmarking. We will revise the abstract to qualify the investigation as qualitative and add a brief limitations paragraph discussing the absence of formal faithfulness metrics. revision: partial
- Referee: Abstract: No concrete metrics, protocols, or evaluation criteria are defined for assessing 'diversity and correctness' of the evidence used by the classifier. Without such grounding, it is impossible to determine whether the investigation substantiates the claim that explanations are practically useful for Mars science.
Authors: Diversity is illustrated by the range of distinct visual concepts captured across prototypes (different geological and artificial features), and correctness is shown by alignment with known MSL image content via the presented examples. No formal protocol or numeric criteria were defined because the work is exploratory and user-facing. We will add explicit working definitions and a short evaluation protocol description in the revised methods or results section to address this concern. revision: partial
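The working definitions promised in this response are not given, so the following is only a hypothetical example of what numeric versions might look like. It assumes diversity is measured as the mean pairwise cosine distance between prototype vectors, and correctness as the fraction of prototypes whose highlighted evidence a reviewer judged relevant to the class. Both definitions and all names are illustrative assumptions, not the authors' protocol.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


def prototype_diversity(prototypes):
    """Mean pairwise cosine distance; 0.0 when fewer than two prototypes."""
    pairs = list(combinations(prototypes, 2))
    if not pairs:
        return 0.0
    return sum(cosine_distance(u, v) for u, v in pairs) / len(pairs)


def evidence_correctness(reviewer_judgments):
    """Fraction of prototypes a reviewer marked as relevant evidence.

    reviewer_judgments: list of booleans, one per prototype (assumed input).
    """
    if not reviewer_judgments:
        raise ValueError("need at least one judgment")
    return sum(reviewer_judgments) / len(reviewer_judgments)
```

Scores like these would not settle whether the explanations are useful for Mars science, but they would give reviewers a fixed quantity to compare across model versions.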
Circularity Check
No circularity; application paper with no derivations or fitted predictions
The paper describes deployment of an existing prototype-based classifier on MSL images and reports an empirical investigation of diversity/correctness. No equations, first-principles derivations, or quantitative predictions are present. The central claim (explanations are faithful and useful) is an untested modeling assumption rather than a result derived from inputs by construction. Any self-citations to prototype methods are external to the present work and do not create a self-referential chain. This is the normal case of an engineering application paper whose validity rests on external validation, not internal reduction.