
arxiv: 2402.16860 · v2 · submitted 2024-01-19 · 💻 cs.CV · cs.IR

Interactive Mars Image Content-Based Search with Interpretable Machine Learning

Pith reviewed 2026-05-24 03:55 UTC · model grok-4.3

keywords interpretable machine learning · prototype-based classification · Mars image analysis · content-based image search · planetary data system · Curiosity rover · explainable AI

The pith

A prototype-based architecture lets users understand and validate the evidence used by a classifier on Mars rover images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates a prototype-based machine learning model that classifies images collected by the Mars Science Laboratory Curiosity rover while supplying explanations for its decisions. This setup addresses the challenge of making content-based search interpretable in large archives of planetary images hosted by the NASA Planetary Data System. Users can inspect the prototypes to see what image features drive each classification and to check whether those features align with scientific expectations. The authors also investigate how diverse and correct the selected evidence is across different classes. The resulting system is slated to replace the current non-interpretable classifier on the PDS Image Atlas.

Core claim

The paper establishes that a prototype-based architecture enables users to understand and validate the evidence used by a classifier trained on MSL Curiosity rover images and supports investigation of the diversity and correctness of that evidence, with deployment planned on the PDS Image Atlas.

What carries the argument

Prototype-based architecture that supplies visual prototypes as human-inspectable evidence for each classification decision.
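
As a rough illustration of how prototypes can serve as inspectable evidence, the sketch below scores an image against learned prototypes in the style of ProtoPNet (Chen et al. 2019). It is not the authors' implementation: the two-dimensional patch features, the prototype vectors, the identity weight matrix, and all function names are hypothetical. Only the class names Sun and Night Sky are taken from the paper's Figure 3 caption.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def prototype_similarities(patches, prototypes, eps=1e-4):
    """For each prototype, find its closest image patch and turn the
    distance into a similarity score (ProtoPNet-style log ratio)."""
    sims = []
    for proto in prototypes:
        d_min = min(sq_dist(patch, proto) for patch in patches)
        sims.append(math.log((d_min + 1.0) / (d_min + eps)))
    return sims

def class_logits(sims, weights):
    """Each class logit is a weighted sum of prototype similarities."""
    return [sum(w * s for w, s in zip(row, sims)) for row in weights]

# Hypothetical toy data: three patch features from one image, two prototypes.
patches = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]]
prototypes = [[1.0, 0.0], [0.0, 0.0]]
class_names = ["Sun", "Night Sky"]       # one prototype per class (assumed)
weights = [[1.0, 0.0], [0.0, 1.0]]       # assumed identity last layer

sims = prototype_similarities(patches, prototypes)
logits = class_logits(sims, weights)
predicted = class_names[max(range(len(logits)), key=lambda c: logits[c])]
print(predicted, [round(s, 3) for s in sims])
```

Because each logit is a weighted sum of prototype similarities, the prototypes with the largest contributions can be shown to a user as the evidence behind the decision.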

If this is right

  • The interpretable classifier can replace the existing non-interpretable system on the PDS Image Atlas.
  • Users gain the ability to validate evidence behind content-based searches of planetary images.
  • Measurements of evidence diversity and correctness become available for the Mars image classifier.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same prototype approach could be tested on image collections from other rover or orbiter missions to check whether interpretability transfers.
  • If the prototypes prove reliable, the method might reduce the need for post-hoc explanation techniques when deploying classifiers on scientific image archives.

Load-bearing premise

The prototypes generated by the model must accurately reflect the classifier's actual decision process and must be useful for scientists to check the classifications.

What would settle it

Domain experts reviewing the prototypes for a set of test images cannot correctly anticipate or justify the model's classifications on those images.
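
The test described above could be scored as a simple agreement rate, sketched here under stated assumptions. The expert guesses and model predictions are invented for illustration, the function names are hypothetical, and the majority-class baseline is an added point of comparison, not something the paper reports.

```python
from collections import Counter

def agreement_rate(expert_guesses, model_predictions):
    """Fraction of test images where the expert, shown only the prototypes,
    anticipated the label the model actually assigned."""
    if len(expert_guesses) != len(model_predictions) or not model_predictions:
        raise ValueError("need two non-empty label lists of equal length")
    hits = sum(e == m for e, m in zip(expert_guesses, model_predictions))
    return hits / len(model_predictions)

def majority_baseline(model_predictions):
    """Agreement achieved by always guessing the most frequent model label."""
    top_count = Counter(model_predictions).most_common(1)[0][1]
    return top_count / len(model_predictions)

# Hypothetical labels for five test images.
model_predictions = ["Sun", "Night Sky", "Sun", "Sun", "Night Sky"]
expert_guesses = ["Sun", "Night Sky", "Night Sky", "Sun", "Night Sky"]

rate = agreement_rate(expert_guesses, model_predictions)
baseline = majority_baseline(model_predictions)
print(rate, baseline)
```

An agreement rate that does not clearly exceed the baseline would suggest the prototypes are not helping experts anticipate the model.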

Figures

Figures reproduced from arXiv: 2402.16860 by Bhavan Vasu, Emily Dunkel, Kevin Grimes, Kiri L. Wagstaff, Michael McAuley, Steven Lu.

Figure 1: Qualitative example of the top-4 most visually …
Figure 2: Figure showing representative examples from …
Figure 3: Explanation for two images from class Sun showing the difference between evidence when the image is misclassified as Night Sky (red, left) vs. when it is classified correctly as Sun (green, right) from a VGG19 backbone. The meaning of the columns is identical to …
Figure 5: Average number of In-class prototypes for top-3 …
Figure 6: An illustration of user experience being considered …
Original abstract

The NASA Planetary Data System (PDS) hosts millions of images of planets, moons, and other bodies collected throughout many missions. The ever-expanding nature of data and user engagement demands an interpretable content classification system to support scientific discovery and individual curiosity. In this paper, we leverage a prototype-based architecture to enable users to understand and validate the evidence used by a classifier trained on images from the Mars Science Laboratory (MSL) Curiosity rover mission. In addition to providing explanations, we investigate the diversity and correctness of evidence used by the content-based classifier. The work presented in this paper will be deployed on the PDS Image Atlas, replacing its non-interpretable counterpart.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper presents a prototype-based interpretable machine learning architecture for content-based classification and search of images from the Mars Science Laboratory (MSL) Curiosity rover mission. It claims that this enables users to understand and validate the evidence used by the classifier, while also investigating the diversity and correctness of that evidence, with the system intended for deployment on the NASA PDS Image Atlas to replace a non-interpretable counterpart.

Significance. If the prototype explanations prove faithful to the model's decisions and useful for scientific validation on planetary imagery, the work could meaningfully improve trust and usability in large-scale NASA data archives, supporting both research and public engagement. The engineering focus on deployment is a practical strength, but the lack of any reported quantitative validation limits assessment of whether these benefits are realized.

major comments (2)
  1. Abstract: The central claim that the prototype-based architecture enables users to 'understand and validate the evidence' and that the authors 'investigate the diversity and correctness of evidence' is unsupported because the manuscript supplies no quantitative results, error analysis, baseline comparisons, faithfulness metrics (e.g., prototype-logit correlation), or expert-rated usefulness scores. This absence is load-bearing for the stated goal of scientific validation.
  2. Abstract: No concrete metrics, protocols, or evaluation criteria are defined for assessing 'diversity and correctness' of the evidence used by the classifier. Without such grounding, it is impossible to determine whether the investigation substantiates the claim that explanations are practically useful for Mars science.
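
One way to make a faithfulness metric such as prototype-logit correlation concrete is sketched below. It computes the Pearson correlation between one prototype's similarity score and its class logit over a set of images. The numbers are invented for illustration, the function name is hypothetical, and this is only one possible reading of the suggestion, not a metric reported in the paper.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical measurements over five images: similarity of one prototype
# and the logit of the class that prototype belongs to.
proto_sims = [0.2, 1.1, 2.5, 3.0, 4.2]
class_logit = [0.1, 0.9, 2.0, 2.8, 3.9]

r = pearson(proto_sims, class_logit)
print(round(r, 3))
```

In a model whose logits are a linear function of prototype similarities, a high correlation is partly expected by construction, so a check like this would probably need to be paired with perturbation tests.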

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and the recommendation for major revision. The comments focus on the need for quantitative grounding of claims in the abstract. We address each point below and note that while the manuscript emphasizes architectural design, deployment, and qualitative case studies on MSL imagery, we will make targeted revisions to improve clarity without altering the core contribution.

  1. Referee: Abstract: The central claim that the prototype-based architecture enables users to 'understand and validate the evidence' and that the authors 'investigate the diversity and correctness of evidence' is unsupported because the manuscript supplies no quantitative results, error analysis, baseline comparisons, faithfulness metrics (e.g., prototype-logit correlation), or expert-rated usefulness scores. This absence is load-bearing for the stated goal of scientific validation.

    Authors: We agree that quantitative metrics such as faithfulness scores or expert ratings would strengthen the claims. The manuscript supports the claims through explicit prototype visualizations and case studies in the results section, where users can directly inspect the evidence (e.g., prototypes matching terrain features or rover components). No error analysis or baseline comparisons appear because the contribution centers on interpretability and deployment rather than performance benchmarking. We will revise the abstract to qualify the investigation as qualitative and add a brief limitations paragraph discussing the absence of formal faithfulness metrics.

    revision: partial

  2. Referee: Abstract: No concrete metrics, protocols, or evaluation criteria are defined for assessing 'diversity and correctness' of the evidence used by the classifier. Without such grounding, it is impossible to determine whether the investigation substantiates the claim that explanations are practically useful for Mars science.

    Authors: Diversity is illustrated by the range of distinct visual concepts captured across prototypes (different geological and artificial features), and correctness is shown by alignment with known MSL image content via the presented examples. No formal protocol or numeric criteria were defined because the work is exploratory and user-facing. We will add explicit working definitions and a short evaluation protocol description in the revised methods or results section to address this concern.

    revision: partial
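
A possible shape for such working definitions is sketched below, assuming a model that assigns each prototype to one class. Correctness is read as the number of in-class prototypes among an image's top-3 most activated prototypes (the quantity named in the Figure 5 caption), and diversity as the number of distinct prototypes that appear in any top-3 list. These readings, the function names, and the similarity values are assumptions for illustration, not the authors' protocol.

```python
def top_k(sims, k):
    """Indices of the k most activated prototypes for one image."""
    return sorted(range(len(sims)), key=lambda j: sims[j], reverse=True)[:k]

def in_class_count(sims, proto_class, predicted, k=3):
    """Correctness proxy: how many top-k prototypes belong to the predicted class."""
    return sum(1 for j in top_k(sims, k) if proto_class[j] == predicted)

def distinct_prototypes(all_sims, k=3):
    """Diversity proxy: how many different prototypes appear in any top-k list."""
    used = set()
    for sims in all_sims:
        used.update(top_k(sims, k))
    return len(used)

# Hypothetical setup: five prototypes with assigned classes, two images
# that the model labels as Sun, and their prototype similarity scores.
proto_class = ["Sun", "Sun", "Night Sky", "Night Sky", "Sun"]
image_sims = [
    [4.0, 3.5, 0.5, 0.2, 3.0],
    [3.8, 0.4, 2.9, 0.1, 3.1],
]

counts = [in_class_count(s, proto_class, "Sun") for s in image_sims]
avg_in_class = sum(counts) / len(counts)
n_distinct = distinct_prototypes(image_sims)
print(counts, avg_in_class, n_distinct)
```

Averaging the in-class count per class gives one number per class that can be compared across classes or backbones, while the distinct-prototype count indicates whether the model relies on a few prototypes or many.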

Circularity Check

0 steps flagged

No circularity; application paper with no derivations or fitted predictions


The paper describes deployment of an existing prototype-based classifier on MSL images and reports an empirical investigation of diversity/correctness. No equations, first-principles derivations, or quantitative predictions are present. The central claim (explanations are faithful and useful) is an untested modeling assumption rather than a result derived from inputs by construction. Any self-citations to prototype methods are external to the present work and do not create a self-referential chain. This is the normal case of an engineering application paper whose validity rests on external validation, not internal reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no model equations, training details, or explicit assumptions are provided, so no free parameters, axioms, or invented entities can be identified.


