
arxiv: 2605.21186 · v1 · pith:NZWW644Cnew · submitted 2026-05-20 · 💻 cs.CV · cs.AI

SAM-Sode: Towards Faithful Explanations for Tiny Bacteria Detection

Pith reviewed 2026-05-21 05:52 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords explainable AI · tiny object detection · bacteria detection · feature attribution · SAM model · morphological reconstruction · XAI framework · instance-level denoising

The pith

Converting attribution maps into geometry-aware prompts for SAM3 yields more coherent explanations for tiny bacteria detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to fix a practical problem in medical imaging: when AI spots very small bacteria in cluttered microscope images, standard explanation techniques produce blurry or scattered maps that do not clearly justify the detection. SAM-Sode first extracts rough feature attributions, then converts them into prompts that direct the SAM3 model to sharpen boundaries and reconstruct bacterial shapes. A second step applies two constraints, one tied to physical image properties and one to geometric consistency, to strip away background noise at the level of each individual object. A reader would care because clinical decisions often rest on whether the AI's highlighted regions make morphological sense to a human expert.

Core claim

The SAM-Sode framework transforms initial feature attribution maps into geometry-aware prompts for the SAM3 foundation model to achieve spatial refinement and morphological reconstruction of the explanatory mappings. A dual-constraint mechanism based on physical significance and geometric alignment performs instance-level denoising, generating coherent explanations that better align with human expert intuition while suppressing background redundancy.
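The first step of the claim, turning an attribution map into prompts, can be illustrated with a minimal sketch. It is not the authors' procedure: the function name `attribution_to_prompts`, the top-k peak picking, and the `min_dist` suppression radius are all assumptions made for illustration, and no SAM3 call is shown because the paper's prompt interface is not described here.

```python
import numpy as np

def attribution_to_prompts(attr, box, k=3, min_dist=2):
    """Pick up to k high-attribution points inside a detection box.

    attr: 2-D array of non-negative attribution scores, shape (H, W).
    box:  (x0, y0, x1, y1) in pixel coordinates, with x1 and y1 exclusive.
    Returns a list of (x, y) points, strongest first.
    Hypothetical helper: the selection rule is an assumption.
    """
    x0, y0, x1, y1 = box
    crop = np.asarray(attr, dtype=float)[y0:y1, x0:x1].copy()
    points = []
    if crop.size == 0:
        return points
    for _ in range(k):
        # Flat argmax, converted back to (row, col) inside the crop.
        cy, cx = divmod(int(np.argmax(crop)), crop.shape[1])
        if crop[cy, cx] <= 0:
            break  # nothing salient left inside the box
        points.append((x0 + cx, y0 + cy))
        # Zero a small neighbourhood so the next point is spatially distinct.
        ya, yb = max(cy - min_dist, 0), cy + min_dist + 1
        xa, xb = max(cx - min_dist, 0), cx + min_dist + 1
        crop[ya:yb, xa:xb] = 0.0
    return points
```

The returned points, together with the detection box, are the kind of input a promptable segmentation model accepts as positive point prompts and a box prompt. Restricting the search to the box keeps strong activations elsewhere in the image from becoming prompts for this instance.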

What carries the argument

Geometry-aware prompts created from initial feature attribution maps, supplied to SAM3 for refinement, together with the dual-constraint mechanism that enforces physical and geometric consistency during denoising.
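A dual-constraint filter of this kind might look like the sketch below. The two tests are assumptions standing in for the paper's definitions, which are not given here: "physical significance" is approximated as the share of attribution mass captured by a candidate mask, and "geometric alignment" as the share of mask pixels that fall inside the detection box. The function name and both thresholds are hypothetical.

```python
import numpy as np

def passes_dual_constraint(mask, attr, det_box, min_mass=0.5, min_inside=0.8):
    """Decide whether one candidate mask is kept as an explanation.

    mask:    2-D boolean array, shape (H, W), one refined instance mask.
    attr:    2-D non-negative attribution map, shape (H, W).
    det_box: (x0, y0, x1, y1) detection box, with x1 and y1 exclusive.
    Both constraints and their thresholds are illustrative assumptions.
    """
    mask = np.asarray(mask, dtype=bool)
    attr = np.asarray(attr, dtype=float)
    area = int(mask.sum())
    total = float(attr.sum())
    if area == 0 or total <= 0:
        return False
    # Constraint 1 (assumed physical significance):
    # fraction of the total attribution mass that lies under the mask.
    mass = float(attr[mask].sum()) / total
    # Constraint 2 (assumed geometric alignment):
    # fraction of the mask's pixels that lie inside the detection box.
    x0, y0, x1, y1 = det_box
    inside = int(mask[y0:y1, x0:x1].sum()) / area
    return mass >= min_mass and inside >= min_inside
```

Requiring both tests to pass means a mask is dropped if it covers weakly attributed background, or if it is well attributed but spills far outside the detected instance.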

If this is right

  • Explanations for tiny-object detections become spatially precise and morphologically complete rather than diffuse.
  • Background elements that do not match bacterial geometry are removed at the instance level.
  • The resulting maps supply logically coherent visual evidence that matches the expectations of human experts.
  • The same pipeline applies across the authors’ custom circuit-background dataset and additional public datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The prompting strategy could be tested on other sparse-object tasks such as cell counting in histopathology slides or defect detection in manufacturing images.
  • A controlled study could measure whether clinicians change their diagnostic decisions when shown the refined maps versus raw attributions.
  • If the dual constraints prove robust, they might serve as a general post-processing step for any segmentation foundation model used in explanation pipelines.

Load-bearing premise

The SAM3 model already contains enough built-in knowledge of bacterial shapes to turn sparse, noisy attribution maps into accurate outlines without introducing new distortions or biases.

What would settle it

The claim would fail if, on the 2,524-image bacteria dataset, expert annotators judged the refined explanation maps to be no more faithful and no less noisy than those from conventional attribution methods, or if the maps showed new artifacts absent from the original detections.

Figures

Figures reproduced from arXiv: 2605.21186 by Dazhi Huang, Hechang Chen, Mude Shi, Rufeng Chen, Shuo Yan, Sihong Xie, Tianxing Ji, Wanying Tan, Yazheng Liu, Zili Shao.

Figure 1: The SAM-Sode pipeline: (a) SR-TOD for detection; (b) Explanatory Attribution for point extraction; (c) Attribution Refinement using SAM3 and dual-filtering for final results.
2 Method. Our pipeline consists of three stages: detection of small objects (Section 2.1), interpretable attribution (Section 2.2), and attribution refinement via mask-guided constraints (Section 2.4). Firstly, the SR-TOD model is ut…
Figure 2: Global sensitivity analysis of augmentation crops (…
Figure 3: (Left) Original bacterial instance; (Right) The corresponding refined mask. The blue dashed lines represent the true physical boundaries of the bacterium.
Mask Intersection over Union (MaskIoU). Given that the ground truth (GT) of tiny objects exists only in the form of bounding boxes, we employ the axis-aligned bounding box of the mask for cross-representation evaluation. Let S be the set of coordinates o…
Figure 4: For the discrete and fragmented maps generated by IG (e.g., ID:2, 9), our method extracts coherent attribution points that align with bacterial contours. For Grad-CAM results (e.g., ID:3, 11), SAM-Sode purifies background noise and successfully filters out false high-activation signals at the corners, which is particularly evident in the complete suppression of the spurious activation in ID:11.
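The MaskIoU passage accompanying Figure 3 describes comparing the axis-aligned bounding box of a refined mask with a ground-truth box. Because the paper's formula is cut off in this excerpt, the following is only a sketch of that idea under the usual IoU definition; the function name `mask_iou` and the exclusive-corner box convention are assumptions.

```python
import numpy as np

def mask_iou(mask, gt_box):
    """IoU between the axis-aligned bounding box of `mask` and `gt_box`.

    mask:   2-D array, non-zero where the refined mask is set.
    gt_box: (x0, y0, x1, y1) ground-truth box, with x1 and y1 exclusive.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return 0.0  # an empty mask has no bounding box
    # Tightest axis-aligned box around the mask coordinates.
    bx0, by0 = int(xs.min()), int(ys.min())
    bx1, by1 = int(xs.max()) + 1, int(ys.max()) + 1
    gx0, gy0, gx1, gy1 = gt_box
    # Overlap width and height, clamped at zero for disjoint boxes.
    iw = max(min(bx1, gx1) - max(bx0, gx0), 0)
    ih = max(min(by1, gy1) - max(by0, gy0), 0)
    inter = iw * ih
    union = (bx1 - bx0) * (by1 - by0) + (gx1 - gx0) * (gy1 - gy0) - inter
    return inter / union if union > 0 else 0.0
```

Since only the mask's bounding box enters the score, holes or ragged edges inside the mask do not change the result. This fits the stated motivation that the ground truth exists only as boxes.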
Original abstract

Interpretability in object detection provides crucial confidence support for clinical auxiliary diagnosis. However, in tiny bacteria detection, traditional explanation methods often suffer from blurred foreground boundaries and diffuse feature attribution due to the extreme sparsity of target morphological features and severe interference from complex backgrounds. Such limitations hinder the provision of logically coherent morphological evidence. To bridge this gap, we propose a novel eXplainable AI (XAI) framework, SAM-Sode. The framework innovatively transforms initial feature attribution maps into geometry-aware prompts, leveraging the prior knowledge of the foundation model (SAM3) to achieve spatial refinement and morphological reconstruction of the explanatory mappings. Furthermore, we introduce a dual-constraint mechanism based on physical significance and geometric alignment to perform instance-level denoising, generating coherent explanations that better align with human expert intuition. Experimental results on our self-constructed bacteria dataset with complex circuit backgrounds (containing 2,524 images) and other public datasets demonstrate that the proposed method effectively suppresses background redundancy and significantly enhances the decision-making transparency of tiny object detection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes the SAM-Sode framework for generating faithful explanations in tiny bacteria detection tasks. It transforms feature attribution maps into geometry-aware prompts for the SAM3 foundation model to perform spatial refinement and morphological reconstruction. A dual-constraint mechanism based on physical significance and geometric alignment is introduced for instance-level denoising. The approach is evaluated on a self-constructed dataset consisting of 2,524 images with complex circuit backgrounds and additional public datasets, with claims of improved suppression of background redundancy and enhanced alignment with human expert intuition.

Significance. Should the empirical claims be substantiated, this work has the potential to advance explainable AI in the domain of tiny object detection within complex backgrounds, particularly relevant for clinical auxiliary diagnosis. The integration of foundation model priors for morphological reconstruction offers a novel way to address the challenges of sparse features and background interference in attribution maps.

major comments (2)
  1. Abstract: The abstract asserts that experimental results on the self-constructed 2,524-image dataset demonstrate effective suppression of background redundancy and significant enhancement of decision-making transparency, yet provides no quantitative results, error bars, ablation studies, baseline comparisons, or specific metrics; the central claim of improved faithfulness therefore cannot be verified from the available information.
  2. Method description: The framework's core claim rests on the assumption that SAM3 possesses sufficient prior knowledge of bacterial morphology to perform accurate spatial refinement and morphological reconstruction from sparse, noisy attribution maps without introducing new artifacts or biases; no section demonstrates validation of SAM3 outputs against independent expert morphological ground truth separate from the downstream detector.
minor comments (2)
  1. The dual-constraint mechanism (physical significance + geometric alignment) is described at a high level; providing explicit equations or pseudocode would improve clarity and reproducibility.
  2. The self-constructed dataset is introduced without details on annotation protocol, class balance, or how circuit backgrounds were generated; adding these would strengthen the experimental setup description.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful and constructive review of our manuscript. We have carefully addressed each major comment below and commit to revisions that will strengthen the substantiation of our claims while preserving the core contributions of the work.

Point-by-point responses
  1. Referee: Abstract: The abstract asserts that experimental results on the self-constructed 2,524-image dataset demonstrate effective suppression of background redundancy and significant enhancement of decision-making transparency, yet provides no quantitative results, error bars, ablation studies, baseline comparisons, or specific metrics; the central claim of improved faithfulness therefore cannot be verified from the available information.

    Authors: We agree that the abstract would benefit from including key quantitative indicators to support its claims. In the revised version, we will update the abstract to report specific metrics such as improvements in background suppression ratio, explanation faithfulness scores (e.g., via deletion/insertion AUC), and comparative gains over baselines, along with references to error bars and ablation results presented in the main body. revision: yes

  2. Referee: Method description: The framework's core claim rests on the assumption that SAM3 possesses sufficient prior knowledge of bacterial morphology to perform accurate spatial refinement and morphological reconstruction from sparse, noisy attribution maps without introducing new artifacts or biases; no section demonstrates validation of SAM3 outputs against independent expert morphological ground truth separate from the downstream detector.

    Authors: We acknowledge the importance of isolating the validation of SAM3's morphological priors. While the current manuscript demonstrates overall benefits through quantitative task performance gains and qualitative alignment with expert intuition, we agree that a more targeted validation would be valuable. In the revision, we will add a new subsection presenting direct comparisons of SAM3-refined maps against independent expert morphological annotations on a subset of images, distinct from the detector's training labels, to assess artifact introduction and bias. revision: yes
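The first response mentions deletion/insertion AUC as a candidate faithfulness score. As a rough illustration of the deletion variant, the sketch below removes pixels in order of decreasing attribution and integrates the model score over the fraction removed. The function name, the `score_fn` callable standing in for the detector's confidence, and the constant `fill` value are assumptions; the paper's evaluation protocol is not specified here.

```python
import numpy as np

def deletion_auc(image, attr, score_fn, steps=10, fill=0.0):
    """Area under the score curve as the most-attributed pixels are removed.

    image:    array of shape (H, W) or (H, W, C).
    attr:     attribution map of shape (H, W).
    score_fn: hypothetical callable mapping an image to a scalar score.
    A lower value means the score drops sooner, i.e. the attribution
    points at pixels the model actually relies on.
    """
    h, w = attr.shape
    order = np.argsort(attr.ravel())[::-1]  # most important pixels first
    work = np.asarray(image, dtype=float).copy()
    scores = [float(score_fn(work))]
    for chunk in np.array_split(order, steps):
        ys, xs = np.unravel_index(chunk, (h, w))
        work[ys, xs] = fill  # overwrite this batch of pixels
        scores.append(float(score_fn(work)))
    scores = np.asarray(scores)
    # Trapezoidal rule on a uniform grid over the removed fraction 0..1
    # (exact when H*W is divisible by `steps`, approximate otherwise).
    x = np.linspace(0.0, 1.0, len(scores))
    return float(np.sum((scores[1:] + scores[:-1]) * np.diff(x) / 2.0))
```

The insertion variant mirrors this: start from a blank or blurred image, add pixels back in the same order, and prefer a higher area under the curve.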

Circularity Check

0 steps flagged

No significant circularity; methodological proposal is self-contained

Full rationale

The paper describes an independent XAI framework proposal that transforms initial feature attribution maps into geometry-aware prompts for SAM3 followed by a dual-constraint mechanism for denoising. No equations, derivations, fitted parameters, or self-referential reductions appear in the abstract or described method. The central claims rest on experimental results from a self-constructed dataset and public datasets rather than any input-by-construction equivalence. The approach stands as a novel methodological suggestion without load-bearing self-citations, ansatzes smuggled via prior work, or renaming of known results that would trigger circularity under the specified patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach rests on the assumption that SAM3 can reliably reconstruct bacterial morphology from sparse prompts and that the dual constraints meaningfully denoise without post-hoc tuning.

axioms (1)
  • domain assumption: the SAM3 foundation model contains prior knowledge sufficient for morphological reconstruction of tiny bacteria from geometry-aware prompts.
    Invoked when the paper states that SAM3 is leveraged to achieve spatial refinement and morphological reconstruction.

pith-pipeline@v0.9.0 · 5736 in / 1303 out tokens · 34167 ms · 2026-05-21T05:52:22.200132+00:00 · methodology

