MedConcept: Unsupervised Concept Discovery for Interpretability in Medical VLMs
Pith reviewed 2026-05-10 15:36 UTC · model grok-4.3
The pith
MedConcept identifies sparse neuron activations in pretrained medical VLMs and converts them into readable pseudo-report summaries without any supervision.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MedConcept uncovers latent medical concepts in a fully unsupervised manner by identifying sparse neuron-level concept activations from pretrained VLM representations and translating them into pseudo-report-style summaries that are grounded in clinically verifiable textual semantics. It further supplies a quantitative semantic verification protocol that uses a frozen external medical LLM to score each concept as Aligned, Unaligned, or Uncertain relative to radiology reports, establishing a post-hoc evaluation baseline for interpretability that is independent of any downstream task.
What carries the argument
Sparse neuron-level concept activations extracted from the pretrained VLM latent space, which are then mapped to pseudo-report-style textual summaries for semantic grounding and evaluation.
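To make the extraction step concrete, here is a minimal sketch of how sparse neuron-level concept discovery could look. The paper gives no implementation, so the function name, the firing threshold `tau`, and the `max_fire_rate` cutoff are all illustrative assumptions: a neuron is kept as a candidate concept only if it fires on a small but nonzero fraction of inputs, and its top-activating images are what would later be summarized as a pseudo-report.

```python
import numpy as np

def discover_sparse_concepts(acts, tau=1.0, max_fire_rate=0.05, top_k=5):
    """Hypothetical concept discovery sketch.

    acts: (n_images, n_neurons) matrix of latent activations from a
    frozen pretrained VLM encoder.

    A neuron is a candidate concept if it fires (activation > tau) on
    at most `max_fire_rate` of inputs, i.e. it is sparse but not dead.
    For each candidate, return the indices of its `top_k` most strongly
    activating images, the raw material for a pseudo-report summary.
    """
    fire_rate = (acts > tau).mean(axis=0)                  # fraction of inputs firing each neuron
    candidates = np.where((fire_rate > 0) &                # drop dead neurons
                          (fire_rate <= max_fire_rate))[0] # drop dense (non-sparse) neurons
    return {int(n): np.argsort(acts[:, n])[::-1][:top_k].tolist()
            for n in candidates}
```

A dense neuron that fires on every input is excluded even if its magnitude is large, which reflects the sparsity assumption the method leans on.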
If this is right
- Concept-level explanations become reusable across multiple downstream tasks such as segmentation and diagnosis prediction.
- Physicians gain a practical route to inspect internal model reasoning through generated pseudo-reports.
- A quantitative baseline for measuring interpretability in medical VLMs is established via the three alignment scores.
- Interpretability methods are no longer restricted to task-specific gradient or attention visualizations.
Where Pith is reading between the lines
- The same sparsity-driven extraction could be tested on non-medical VLMs to check whether the grounding step generalizes beyond radiology.
- Mismatched or unaligned concepts could serve as a diagnostic signal for locating systematic biases in the original VLM.
- Integrating the discovered concepts as auxiliary supervision during fine-tuning might improve both accuracy and transparency.
- The pseudo-report format opens the possibility of direct comparison with human-generated reports in clinical audits.
Load-bearing premise
Sparse neuron activations in the VLM latent space correspond to distinct, clinically meaningful medical concepts that can be reliably grounded in textual semantics without any supervision or task-specific labels.
What would settle it
Apply the semantic verification protocol to a held-out collection of radiology reports: if the large majority of discovered concepts receive Unaligned or Uncertain scores rather than Aligned scores, the load-bearing premise fails.
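The post-hoc tally behind that test can be sketched in a few lines. This is an assumed harness, not the paper's code: `judgments` stands in for whatever per-concept labels the frozen external LLM emits, and the 0.5 falsification threshold is an illustrative choice.

```python
from collections import Counter

# The three concept scores defined by the paper's verification protocol.
ALLOWED = {"Aligned", "Unaligned", "Uncertain"}

def verification_summary(judgments):
    """Fraction of discovered concepts landing in each score bin.

    judgments: dict mapping concept id -> label assigned by the frozen
    external medical LLM against held-out radiology reports.
    """
    labels = list(judgments.values())
    assert all(l in ALLOWED for l in labels), "judge must emit one of the three scores"
    counts = Counter(labels)
    return {label: counts.get(label, 0) / len(labels) for label in sorted(ALLOWED)}

def premise_falsified(summary, threshold=0.5):
    """The load-bearing premise fails if most concepts are not Aligned."""
    return summary["Unaligned"] + summary["Uncertain"] > threshold
```

Because the scores are used exclusively post hoc, nothing in this tally feeds back into the discovery stage, which is the decoupling the circularity check below relies on.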
read the original abstract
While medical Vision-Language models (VLMs) achieve strong performance on tasks such as tumor or organ segmentation and diagnosis prediction, their opaque latent representations limit clinical trust and the ability to explain predictions. Interpretability of these multimodal representations are therefore essential for the trustworthy clinical deployment of pretrained medical VLMs. However, current interpretability methods, such as gradient- or attention-based visualizations, are often limited to specific tasks such as classification. Moreover, they do not provide concept-level explanations derived from shared pretrained representations that can be reused across downstream tasks. We introduce MedConcept, a framework that uncovers latent medical concepts in a fully unsupervised manner and grounds them in clinically verifiable textual semantics. MedConcept identifies sparse neuron-level concept activations from pretrained VLM representations and translates them into pseudo-report-style summaries, enabling physician-level inspection of internal model reasoning. To address the lack of quantitative evaluation in concept-based interpretability, we introduce a quantitative semantic verification protocol that leverages an independent pretrained medical LLM as a frozen external evaluator to assess concept alignment with radiology reports. We define three concept scores, Aligned, Unaligned, and Uncertain, to quantify semantic support, contradiction, or ambiguity relative to radiology reports and use them exclusively for post hoc evaluation. These scores provide a quantitative baseline for assessing interpretability in medical VLMs. All codes, prompt and data to be released on acceptance. Ke
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes MedConcept, an unsupervised framework for discovering latent medical concepts in pretrained Vision-Language Models (VLMs). It extracts sparse neuron-level activations from VLM representations, translates them into pseudo-report-style textual summaries for inspection, and introduces a post-hoc quantitative verification protocol that uses an independent pretrained medical LLM to score concepts as Aligned, Unaligned, or Uncertain relative to radiology reports. The approach aims to enable reusable, concept-level explanations across downstream tasks without supervision or task-specific labels.
Significance. If the sparse activations can be shown to reliably correspond to clinically meaningful concepts, the framework would advance interpretability in medical VLMs by providing physician-inspectable explanations grounded in textual semantics and a reusable quantitative baseline for evaluation. The use of a frozen external LLM for verification avoids direct circularity in discovery and addresses the noted gap in standardized metrics for concept-based interpretability.
major comments (2)
- [Abstract] Abstract: The manuscript describes the intended pipeline and evaluation protocol at a high level but supplies no empirical results, ablation studies, implementation details, or examples of discovered concepts. This prevents any assessment of whether sparse neuron activations correspond to distinct clinical concepts rather than artifacts or co-occurrences, which is load-bearing for the central claim of enabling reliable interpretability.
- [Method] Concept discovery description: No mechanism is provided to enforce or verify that sparsity in the latent space produces semantically disentangled, clinically meaningful concepts; the post-hoc LLM scoring occurs after selection and cannot retroactively confirm the mapping without task labels or ground-truth correspondence tests.
minor comments (1)
- [Abstract] The abstract text ends abruptly with 'Ke', which appears to be an incomplete sentence and should be revised for completeness.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed review of our manuscript. We address each major comment below, clarifying the current scope of the work while committing to revisions that strengthen the empirical grounding and methodological transparency.
read point-by-point responses
-
Referee: [Abstract] Abstract: The manuscript describes the intended pipeline and evaluation protocol at a high level but supplies no empirical results, ablation studies, implementation details, or examples of discovered concepts. This prevents any assessment of whether sparse neuron activations correspond to distinct clinical concepts rather than artifacts or co-occurrences, which is load-bearing for the central claim of enabling reliable interpretability.
Authors: We agree that the abstract as submitted is high-level and omits concrete empirical results, ablations, implementation specifics, and examples of discovered concepts. The manuscript centers on introducing the unsupervised MedConcept framework and the independent-LLM verification protocol rather than reporting task-specific performance. In the revised version we will expand the abstract to include representative examples of extracted concepts, their pseudo-report translations, and the resulting Aligned/Uncertain/Unaligned score distributions. We will also add a concise summary of the activation-selection procedure and any ablations performed on sparsity thresholds so that readers can directly evaluate whether the sparse neurons capture clinically coherent concepts. revision: yes
-
Referee: [Method] Concept discovery description: No mechanism is provided to enforce or verify that sparsity in the latent space produces semantically disentangled, clinically meaningful concepts; the post-hoc LLM scoring occurs after selection and cannot retroactively confirm the mapping without task labels or ground-truth correspondence tests.
Authors: Concept discovery proceeds by directly harvesting the sparse neuron activations already present in the frozen pretrained VLM; no auxiliary loss or architectural constraint is added to enforce disentanglement beyond the natural sparsity observed in the model’s latent space. Candidate concepts are then rendered as pseudo-reports for inspection. The independent medical LLM is applied strictly after selection and serves only as an external, frozen evaluator that assigns the three semantic scores; it is deliberately decoupled from the discovery stage to avoid circularity. We acknowledge that this post-hoc protocol cannot supply ground-truth correspondence in the absence of task labels and that the clinical meaningfulness ultimately rests on the assumption that the VLM’s sparse activations are semantically coherent—an assumption shared by many unsupervised interpretability methods. In revision we will (i) detail the precise activation-selection rule (e.g., top-k magnitude or frequency thresholding) and (ii) add an explicit limitations paragraph discussing the post-hoc nature of the verification. revision: partial
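The two candidate activation-selection rules the rebuttal names can be sketched as follows. Neither rule is specified in the paper, so the parameter values (`k`, `tau`, the rate band) are assumptions for illustration only.

```python
import numpy as np

def select_top_k_magnitude(acts, k=10):
    """Top-k magnitude rule: keep the k neurons with the largest
    mean absolute activation across the dataset."""
    scores = np.abs(acts).mean(axis=0)
    return np.argsort(scores)[::-1][:k]

def select_by_frequency(acts, tau=1.0, min_rate=0.01, max_rate=0.10):
    """Frequency-thresholding rule: keep neurons that fire above tau
    on a small but nonzero fraction of inputs."""
    rate = (acts > tau).mean(axis=0)
    return np.where((rate >= min_rate) & (rate <= max_rate))[0]
```

Note the two rules can disagree: a neuron that fires on every input with high magnitude is selected by the first rule but rejected by the second, which is one way an ablation over selection rules could probe whether sparsity, rather than magnitude, is doing the work.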
Circularity Check
No significant circularity detected
full rationale
The paper's core derivation is self-contained: it extracts sparse neuron activations directly from pretrained VLM representations in a fully unsupervised manner, then applies an independent frozen medical LLM solely for post-hoc quantitative scoring of the resulting pseudo-reports against radiology reports. No equations or steps reduce the discovery process to fitted parameters, self-citations, or renamed inputs; the Aligned/Unaligned/Uncertain scores are defined exclusively for evaluation and do not influence or reconstruct the upstream concept identification. This structure avoids all enumerated circularity patterns and remains falsifiable against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Sparse neuron activations within pretrained VLM representations correspond to distinct, reusable medical concepts
- domain assumption A separate pretrained medical LLM can serve as an unbiased external judge of semantic alignment between generated concept summaries and radiology reports
invented entities (1)
- Aligned, Unaligned, and Uncertain concept scores (no independent evidence)
Reference graph
Works this paper leans on
- [1] Bassi, P.R., Li, W., Chen, J., Zhu, Z., Lin, T., Decherchi, S., Cavalli, A., Wang, K., Yang, Y., Yuille, A.L., et al.: Learning segmentation from radiology reports. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 305–315. Springer (2025)
- [2] Bhalla, U., Oesterling, A., Srinivas, S., Calmon, F., Lakkaraju, H.: Interpreting CLIP with sparse linear concept embeddings (SpLiCE). Advances in Neural Information Processing Systems 37, 84298–84328 (2024)
- [3] Blankemeier, L., Cohen, J.P., Kumar, A., Van Veen, D., Gardezi, S.J.S., Paschali, M., Chen, Z., Delbrouck, J.B., Reis, E., Truyts, C., et al.: Merlin: A vision language foundation model for 3D computed tomography. Research Square pp. rs–3 (2024)
- [4] Bodenreider, O.: The Unified Medical Language System (UMLS): integrating biomedical terminology. Nucleic Acids Research 32(suppl_1), D267–D270 (2004)
- [5] Bricken, T.: Towards monosemanticity: Decomposing language models with dictionary learning. Transformer Circuits Thread (2023), https://transformer-circuits.pub/2023/monosemantic-features
- [6] Chung, M., Won, J.B., Kim, G., Kim, Y., Ozbulak, U.: Evaluating visual explanations of attention maps for transformer-based medical imaging. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 110–120. Springer (2024)
- [7] Dasdelen, M.F., Lim, H., Buck, M., Götze, K.S., Marr, C., Schneider, S.: CytoSAE: Interpretable cell embeddings for hematology. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. pp. 77–86 (2025)
- [8] Dhar, T., Dey, N., Borra, S., Sherratt, R.S.: Challenges of deep learning in medical image analysis: improving explainability and trust. IEEE Transactions on Technology and Society 4(1), 68–75 (2023)
- [9] D'Antonoli, T.A., Bluethgen, C., Cuocolo, R., Klontzas, M.E., Ponsiglione, A., Kocak, B.: Foundation models for radiology: fundamentals, applications, opportunities, challenges, risks, and prospects. Diagnostic and Interventional Radiology (2025)
- [10] Ennab, M., Mcheick, H.: Enhancing interpretability and accuracy of AI models in healthcare: a comprehensive review on challenges and future directions. Frontiers in Robotics and AI 11, 1444763 (2024)
- [11] Ennab, M., Mcheick, H.: Advancing AI interpretability in medical imaging: a comparative analysis of pixel-level interpretability and Grad-CAM models. Machine Learning and Knowledge Extraction 7(1), 12 (2025)
- [12] Fayyaz, A.M., Abdulkadir, S.J., Talpur, N., Al-Selwi, S.M., Hassan, S.U., Sumiea, E.H.: Grad-CAM (gradient-weighted class activation mapping): A systematic literature review. Computers in Biology and Medicine 198, 111200 (2025)
- [13] Gallifant, J., Chen, S., Sasse, K., Aerts, H., Hartvigsen, T., Bitterman, D.: Sparse autoencoder features for classifications and transferability. In: Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. pp. 29927–29951 (2025)
- [14] Gong, S., Wang, H., Zhang, X., Dou, Q.: Concepts from neurons: Building interpretable medical image diagnostic models by dissecting opaque neural networks. In: International Conference on Information Processing in Medical Imaging. pp. 3–18. Springer (2025)
- [15] Huff, D.T., Weisman, A.J., Jeraj, R.: Interpretation and visualization techniques for deep learning models in medical imaging. Physics in Medicine & Biology 66(4), 04TR01 (2021)
- [16] Komorowski, P., Baniecki, H., Biecek, P.: Towards evaluating explanations of vision transformers for medical imaging. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 3726–3732 (2023)
- [17] Lepcha, D.C., Goyal, B., Dogra, A., Alkhayyat, A., Sahu, P.K., Ali, A., Kukreja, V.: Deep learning in medical image analysis: A comprehensive review of algorithms, trends, applications, and challenges. Computer Modeling in Engineering & Sciences 145(2), 1487 (2025)
- [18] Li, W., Qu, C., Chen, X., Bassi, P.R., Shi, Y., Lai, Y., Yu, Q., Xue, H., Chen, Y., Lin, X., et al.: AbdomenAtlas: A large-scale, detailed-annotated, & multi-center dataset for efficient transfer learning and open algorithmic benchmarking. Medical Image Analysis 97, 103285 (2024)
- [19] Lindberg, D.A., Humphreys, B.L., McCray, A.T.: The Unified Medical Language System. Yearbook of Medical Informatics 2(01), 41–51 (1993)
- [20] OpenAI: ChatGPT (5.2) (2026), https://chat.openai.com/chat
- [21] Pach, M., Karthik, S., Bouniot, Q., Belongie, S., Akata, Z.: Sparse autoencoders learn monosemantic features in vision-language models. arXiv preprint arXiv:2504.02821 (2025)
- [22] Rao, S., Mahajan, S., Böhle, M., Schiele, B.: Discover-then-name: Task-agnostic concept bottlenecks via automated concept discovery. In: European Conference on Computer Vision. pp. 444–461. Springer (2024)
- [23] Salahuddin, Z., Woodruff, H.C., Chatterjee, A., Lambin, P.: Transparency of deep neural networks for medical image analysis: A review of interpretability methods. Computers in Biology and Medicine 140, 105111 (2022)
- [24] Sellergren, A., Kazemzadeh, S., Jaroensri, T., Kiraly, A., Traverse, M., Kohlberger, T., Xu, S., Jamil, F., Hughes, C., Lau, C., et al.: MedGemma technical report. arXiv preprint arXiv:2507.05201 (2025)
- [25] Shi, W., Li, S., Liang, T., Wan, M., Ma, G., Wang, X., He, X.: Route sparse autoencoder to interpret large language models. In: Christodoulopoulos, C., Chakraborty, T., Rose, C., Peng, V. (eds.) Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. pp. 6801–6815. Association for Computational Linguistics (Nov 2025)
- [26] Wu, J., Wang, Y., Zhong, Z., Liao, W., Trayanova, N., Jiao, Z., Bai, H.X.: Vision-language foundation model for 3D medical imaging. npj Artificial Intelligence 1(1), 17 (2025)