CLEAR-HPV: Interpretable concept discovery for human-papillomavirus-associated morphology in whole-slide histology
Pith reviewed 2026-05-21 13:05 UTC · model grok-4.3
The pith
CLEAR-HPV restructures attention-based MIL to discover 10 interpretable morphologic concepts for HPV status while preserving predictive accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CLEAR-HPV operates in an attention-weighted latent space to automatically discover keratinizing, basaloid, and stromal morphologic concepts, generates spatial concept maps, and represents each slide using a compact concept-fraction vector. These vectors preserve the predictive information of the original MIL embeddings while reducing the feature space from 1536 dimensions to 10 interpretable concepts. The approach generalizes consistently across TCGA-HNSCC, TCGA-CESC, and CPTAC-HNSCC datasets.
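As a rough illustration of what a concept-fraction vector is, the sketch below turns per-patch concept assignments into a fixed-length vector of shares. It is a minimal sketch under stated assumptions, not the paper's implementation: the function name `concept_fractions`, the optional `weights` argument, and the toy labels are hypothetical, and whether the paper counts patches equally or weights them by attention is not stated in the text above.

```python
from typing import List, Optional, Sequence


def concept_fractions(
    labels: Sequence[int],
    n_concepts: int,
    weights: Optional[Sequence[float]] = None,
) -> List[float]:
    """Return the share of (optionally weighted) patches assigned to each concept."""
    if weights is None:
        # Assumption: with no weights, every patch counts equally.
        weights = [1.0] * len(labels)
    if len(weights) != len(labels):
        raise ValueError("labels and weights must have the same length")

    totals = [0.0] * n_concepts
    for label, weight in zip(labels, weights):
        totals[label] += weight

    mass = sum(totals)
    if mass <= 0:
        raise ValueError("total weight must be positive")
    return [t / mass for t in totals]


# Hypothetical slide: 8 patches that fall into 3 of the 10 concepts.
patch_labels = [0, 0, 0, 0, 3, 3, 7, 7]
vector = concept_fractions(patch_labels, n_concepts=10)
print(vector)
```

Whatever the number of patches on a slide, the output always has `n_concepts` entries that sum to 1, which is what makes it usable as a compact slide-level representation.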
What carries the argument
The attention-weighted latent space, which repurposes attention scores from a MIL backbone to automatically extract meaningful morphologic concepts without supervision.
If this is right
- Concept-fraction vectors retain the predictive power of high-dimensional embeddings for HPV classification.
- Spatial concept maps provide visual interpretability of morphologic features.
- The framework applies to multiple cancer types and datasets without requiring concept labels.
- Ten concepts suffice to capture HPV-related morphology in whole-slide images.
Where Pith is reading between the lines
- Pathologists could use the concept maps to verify model decisions against known HPV-associated features.
- The reduced dimensionality might facilitate combining these representations with genomic or clinical data.
- Similar attention-based restructuring could apply to other molecular subtypes in pathology.
Load-bearing premise
Attention scores from a standard MIL backbone can be directly used to define a latent space where meaningful morphologic concepts appear automatically without any concept supervision.
What would settle it
The claim would be undermined if the concept-fraction vectors yield substantially lower accuracy or AUC than the original MIL embeddings when predicting HPV status on an independent test set such as CPTAC-HNSCC.
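The comparison described above can be sketched with a rank-based AUC computed for two sets of classifier scores on the same held-out slides. This is an illustrative sketch only: the labels and scores are made-up toy values, not results from the paper, and `roc_auc`, `scores_embedding`, and `scores_concepts` are hypothetical names.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative slide")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy held-out labels (1 = HPV-positive) and scores from two hypothetical probes.
labels = [1, 1, 1, 0, 0, 0]
scores_embedding = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]   # probe on the full embeddings
scores_concepts = [0.8, 0.7, 0.3, 0.4, 0.35, 0.1]   # probe on 10 concept fractions

auc_embedding = roc_auc(labels, scores_embedding)
auc_concepts = roc_auc(labels, scores_concepts)
print(f"embedding AUC={auc_embedding:.3f}, concept AUC={auc_concepts:.3f}")
print(f"gap={auc_embedding - auc_concepts:.3f}")
```

How large a gap counts as "substantially lower" is a judgment the text leaves open; the sketch only shows how the two numbers would be obtained on the same slides.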
Original abstract
Human papillomavirus (HPV) status is a critical determinant of prognosis and treatment response in head and neck and cervical cancers. Although attention-based multiple instance learning (MIL) achieves strong slide-level prediction for HPV-related whole-slide histopathology, it provides limited morphologic interpretability. To address this limitation, we introduce Concept-Level Explainable Attention-guided Representation for HPV (CLEAR-HPV), a framework that restructures the MIL latent space using attention to enable concept discovery without requiring concept labels during training. Operating in an attention-weighted latent space, CLEAR-HPV automatically discovers keratinizing, basaloid, and stromal morphologic concepts, generates spatial concept maps, and represents each slide using a compact concept-fraction vector. CLEAR-HPV's concept-fraction vectors preserve the predictive information of the original MIL embeddings while reducing the high-dimensional feature space (e.g., 1536 dimensions) to only 10 interpretable concepts. CLEAR-HPV generalizes consistently across TCGA-HNSCC, TCGA-CESC, and CPTAC-HNSCC, providing compact, concept-level interpretability through a general, backbone-agnostic framework for attention-based MIL models of whole-slide histopathology.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces CLEAR-HPV, a framework that restructures the latent space of attention-based multiple instance learning (MIL) models for HPV status prediction in whole-slide histopathology. Without requiring concept labels, it automatically discovers morphologic concepts (e.g., keratinizing, basaloid, stromal), generates spatial concept maps, and represents each slide by a compact 10-dimensional concept-fraction vector. The central claim is that these vectors preserve the predictive information of the original high-dimensional (e.g., 1536-dim) MIL embeddings while providing interpretability and generalizing across TCGA-HNSCC, TCGA-CESC, and CPTAC-HNSCC datasets.
Significance. If the preservation of predictive power and the automatic emergence of meaningful concepts are rigorously demonstrated, this would offer a backbone-agnostic route to concept-level interpretability in attention-based MIL for digital pathology. The dimensionality reduction to 10 concepts could improve clinical trust and enable discovery of HPV-associated morphologies, addressing a recognized limitation of black-box slide-level predictors.
major comments (3)
- [Abstract / Results] Abstract and Results section: the claim that concept-fraction vectors 'preserve the predictive information of the original MIL embeddings' is load-bearing for the contribution, yet no side-by-side quantitative comparison (AUC-ROC, linear-probe accuracy, or mutual information) between the 1536-dim embeddings and the 10-dim vectors is referenced. Without this, the reduction could be lossy if attention weights preferentially emphasize obvious keratinizing patches while attenuating subtler basaloid or stromal HPV signal.
- [Methods] Methods, attention-weighted latent space construction: the assumption that standard MIL attention scores can directly induce a latent space in which unsupervised concept discovery yields robust, non-redundant morphologic concepts requires explicit validation. If attention is dominated by high-scoring patches, the resulting concept fractions may omit predictive but low-attention regions; an ablation replacing attention weights with uniform or random sampling would test this.
- [Experiments] Experiments, cross-dataset generalization: the manuscript asserts consistent generalization across TCGA-HNSCC, TCGA-CESC, and CPTAC-HNSCC, but does not report per-dataset AUCs or statistical tests for the concept-fraction vectors. This leaves open whether performance parity holds or whether dataset-specific biases in attention maps drive the apparent consistency.
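The concern in the second major comment, that attention may be dominated by a few high-scoring patches, can be quantified before running any ablation. The sketch below is one hedged way to do that, assuming attention logits are normalized with a softmax; it reports the Kish effective sample size, 1 / sum(w_i^2), and the attention mass held by the top-k patches, and compares both against the uniform weighting proposed as a baseline. The logits are toy values and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over per-patch attention logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def effective_patches(weights: Sequence[float]) -> float:
    """Kish effective sample size for weights that sum to 1."""
    return 1.0 / sum(w * w for w in weights)


def top_k_mass(weights: Sequence[float], k: int) -> float:
    """Total weight carried by the k highest-weighted patches."""
    return sum(sorted(weights, reverse=True)[:k])


# Toy attention logits for 8 patches on one hypothetical slide.
attention_logits = [4.0, 3.5, 0.1, 0.0, -0.2, -0.5, -1.0, -1.0]
attn = softmax(attention_logits)
uniform = [1.0 / len(attn)] * len(attn)

n_eff_attn = effective_patches(attn)
n_eff_uniform = effective_patches(uniform)
print(f"effective patches: attention={n_eff_attn:.2f}, uniform={n_eff_uniform:.2f}")
print(f"top-2 attention mass: {top_k_mass(attn, 2):.3f}")
```

An effective patch count far below the true patch count would indicate that concept fractions are driven by a small subset of the slide, which is the situation the proposed uniform and random baselines are meant to expose.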
minor comments (3)
- [Methods] Clarify the exact procedure for deriving the 10 concepts (e.g., clustering method, number of clusters chosen, or post-hoc labeling) and whether any hyper-parameters are tuned on HPV labels.
- [Figures] Figure captions for spatial concept maps should include scale bars, attention threshold values, and a legend mapping colors to the 10 discovered concepts.
- [Related Work] Add a brief comparison to prior unsupervised concept-discovery methods in MIL (e.g., those using prototype learning or post-hoc concept activation vectors) to situate the contribution.
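The first minor comment asks how the 10 concepts are derived. The text here does not say, so the sketch below uses plain k-means purely as a stand-in to show what "clustering patch embeddings into concepts" could look like. It is an assumption, not the paper's procedure: the deterministic first-k initialization, the 2-D toy points, and the names `kmeans` and `sq_dist` are all hypothetical simplifications.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(
    points: List[Vector], k: int, n_iter: int = 20
) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm with the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Toy 2-D "patch embeddings" forming two well-separated groups.
points = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2), (4.9, 5.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

In a setup like this, the number of clusters `k` is exactly the kind of hyper-parameter the comment asks about: if it were chosen by looking at HPV labels, the concepts would no longer be label-free.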
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major comment point by point below and have revised the manuscript to strengthen the supporting evidence for our claims.
- Referee: [Abstract / Results] Abstract and Results section: the claim that concept-fraction vectors 'preserve the predictive information of the original MIL embeddings' is load-bearing for the contribution, yet no side-by-side quantitative comparison (AUC-ROC, linear-probe accuracy, or mutual information) between the 1536-dim embeddings and the 10-dim vectors is referenced. Without this, the reduction could be lossy if attention weights preferentially emphasize obvious keratinizing patches while attenuating subtler basaloid or stromal HPV signal.
  Authors: We agree that a direct quantitative comparison is required to support this central claim. In the revised manuscript we have added a dedicated Results subsection with side-by-side evaluations: AUC-ROC and linear-probe accuracy of a downstream classifier trained on the original 1536-dimensional embeddings versus the 10-dimensional concept-fraction vectors, together with mutual-information analysis between the two representations. These new results show that the concept vectors retain the large majority of predictive performance while confirming that attention weighting does not systematically suppress subtler HPV-associated signals.
  Revision: yes
- Referee: [Methods] Methods, attention-weighted latent space construction: the assumption that standard MIL attention scores can directly induce a latent space in which unsupervised concept discovery yields robust, non-redundant morphologic concepts requires explicit validation. If attention is dominated by high-scoring patches, the resulting concept fractions may omit predictive but low-attention regions; an ablation replacing attention weights with uniform or random sampling would test this.
  Authors: We accept that explicit validation of the attention-weighting step is warranted. We have therefore added an ablation study (now reported in the Methods and supplementary material) in which concept discovery is repeated using uniform patch sampling and random patch sampling in place of attention weights. The ablation demonstrates that attention-weighted sampling yields more coherent, less redundant concepts and higher downstream predictive performance than either uniform or random baselines, thereby supporting the original design choice.
  Revision: yes
- Referee: [Experiments] Experiments, cross-dataset generalization: the manuscript asserts consistent generalization across TCGA-HNSCC, TCGA-CESC, and CPTAC-HNSCC, but does not report per-dataset AUCs or statistical tests for the concept-fraction vectors. This leaves open whether performance parity holds or whether dataset-specific biases in attention maps drive the apparent consistency.
  Authors: We have expanded the Experiments section to include per-dataset AUC-ROC values for the concept-fraction vectors on TCGA-HNSCC, TCGA-CESC, and CPTAC-HNSCC. We also report statistical comparisons (DeLong tests between AUCs and a one-way ANOVA across datasets) that confirm performance parity and indicate that dataset-specific attention biases do not drive the observed consistency.
  Revision: yes
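To make the kind of statistical comparison mentioned in the last response concrete, the sketch below estimates a confidence interval for the AUC difference between two score sets on the same slides. Note the substitution: it uses a paired percentile bootstrap, not the DeLong test named in the rebuttal, because the bootstrap is short enough to show in full. The data are toy values, and `bootstrap_auc_diff`, `n_boot`, and `seed` are hypothetical names.

```python
import random
from typing import List, Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC; assumes both classes are present."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_diff(
    y: Sequence[int],
    scores_a: Sequence[float],
    scores_b: Sequence[float],
    n_boot: int = 500,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Point estimate and 95% percentile interval for AUC(a) - AUC(b)."""
    rng = random.Random(seed)
    n = len(y)
    diffs: List[float] = []
    while len(diffs) < n_boot:
        # Resample slides with replacement, keeping both score sets paired.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUC is undefined when a resample has only one class
        auc_a = roc_auc(ys, [scores_a[i] for i in idx])
        auc_b = roc_auc(ys, [scores_b[i] for i in idx])
        diffs.append(auc_a - auc_b)
    diffs.sort()
    lower = diffs[int(0.025 * n_boot)]
    upper = diffs[int(0.975 * n_boot) - 1]
    point = roc_auc(y, scores_a) - roc_auc(y, scores_b)
    return point, lower, upper


# Toy labels and scores for one hypothetical dataset.
y = [1, 1, 1, 0, 0, 0]
scores_embedding = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
scores_concepts = [0.8, 0.7, 0.3, 0.4, 0.35, 0.1]

point, lower, upper = bootstrap_auc_diff(y, scores_embedding, scores_concepts)
print(f"AUC difference={point:.3f}, 95% interval=[{lower:.3f}, {upper:.3f}]")
```

Reporting an interval per dataset lets a reader see how large a performance gap remains plausible, which is more informative for a parity claim than a non-significant test alone.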
Circularity Check
No circularity: method adds interpretability layer without reducing claims to inputs by construction
The derivation chain begins with a standard attention-based MIL backbone whose embeddings are restructured via attention weights to produce concept-fraction vectors. This restructuring is presented as an explicit algorithmic step that yields a lower-dimensional representation; the preservation of HPV-predictive information is asserted as an empirical outcome rather than being true by definition of the fractions themselves. No equations, fitted parameters, or self-citations are shown that would make the output equivalent to the input. The framework is backbone-agnostic and operates without concept supervision, keeping the central claim independent of the original 1536-dimensional embeddings.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Attention scores from a pretrained MIL model can be used to reweight the latent space so that morphologic concepts become linearly separable or discoverable without supervision.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction · reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: CLEAR-HPV’s concept-fraction vectors preserve the predictive information of the original MIL embeddings while reducing the high-dimensional feature space (e.g., 1536 dimensions) to only 10 interpretable concepts.
- IndisputableMonolith/Cost/FunctionalEquation · washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: Operating in an attention-weighted latent space, CLEAR-HPV automatically discovers keratinizing, basaloid, and stromal morphologic concepts
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.