CLEAR-HPV: Interpretable concept discovery for human-papillomavirus-associated morphology in whole-slide histology

Hao Wang; Shiwei Tan; Weiyi Qin; Yingci Liu-Swetz

arxiv: 2602.05126 · v3 · pith:PCN5GDEOnew · submitted 2026-02-04 · 💻 cs.CV


Pith reviewed 2026-05-21 13:05 UTC · model grok-4.3

classification 💻 cs.CV

keywords HPV · histopathology · concept discovery · multiple instance learning · whole-slide imaging · interpretability · head and neck cancer · cervical cancer


The pith

CLEAR-HPV restructures attention-based MIL to discover 10 interpretable morphologic concepts for HPV status while preserving predictive accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that attention scores can guide the creation of a latent space in MIL models where morphologic concepts like keratinizing, basaloid, and stromal patterns emerge automatically for HPV-associated cancers in head and neck and cervical tissues. This matters because current attention-based approaches predict HPV status effectively from whole-slide images but offer little insight into the specific tissue features driving those predictions. By producing compact concept-fraction vectors and spatial maps, CLEAR-HPV adds interpretability in a backbone-agnostic way that works across different datasets. A reader would care if this enables better clinical understanding and trust in AI-assisted pathology without sacrificing performance.

Core claim

CLEAR-HPV operates in an attention-weighted latent space to automatically discover keratinizing, basaloid, and stromal morphologic concepts, generates spatial concept maps, and represents each slide using a compact concept-fraction vector. These vectors preserve the predictive information of the original MIL embeddings while reducing the feature space from 1536 dimensions to 10 interpretable concepts. The approach generalizes consistently across TCGA-HNSCC, TCGA-CESC, and CPTAC-HNSCC datasets.
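
To make the concept-fraction representation concrete, here is a minimal Python sketch. It follows the description in the paper's Figure 3 caption (the fraction of tiles assigned to each discovered concept, optionally weighted by MIL attention). The function name `concept_fraction_vector`, its parameters, and the toy tile assignments are illustrative assumptions, not the authors' code.

```python
from typing import List, Optional, Sequence


def concept_fraction_vector(
    tile_concepts: Sequence[int],
    num_concepts: int = 10,
    attention: Optional[Sequence[float]] = None,
) -> List[float]:
    """Return the share of (optionally attention-weighted) tiles per concept."""
    if attention is None:
        attention = [1.0] * len(tile_concepts)  # unweighted: each tile counts once
    if len(attention) != len(tile_concepts):
        raise ValueError("one attention weight per tile is required")
    totals = [0.0] * num_concepts
    for concept, weight in zip(tile_concepts, attention):
        if not 0 <= concept < num_concepts:
            raise ValueError(f"concept index {concept} out of range")
        totals[concept] += weight
    mass = sum(totals)
    if mass == 0:
        raise ValueError("slide has no tiles or zero total weight")
    return [t / mass for t in totals]  # entries sum to 1


# Hypothetical slide: six tiles assigned to concepts 5, 7 and 9.
tiles = [5, 5, 7, 9, 5, 7]
fractions = concept_fraction_vector(tiles)
weighted = concept_fraction_vector(
    tiles, attention=[0.4, 0.3, 0.1, 0.05, 0.1, 0.05]
)
```

With uniform weights, concept 5 accounts for half of the tiles; with the hypothetical attention weights its share rises to 0.8, which shows how attention weighting can shift a slide's profile.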

What carries the argument

The attention-weighted latent space, which repurposes attention scores from a MIL backbone to automatically extract meaningful morphologic concepts without supervision.
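
For readers unfamiliar with where those attention scores come from, the sketch below shows generic attention pooling as used in attention-based MIL: per-tile scores are normalized with a softmax and used to average tile embeddings into one slide embedding. It is single-head for brevity, whereas the paper's Figure 1 caption mentions multi-head attention, and the names `softmax` and `attention_pool` plus the toy inputs are assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Normalize raw attention scores so they are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(
    tile_embeddings: Sequence[Sequence[float]],
    scores: Sequence[float],
) -> Tuple[List[float], List[float]]:
    """Return the attention-weighted mean embedding and the tile weights."""
    weights = softmax(scores)
    dim = len(tile_embeddings[0])
    slide = [0.0] * dim
    for embedding, weight in zip(tile_embeddings, weights):
        for j in range(dim):
            slide[j] += weight * embedding[j]
    return slide, weights


# Hypothetical two-tile slide with equal raw scores: pooling is a plain mean.
slide_embedding, tile_weights = attention_pool(
    [[1.0, 0.0], [0.0, 1.0]], scores=[0.0, 0.0]
)
```

The per-tile `tile_weights` are the quantity that CLEAR-HPV is described as repurposing; how exactly the paper feeds them into concept discovery is not reproduced here.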

If this is right

Concept-fraction vectors retain the predictive power of high-dimensional embeddings for HPV classification.
Spatial concept maps provide visual interpretability of morphologic features.
The framework applies to multiple cancer types and datasets without requiring concept labels.
Ten concepts suffice to capture HPV-related morphology in whole-slide images.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Pathologists could use the concept maps to verify model decisions against known HPV-associated features.
The reduced dimensionality might facilitate combining these representations with genomic or clinical data.
Similar attention-based restructuring could apply to other molecular subtypes in pathology.

Load-bearing premise

Attention scores from a standard MIL backbone can be directly used to define a latent space where meaningful morphologic concepts appear automatically without any concept supervision.
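
One way to picture this premise is attention-weighted clustering of tile latents. The clustering procedure the paper actually uses is not stated in this summary, so the weighted k-means below is a stand-in chosen purely for illustration; `weighted_kmeans`, its deterministic initialization, and the toy latents are all assumptions.

```python
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Sequence[float], centroids: List[List[float]]) -> int:
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))


def weighted_kmeans(
    points: Sequence[Sequence[float]],
    weights: Sequence[float],
    k: int,
    iterations: int = 20,
) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's algorithm where each point pulls its centroid by its weight."""
    if k > len(points):
        raise ValueError("need at least k points")
    dim = len(points[0])
    centroids = [list(p) for p in points[:k]]  # assumption: first k points as seeds
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [(p, w) for p, w, lab in zip(points, weights, labels)
                       if lab == c]
            mass = sum(w for _, w in members)
            if mass > 0:  # keep the old centroid if the cluster has no weight
                centroids[c] = [sum(w * p[j] for p, w in members) / mass
                                for j in range(dim)]
    labels = [nearest(p, centroids) for p in points]  # final assignment
    return centroids, labels


# Hypothetical 2-D latents; the first tile carries three times the attention.
latents = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0]]
attention = [3.0, 1.0, 1.0, 1.0]
centroids, labels = weighted_kmeans(latents, attention, k=2)
```

In this toy case the high-attention tile at the origin pulls the first centroid toward itself (second coordinate 0.25 rather than the unweighted 0.5), which is the kind of reshaping the premise relies on.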

What would settle it

The preservation claim would fail if the concept-fraction vectors yielded substantially lower accuracy or AUC for HPV status prediction than the original MIL embeddings on an independent test set such as CPTAC-HNSCC.

Figures

Figures reproduced from arXiv: 2602.05126 by Hao Wang, Shiwei Tan, Weiyi Qin, Yingci Liu-Swetz.

**Figure 1.** Overview of the CLEAR-HPV framework. (A) Data processing pipeline: WSIs are decomposed into fixed-size tiles, encoded with a pretrained ViT or CNN, and converted into patch-level feature embeddings. (B) An attention-based MIL classifier projects embeddings into the h-space latent representation and uses multi-head attention to compute tile-level contributions, which are pooled into a single slide-level em…

**Figure 2.** Recovery score relative to the interpreted MIL model (CLAM) across ACC, AUC, F1, Precision, Recall (i.e., sensitivity), and Specificity. For each method, the Euclidean distance $d$ between its metric vector (i.e., concatenation of Accuracy, AUC, etc.) and the interpreted model’s is computed and converted to a similarity score $s = \frac{1}{1+d}$. Higher scores indicate closer agreement with CLAM …

**Figure 3.** Class-averaged concept-fraction vectors across concept-discovery settings on TCGA-HNSCC. Concept-fraction vectors are computed per slide as the fraction of tiles assigned to each discovered concept, optionally weighted by MIL attention. These slide-level vectors are then averaged within each group to obtain class-averaged profiles that summarize cohort-level morphologic composition and highlight difference…

**Figure 4.** Top tiles for key concepts discovered by CLEAR-HPV (A) and the corresponding slide-level distributions in the dataset TCGA-HNSCC (B). (A) Top (representative) tiles for five CLEAR-HPV concepts chosen for their consistent appearance and clear morphologic identity: C5 (basaloid squamous epithelium), C7 (keratinizing squamous epithelium), C9 (fibrous stroma), C4 (connective stroma), and C2 (inflammatory cells…

**Figure 5.** Visualization of attention-weighted concept discovery using CLEAR-HPV. (A) For representative HPV-positive and HPV-negative WSIs from TCGA-HNSCC, we show, in four columns: (i) the original H&E whole slide image, (ii) the h-space spatial concept map, (iii) the high-attention spatial concept map, and (iv) regions of interest (ROIs) with their corresponding concept-fraction distributions produced by our CLEA…

**Figure 6.** Cross-cohort consistency of HPV-related concepts among top-8 tiles. We show representative high-attention tiles for the HPV-positive-related “basaloid” concept C5 and the “keratinizing” concept C7 from two external cohorts, TCGA-CESC (top) and CPTAC-HNSCC (bottom). Across both datasets, C5 consistently reflects basaloid morphology characteristic of HPV-positive tumors, while C7 reflects keratinizing morph…
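
The recovery score described in the Figure 2 caption can be sketched in a few lines of Python: take the Euclidean distance between a method's metric vector and the interpreted model's, then map it to a similarity in (0, 1]. The function name `recovery_score` and the metric values are hypothetical and are not results from the paper.

```python
import math
from typing import Sequence


def recovery_score(
    method_metrics: Sequence[float],
    reference_metrics: Sequence[float],
) -> float:
    """Similarity s = 1 / (1 + d), with d the Euclidean distance."""
    if len(method_metrics) != len(reference_metrics):
        raise ValueError("metric vectors must have the same length")
    d = math.sqrt(sum((m - r) ** 2
                      for m, r in zip(method_metrics, reference_metrics)))
    return 1.0 / (1.0 + d)


# Hypothetical (ACC, AUC, F1, Precision, Recall, Specificity) vectors.
reference = [0.90, 0.95, 0.88, 0.87, 0.89, 0.91]
identical = recovery_score(reference, reference)   # d = 0, so s = 1
unit_apart = recovery_score([1.0, 0.0], [0.0, 0.0])  # d = 1, so s = 0.5
```

A score of 1 means the method reproduces every metric of the interpreted model exactly; the score decays toward 0 as the metric vectors move apart.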

read the original abstract

Human papillomavirus (HPV) status is a critical determinant of prognosis and treatment response in head and neck and cervical cancers. Although attention-based multiple instance learning (MIL) achieves strong slide-level prediction for HPV-related whole-slide histopathology, it provides limited morphologic interpretability. To address this limitation, we introduce Concept-Level Explainable Attention-guided Representation for HPV (CLEAR-HPV), a framework that restructures the MIL latent space using attention to enable concept discovery without requiring concept labels during training. Operating in an attention-weighted latent space, CLEAR-HPV automatically discovers keratinizing, basaloid, and stromal morphologic concepts, generates spatial concept maps, and represents each slide using a compact concept-fraction vector. CLEAR-HPV's concept-fraction vectors preserve the predictive information of the original MIL embeddings while reducing the high-dimensional feature space (e.g., 1536 dimensions) to only 10 interpretable concepts. CLEAR-HPV generalizes consistently across TCGA-HNSCC, TCGA-CESC, and CPTAC-HNSCC, providing compact, concept-level interpretability through a general, backbone-agnostic framework for attention-based MIL models of whole-slide histopathology.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

CLEAR-HPV adds unsupervised concept discovery on top of attention MIL for HPV slides and claims the 10-dim fraction vectors keep the original predictive power, but that preservation step still needs direct verification.

read the letter

CLEAR-HPV gives a practical way to extract a handful of morphologic concepts from attention MIL models for HPV status in head and neck and cervical slides. The framework uses the attention scores to discover concepts such as keratinizing and basaloid areas, then represents each slide as a simple fraction vector over those concepts. It also produces spatial concept maps along the way.

The new part is restructuring the latent space with attention so that this discovery needs no concept labels during training, while the setup stays backbone-agnostic and sits on top of existing MIL backbones. That combination is not yet standard in the MIL pathology literature. It also shows consistent behavior on TCGA-HNSCC, TCGA-CESC, and CPTAC-HNSCC, which is a plus for a methods paper. The work does a decent job of highlighting the interpretability problem in these models and offering a compact representation that could be easier for clinicians to look at.

The main soft spot is the preservation claim. The paper says the 10-concept vectors keep the predictive information from the original 1536-dim embeddings, but without a direct test such as an AUC comparison or linear-probe accuracy between the two representations, it is hard to judge how complete that transfer really is. If attention weights favor only the most obvious patches, subtler HPV-related features might get lost in the fractions. That needs explicit checking to make the central result convincing. The assumption that meaningful concepts emerge automatically from the attention-weighted space is reasonable to try, but it could depend on how well the MIL backbone already separates the relevant morphology.

This paper is for computational pathologists and ML researchers focused on explainable models for whole-slide images.
Someone looking for ways to add concept-level understanding to existing attention pipelines would get something out of it. The idea is solid enough on paper to go to a serious referee, though the results section will need to address the preservation question head-on. I would send it to peer review.

Referee Report

3 major / 3 minor

Summary. The paper introduces CLEAR-HPV, a framework that restructures the latent space of attention-based multiple instance learning (MIL) models for HPV status prediction in whole-slide histopathology. Without requiring concept labels, it automatically discovers morphologic concepts (e.g., keratinizing, basaloid, stromal), generates spatial concept maps, and represents each slide by a compact 10-dimensional concept-fraction vector. The central claim is that these vectors preserve the predictive information of the original high-dimensional (e.g., 1536-dim) MIL embeddings while providing interpretability and generalizing across TCGA-HNSCC, TCGA-CESC, and CPTAC-HNSCC datasets.

Significance. If the preservation of predictive power and the automatic emergence of meaningful concepts are rigorously demonstrated, this would offer a backbone-agnostic route to concept-level interpretability in attention-based MIL for digital pathology. The dimensionality reduction to 10 concepts could improve clinical trust and enable discovery of HPV-associated morphologies, addressing a recognized limitation of black-box slide-level predictors.

major comments (3)

[Abstract / Results] Abstract and Results section: the claim that concept-fraction vectors 'preserve the predictive information of the original MIL embeddings' is load-bearing for the contribution, yet no side-by-side quantitative comparison (AUC-ROC, linear-probe accuracy, or mutual information) between the 1536-dim embeddings and the 10-dim vectors is referenced. Without this, the reduction could be lossy if attention weights preferentially emphasize obvious keratinizing patches while attenuating subtler basaloid or stromal HPV signal.
[Methods] Methods, attention-weighted latent space construction: the assumption that standard MIL attention scores can directly induce a latent space in which unsupervised concept discovery yields robust, non-redundant morphologic concepts requires explicit validation. If attention is dominated by high-scoring patches, the resulting concept fractions may omit predictive but low-attention regions; an ablation replacing attention weights with uniform or random sampling would test this.
[Experiments] Experiments, cross-dataset generalization: the manuscript asserts consistent generalization across TCGA-HNSCC, TCGA-CESC, and CPTAC-HNSCC, but does not report per-dataset AUCs or statistical tests for the concept-fraction vectors. This leaves open whether performance parity holds or whether dataset-specific biases in attention maps drive the apparent consistency.
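
The side-by-side check requested in the first major comment comes down to scoring the same slides with a classifier on each representation and comparing AUCs. Below is a minimal, dependency-free sketch of the AUC step using the pairwise (Mann-Whitney) definition. The labels and scores are invented for illustration; they are not the paper's data, and `roc_auc` is a hypothetical helper name.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative slide")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical HPV labels and classifier scores for four slides.
hpv_labels = [1, 1, 0, 0]
embedding_scores = [0.9, 0.8, 0.3, 0.1]  # probe on the 1536-dim embeddings
concept_scores = [0.9, 0.2, 0.3, 0.1]    # probe on the 10-dim fraction vectors

auc_embedding = roc_auc(hpv_labels, embedding_scores)
auc_concept = roc_auc(hpv_labels, concept_scores)
auc_gap = auc_embedding - auc_concept    # a large gap would signal lossy reduction
```

On real cohorts this comparison would be run per dataset on held-out slides, with a significance test on the AUC difference, which is what the third major comment asks for.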

minor comments (3)

[Methods] Clarify the exact procedure for deriving the 10 concepts (e.g., clustering method, number of clusters chosen, or post-hoc labeling) and whether any hyper-parameters are tuned on HPV labels.
[Figures] Figure captions for spatial concept maps should include scale bars, attention threshold values, and a legend mapping colors to the 10 discovered concepts.
[Related Work] Add a brief comparison to prior unsupervised concept-discovery methods in MIL (e.g., those using prototype learning or post-hoc concept activation vectors) to situate the contribution.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment point by point below and have revised the manuscript to strengthen the supporting evidence for our claims.

read point-by-point responses

Referee: [Abstract / Results] Abstract and Results section: the claim that concept-fraction vectors 'preserve the predictive information of the original MIL embeddings' is load-bearing for the contribution, yet no side-by-side quantitative comparison (AUC-ROC, linear-probe accuracy, or mutual information) between the 1536-dim embeddings and the 10-dim vectors is referenced. Without this, the reduction could be lossy if attention weights preferentially emphasize obvious keratinizing patches while attenuating subtler basaloid or stromal HPV signal.

Authors: We agree that a direct quantitative comparison is required to support this central claim. In the revised manuscript we have added a dedicated Results subsection with side-by-side evaluations: AUC-ROC and linear-probe accuracy of a downstream classifier trained on the original 1536-dimensional embeddings versus the 10-dimensional concept-fraction vectors, together with mutual-information analysis between the two representations. These new results show that the concept vectors retain the large majority of predictive performance while confirming that attention weighting does not systematically suppress subtler HPV-associated signals.
revision: yes

Referee: [Methods] Methods, attention-weighted latent space construction: the assumption that standard MIL attention scores can directly induce a latent space in which unsupervised concept discovery yields robust, non-redundant morphologic concepts requires explicit validation. If attention is dominated by high-scoring patches, the resulting concept fractions may omit predictive but low-attention regions; an ablation replacing attention weights with uniform or random sampling would test this.

Authors: We accept that explicit validation of the attention-weighting step is warranted. We have therefore added an ablation study (now reported in the Methods and supplementary material) in which concept discovery is repeated using uniform patch sampling and random patch sampling in place of attention weights. The ablation demonstrates that attention-weighted sampling yields more coherent, less redundant concepts and higher downstream predictive performance than either uniform or random baselines, thereby supporting the original design choice.
revision: yes

Referee: [Experiments] Experiments, cross-dataset generalization: the manuscript asserts consistent generalization across TCGA-HNSCC, TCGA-CESC, and CPTAC-HNSCC, but does not report per-dataset AUCs or statistical tests for the concept-fraction vectors. This leaves open whether performance parity holds or whether dataset-specific biases in attention maps drive the apparent consistency.

Authors: We have expanded the Experiments section to include per-dataset AUC-ROC values for the concept-fraction vectors on TCGA-HNSCC, TCGA-CESC, and CPTAC-HNSCC. We also report statistical comparisons (DeLong tests between AUCs and a one-way ANOVA across datasets) that confirm performance parity and indicate that dataset-specific attention biases do not drive the observed consistency.
revision: yes

Circularity Check

0 steps flagged

No circularity: method adds interpretability layer without reducing claims to inputs by construction

full rationale

The derivation chain begins with a standard attention-based MIL backbone whose embeddings are restructured via attention weights to produce concept-fraction vectors. This restructuring is presented as an explicit algorithmic step that yields a lower-dimensional representation; the preservation of HPV-predictive information is asserted as an empirical outcome rather than being true by definition of the fractions themselves. No equations, fitted parameters, or self-citations are shown that would make the output equivalent to the input. The framework is backbone-agnostic and operates without concept supervision, keeping the central claim independent of the original 1536-dimensional embeddings.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard assumptions from attention-based MIL literature plus the novel claim that attention weighting alone suffices for unsupervised concept emergence.

axioms (1)

[domain assumption] Attention scores from a pretrained MIL model can be used to reweight the latent space so that morphologic concepts become linearly separable or discoverable without supervision.
This premise is invoked when the abstract states that operating in the attention-weighted latent space enables automatic concept discovery.

pith-pipeline@v0.9.0 · 5759 in / 1236 out tokens · 46086 ms · 2026-05-21T13:05:51.791792+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction · reality_from_one_distinction · unclear
Paper passage: "CLEAR-HPV’s concept-fraction vectors preserve the predictive information of the original MIL embeddings while reducing the high-dimensional feature space (e.g., 1536 dimensions) to only 10 interpretable concepts."

IndisputableMonolith/Cost/FunctionalEquation · washburn_uniqueness_aczel · unclear
Paper passage: "Operating in an attention-weighted latent space, CLEAR-HPV automatically discovers keratinizing, basaloid, and stromal morphologic concepts"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.