When Fine-Tuning Changes the Evidence: Architecture-Dependent Semantic Drift in Chest X-Ray Explanations

Daniel Ting; Kabilan Elangovan

arxiv: 2604.08513 · v1 · submitted 2026-04-09 · 💻 cs.CV

Pith reviewed 2026-05-10 18:01 UTC · model grok-4.3

keywords: semantic drift, attribution maps, fine-tuning, chest X-ray, explanation stability, GradCAM, LayerCAM, transfer learning

The pith

Fine-tuning can reorganize the visual evidence inside a chest X-ray classifier even when diagnostic accuracy stays high, and the reorganization pattern depends on both the model architecture and the explanation method used.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines what happens to a model's supporting evidence when a pre-trained network is fine-tuned on a five-class chest X-ray task. It defines semantic drift as the systematic shift in attribution maps between the transfer-learning stage and the full fine-tuning stage, even when classification performance converges. Across DenseNet201, ResNet50V2, and InceptionV3, coarse anatomical regions remain roughly consistent, yet finer overlap metrics show architecture-specific rearrangements of the evidence. The same stability ordering can reverse when switching from LayerCAM to GradCAM++ under identical converged accuracy. The work therefore treats explanation stability as an interaction effect among architecture, training phase, and attribution technique rather than a fixed property of the learned decision.

Core claim

In a controlled two-stage protocol on a five-class chest X-ray dataset, transfer learning followed by fine-tuning produces systematic changes in the spatial structure of attribution maps despite stable predictive accuracy. Coarse localization of anatomy is preserved across architectures, but reference-free overlap and structural-consistency metrics reveal pronounced, architecture-dependent reorganization of the evidential support. Stability rankings between models reverse when the attribution method is changed from LayerCAM to GradCAM++, demonstrating that explanation stability is not an intrinsic model property but an interaction between architecture, optimization stage, and attribution cue.

What carries the argument

Semantic drift, defined as the systematic reorganization of attribution structure between transfer-learning and fine-tuning stages, quantified by reference-free spatial overlap and consistency metrics.
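As a rough illustration of what a reference-free comparison between two training stages can look like, the sketch below computes an overlap IoU and an attribution centroid for a pair of maps. The function names (`binarize`, `overlap_iou`, `centroid`), the max-normalization, the 0.5 threshold, and the toy 4x4 maps are all assumptions made for this example; the paper's exact metric definitions are not reproduced on this page.

```python
def binarize(attr_map, threshold=0.5):
    """Return the set of (row, col) pixels whose max-normalized value meets the threshold.

    Both the normalization and the 0.5 default are assumptions for illustration.
    """
    peak = max(max(row) for row in attr_map)
    if peak <= 0:
        return set()
    return {
        (r, c)
        for r, row in enumerate(attr_map)
        for c, value in enumerate(row)
        if value / peak >= threshold
    }


def overlap_iou(map_a, map_b, threshold=0.5):
    """Intersection over union of the two thresholded attribution masks."""
    mask_a = binarize(map_a, threshold)
    mask_b = binarize(map_b, threshold)
    union = mask_a | mask_b
    if not union:
        return 1.0  # convention chosen here: two empty masks count as identical
    return len(mask_a & mask_b) / len(union)


def centroid(attr_map):
    """Attribution-weighted center of mass as (row, col), a coarse localization proxy."""
    total = sum(sum(row) for row in attr_map)
    if total == 0:
        raise ValueError("attribution map has no mass")
    row_center = sum(r * v for r, row in enumerate(attr_map) for v in row) / total
    col_center = sum(c * v for row in attr_map for c, v in enumerate(row)) / total
    return row_center, col_center


# Toy maps (hypothetical): same center of mass, different fine structure.
tl_map = [  # stands in for the transfer-learning stage
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
ft_map = [  # stands in for the fine-tuned stage
    [0, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [1, 0, 0, 0],
]

print("centroid (TL):", centroid(tl_map))
print("centroid (FT):", centroid(ft_map))
print(f"overlap IoU: {overlap_iou(tl_map, ft_map):.3f}")
```

In this toy case the two maps share the same centroid while overlapping on only two of six highlighted pixels, which mirrors the reported pattern of stable coarse localization alongside a lower overlap IoU. Plain Python lists are used only to keep the sketch dependency-free; array libraries would be the usual choice for real attribution maps.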

If this is right

Model selection for clinical deployment must consider not only final accuracy but also the stability of the supporting evidence across training phases.
When two architectures reach similar accuracy, the one with lower semantic drift may preserve more consistent visual reasoning for downstream review.
Switching explanation methods can change which architecture appears more stable, so single-method audits are insufficient.
Coarse anatomical localization can remain reliable even when finer evidential structure drifts, suggesting a hierarchy of explanation reliability.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Clinics that retrain models periodically may need to re-verify that the new evidence regions align with radiologist expectations even if accuracy metrics look unchanged.
If semantic drift correlates with particular disease classes that have overlapping visual features, targeted regularization during fine-tuning could reduce it.
The reversal of stability rankings across attribution methods implies that future benchmarks should report results from multiple explanation techniques by default.

Load-bearing premise

That observed changes in attribution maps between training stages reflect genuine shifts in the model's visual reasoning rather than artifacts created by the choice of attribution method or the particular overlap metrics.

What would settle it

A controlled experiment in which the same converged models are re-evaluated with an entirely different attribution technique (for example, integrated gradients) and the architecture-dependent reversal in stability rankings disappears.

Figures

Figures reproduced from arXiv: 2604.08513 by Daniel Ting, Kabilan Elangovan.

**Figure 1.** Epoch selection justification. Epoch 8 marks the transfer learning plateau; epoch 19 represents fine-tuning convergence.

**Figure 2.** Method-dependent stability rankings reveal architectural reversal. Left (LayerCAM): InceptionV3 (IoU=0.777) outperforms DenseNet201 (0.699) and ResNet50V2 (0.519). Right (Grad-CAM++): DenseNet201 (0.690) leads InceptionV3 (0.643) and ResNet50V2 (0.383).

**Figure 3.** DenseNet201 cross-method comparison: LayerCAM (top) vs Grad-CAM++ (bottom). Dense connectivity yields coherent explanation refinement across training phases with limited cross-method divergence in salient structure.
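The ranking reversal described in the Figure 2 caption can be checked directly from the IoU values quoted there. The sketch below is only an illustration: the helper names `stability_ranking` and `reversed_pairs` are made up for this example, and the only data used are the six numbers from the caption.

```python
from itertools import combinations

# IoU values as quoted in the Figure 2 caption.
iou_by_method = {
    "LayerCAM": {"InceptionV3": 0.777, "DenseNet201": 0.699, "ResNet50V2": 0.519},
    "Grad-CAM++": {"DenseNet201": 0.690, "InceptionV3": 0.643, "ResNet50V2": 0.383},
}


def stability_ranking(scores):
    """Architectures ordered from most to least stable (highest IoU first)."""
    return sorted(scores, key=scores.get, reverse=True)


def reversed_pairs(scores_a, scores_b):
    """Pairs of architectures whose relative order differs between two methods."""
    flipped = []
    for first, second in combinations(sorted(scores_a), 2):
        diff_a = scores_a[first] - scores_a[second]
        diff_b = scores_b[first] - scores_b[second]
        if diff_a * diff_b < 0:  # opposite signs: the order flipped
            flipped.append((first, second))
    return flipped


for method, scores in iou_by_method.items():
    print(method, "->", stability_ranking(scores))

print("reversed pairs:",
      reversed_pairs(iou_by_method["LayerCAM"], iou_by_method["Grad-CAM++"]))
```

On these numbers the only pair that swaps order is DenseNet201 and InceptionV3; ResNet50V2 ranks last under both methods, so the reversal concerns the top two positions.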

Original abstract

Transfer learning followed by fine-tuning is widely adopted in medical image classification due to consistent gains in diagnostic performance. However, in multi-class settings with overlapping visual features, improvements in accuracy do not guarantee stability of the visual evidence used to support predictions. We define semantic drift as systematic changes in the attribution structure supporting a model's predictions between transfer learning and full fine-tuning, reflecting potential shifts in underlying visual reasoning despite stable classification performance. Using a five-class chest X-ray task, we evaluate DenseNet201, ResNet50V2, and InceptionV3 under a two-stage training protocol and quantify drift with reference-free metrics capturing spatial localization and structural consistency of attribution maps. Across architectures, coarse anatomical localization remains stable, while overlap IoU reveals pronounced architecture-dependent reorganization of evidential structure. Beyond single-method analysis, stability rankings can reverse across LayerCAM and GradCAM++ under converged predictive performance, establishing explanation stability as an interaction between architecture, optimization phase, and attribution objective.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Fine-tuning shifts CAM attributions in chest X-ray models in architecture-dependent ways, but the shifts may largely trace to how the explanation methods respond to weight changes rather than to new visual reasoning.

The paper's core observation is that after fine-tuning DenseNet201, ResNet50V2, and InceptionV3 on a five-class chest X-ray task, the overlap IoU between attribution maps from transfer learning and full fine-tuning varies by architecture, while coarse localization stays roughly the same. The authors also note that which architecture looks more stable flips between LayerCAM and GradCAM++ once accuracy converges. That reversal is the piece not already in the standard XAI-in-medicine literature, and it comes from a clean two-stage protocol on common backbones with reference-free metrics, so the setup is easy to reproduce and directly relevant to anyone who fine-tunes diagnostic models and then trusts the saliency maps.

Referee Report

2 major / 2 minor

Summary. The manuscript presents an empirical study on semantic drift in chest X-ray classification models. It defines semantic drift as systematic changes in attribution structure between transfer learning and full fine-tuning stages, despite stable classification performance. Using DenseNet201, ResNet50V2, and InceptionV3 on a five-class task, the authors quantify drift via reference-free metrics for spatial localization and structural consistency (overlap IoU). The central claims are that coarse anatomical localization remains stable across architectures while IoU reveals pronounced architecture-dependent reorganization, and that stability rankings reverse between LayerCAM and GradCAM++ under converged accuracy.

Significance. If the attribution changes genuinely reflect shifts in visual reasoning, the work would usefully demonstrate that accuracy gains from fine-tuning do not guarantee explanation stability, with implications for trustworthy medical AI. The multi-architecture, multi-method design is a strength, as it reveals interactions between model choice, training phase, and attribution objective. The reference-free metrics are practical for this setting. However, without validation against alternative explanation techniques, the significance for claims about underlying model reasoning is limited.

major comments (2)

[Abstract] The claim that overlap IoU reveals 'pronounced architecture-dependent reorganization of evidential structure' reflecting 'shifts in underlying visual reasoning' is load-bearing for the central claim. This interpretation assumes LayerCAM and GradCAM++ faithfully capture decision processes pre- and post-fine-tuning. The reported reversal of stability rankings between the two methods under converged predictive performance indicates that measured drift is at least partly an interaction with the attribution methods' sensitivities to weight/gradient changes during optimization, rather than an invariant model property. Cross-validation with perturbation-based explanations is needed to attribute IoU differences to visual reasoning changes.
[Results] Support for 'pronounced' reorganization remains qualitative, as the experimental protocol lacks quantitative tables, statistical tests, variance across runs, or full data exclusion details. This undermines assessment of the magnitude and reliability of architecture-dependent IoU effects and the cross-period claims.

minor comments (2)

[Methods] Clarify the precise formulas and implementation details for the overlap IoU and localization stability metrics in the methods section to support reproducibility.
Figure captions should explicitly state the number of samples, runs, and any preprocessing steps used to generate the reported attribution maps.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. The comments help clarify the scope and limitations of our claims regarding semantic drift. We address each major comment below and indicate the revisions made to the manuscript.

Referee: [Abstract] The claim that overlap IoU reveals 'pronounced architecture-dependent reorganization of evidential structure' reflecting 'shifts in underlying visual reasoning' is load-bearing for the central claim. This interpretation assumes LayerCAM and GradCAM++ faithfully capture decision processes pre- and post-fine-tuning. The reported reversal of stability rankings between the two methods under converged predictive performance indicates that measured drift is at least partly an interaction with the attribution methods' sensitivities to weight/gradient changes during optimization, rather than an invariant model property. Cross-validation with perturbation-based explanations is needed to attribute IoU differences to visual reasoning changes.

Authors: We agree that the observed reversal of stability rankings demonstrates sensitivity to the choice of attribution method and that this interaction is central to our findings. The manuscript explicitly presents this reversal as evidence that explanation stability is architecture- and method-dependent rather than an invariant property of the underlying model. While gradient-based methods have known limitations in faithfully representing decision processes, the consistent detection of architecture-specific IoU changes across both LayerCAM and GradCAM++ supports the view that the reorganization reflects genuine shifts in evidential structure. We have added a dedicated limitations paragraph in the revised Discussion section that acknowledges these method-specific sensitivities and explicitly recommends perturbation-based techniques (e.g., occlusion analysis) for future validation studies. We maintain that the current multi-architecture, multi-method results already provide actionable insights for medical AI without requiring new experiments in this revision.
Revision: partial
Referee: [Results] Support for 'pronounced' reorganization remains qualitative, as the experimental protocol lacks quantitative tables, statistical tests, variance across runs, or full data exclusion details. This undermines assessment of the magnitude and reliability of architecture-dependent IoU effects and the cross-period claims.

Authors: We accept this assessment and have strengthened the quantitative presentation. The revised Results section now includes a new table reporting mean IoU values with standard deviations computed over three independent training runs for each architecture, explanation method, and training stage. We have added Wilcoxon signed-rank tests comparing IoU between transfer-learning and fine-tuning stages, with p-values and effect sizes reported. The Methods section has been expanded with complete data exclusion criteria (including exact counts of images removed for quality or annotation issues) and the full preprocessing pipeline. These additions enable direct evaluation of effect magnitude and reproducibility.
Revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical observational study

The paper defines semantic drift descriptively and quantifies it via direct computation of reference-free metrics (spatial localization, structural consistency, overlap IoU) on attribution maps produced by standard LayerCAM and GradCAM++ methods applied to models before and after fine-tuning. No equations derive a target quantity from fitted parameters, no predictions reduce to inputs by construction, and no self-citations serve as load-bearing uniqueness theorems or ansatzes. All reported findings (architecture-dependent IoU changes, stability ranking reversals) are observational outcomes from the computed maps, not tautological reductions. This matches the default expectation for non-circular empirical work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations or new postulated entities; the work is an empirical comparison relying on standard computer vision assumptions about attribution methods reflecting model focus.
