Lightweight SAR Ship Detection via Contrastive Distillation
Pith reviewed 2026-06-29 12:40 UTC · model grok-4.3
The pith
Contrastive distillation transfers relational geometry from teacher to student detectors for improved SAR ship detection.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The SURGE framework transfers relational geometry from a powerful teacher detector to a compact student detector using a contrastive InfoNCE objective in a shared projection embedding space, providing a common region-level distillation interface that works across detector architectures without modification and delivers up to 6.2 mAP and 8.0 AP75 gains on the SSDD and HRSID benchmarks, sometimes surpassing the teacher.
What carries the argument
Contrastive InfoNCE objective in a shared projection embedding space that captures and transfers geometric relationships among object representations at the region level.
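A minimal sketch of such an objective follows, assuming in-batch negatives, cosine similarity, and a temperature of 0.1. The paper's exact loss, projection heads, and negative-sampling strategy are not given in this summary, so `info_nce`, `l2_normalize`, and the default `temperature` are illustrative names and values, not the authors' implementation. Inputs are assumed to be already-projected region embeddings, where row i of the student and row i of the teacher describe the same region.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def info_nce(student, teacher, temperature=0.1):
    """Mean InfoNCE loss over matched (student_i, teacher_i) region embeddings.

    Assumption: teacher_i is the positive for student_i, and every other
    teacher embedding in the batch acts as a negative.
    """
    if not student or len(student) != len(teacher):
        raise ValueError("need equally sized, non-empty embedding lists")
    s = [l2_normalize(v) for v in student]
    t = [l2_normalize(v) for v in teacher]
    total = 0.0
    for i, s_i in enumerate(s):
        # Similarity of student region i to every teacher region, scaled by temperature.
        logits = [sum(a * b for a, b in zip(s_i, t_j)) / temperature for t_j in t]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_denominator - logits[i]
    return total / len(s)
```

In this sketch the loss is small when each student embedding is closest to its own teacher embedding and grows when it is closer to another region's teacher embedding, which is the sense in which the objective constrains relationships among regions rather than individual activations.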
If this is right
- Two-stage detectors achieve up to 6.2 mAP and 8.0 AP75 gains over the baseline student on SSDD and HRSID.
- Student models can exceed the performance of the teacher detector.
- The same framework applies to one-stage and transformer-based detectors without any architecture changes.
- It provides the first transformer-based knowledge distillation approach for SAR ship detection.
Where Pith is reading between the lines
- The relational focus could apply to other remote-sensing detection tasks where spatial structure between objects matters more than local appearance.
- Deployment on edge hardware for real-time SAR monitoring becomes more feasible if the gains hold across additional datasets.
- Combining the contrastive interface with model compression techniques might yield further efficiency without losing relational accuracy.
Load-bearing premise
That geometric relationships among object representations are neglected by standard feature or logit matching and can be transferred successfully from teacher to student via contrastive loss across different detector architectures.
What would settle it
If experiments on the SSDD or HRSID benchmarks show that adding the contrastive distillation produces no mAP or AP75 gains over the baseline student or fails to allow the student to surpass the teacher, the central claim would be falsified.
Original abstract
Deep convolutional and transformer-based detectors achieve strong performance for SAR ship detection but are often computationally prohibitive for real-time or onboard deployment. Lightweight models offer improved efficiency yet struggle to capture the complex structural relationships inherent in SAR backscatter. Most existing SAR knowledge-distillation approaches rely on feature or logit matching, which enforces localized activation similarity while neglecting the geometric relationships among object representations. We propose a Structured Unified Relational knowledGE distillation framework for SAR Ship detection (SURGE) that transfers relational geometry from a powerful teacher detector to a compact student detector using a contrastive InfoNCE objective in a shared projection embedding space. To the best of our knowledge, this work presents the first transformer-based SAR ship detector knowledge distillation framework in the SAR domain. The framework is architecture-agnostic in the sense that it provides a common region-level distillation interface for two-stage, one-stage and transformer-based detectors without modifying their deployed architectures. Experiments on the SSDD and HRSID benchmarks demonstrate that the proposed method yields substantial improvements for two-stage detectors, achieving up to 6.2 mAP and 8.0 AP75 gains over the baseline student and even surpassing teacher performance.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SURGE, a Structured Unified Relational knowledGE distillation framework for SAR ship detection that employs a contrastive InfoNCE objective in a shared projection space to transfer relational geometry from a teacher detector to a lightweight student. It positions the method as architecture-agnostic (supporting two-stage, one-stage, and transformer detectors without architectural changes) and the first transformer-based KD approach in the SAR domain. Experiments on SSDD and HRSID are claimed to yield up to 6.2 mAP and 8.0 AP75 gains over the baseline student, sometimes surpassing the teacher.
Significance. If the reported gains prove robust and attributable to the contrastive relational transfer, the framework could provide a practical, general-purpose distillation interface for deploying efficient SAR detectors while preserving geometric object relationships that standard feature or logit matching overlooks.
major comments (2)
- [Experiments] The central performance claims (up to 6.2 mAP / 8.0 AP75 gains and student > teacher) rest on the premise that InfoNCE contrastive distillation transfers geometric relationships better than standard KD; however, no controlled ablation holding optimizer, schedule, augmentations, and training duration fixed while swapping only the distillation objective is reported, leaving open the possibility that gains arise from training differences rather than relational transfer.
- [Abstract and Experiments] The abstract and experimental description supply no dataset splits, baseline implementations, statistical tests, error bars, or variance across runs, which is load-bearing for assessing whether the reported improvements on SSDD/HRSID are reliable or reproducible.
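The controlled comparison requested in the first major comment can be sketched as a run grid in which every training setting is frozen and only the distillation objective and the random seed vary. The class name, field names, and all values in the usage note are hypothetical placeholders, not settings reported in the paper.

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class RunConfig:
    """One training run; every field except distill_loss and seed stays fixed."""
    optimizer: str
    schedule: str
    augmentations: tuple
    epochs: int
    seed: int
    distill_loss: str


def ablation_grid(base, losses, seeds):
    """Return one config per (loss, seed) pair, copying all other fields from base."""
    return [replace(base, distill_loss=loss, seed=seed)
            for loss, seed in product(losses, seeds)]
```

For example, `ablation_grid(RunConfig("sgd", "step", ("flip",), 12, 0, "none"), ["none", "feature", "logit", "infonce"], [0, 1, 2])` would enumerate twelve runs that differ only in objective and seed, so any gap between the `"infonce"` runs and the others cannot be attributed to optimizer, schedule, augmentation, or training length.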
minor comments (2)
- [Abstract] The claim of being 'architecture-agnostic' and providing a 'common region-level distillation interface' for one-stage and transformer detectors is stated but not demonstrated with results beyond two-stage detectors.
- [Method] Notation for the shared projection embedding space and the precise form of the InfoNCE loss (including temperature and negative sampling strategy) should be formalized with equations for reproducibility.
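One standard way to write the requested loss is shown here as a sketch, not as the paper's definition. All symbols are assumptions, since this summary does not reproduce the authors' notation: f_i^S and f_i^T are the student and teacher features of region i, g_S and g_T are projection heads into the shared space, sim is cosine similarity, tau is the temperature, and N is the number of regions in a batch, with the other N - 1 teacher embeddings serving as negatives.

```latex
z_i^{S} = g_S\!\left(f_i^{S}\right), \qquad
z_i^{T} = g_T\!\left(f_i^{T}\right), \qquad
\mathrm{sim}(u, v) = \frac{u^{\top} v}{\lVert u \rVert \, \lVert v \rVert}

\mathcal{L}_{\mathrm{InfoNCE}}
  = -\frac{1}{N} \sum_{i=1}^{N}
    \log \frac{\exp\!\left(\mathrm{sim}(z_i^{S}, z_i^{T}) / \tau\right)}
              {\sum_{j=1}^{N} \exp\!\left(\mathrm{sim}(z_i^{S}, z_j^{T}) / \tau\right)}
```

A formal statement in the manuscript would also need to say how regions are matched between teacher and student and whether negatives are drawn from the same image, the same batch, or a memory bank.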
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the two major comments below and will revise the manuscript accordingly to strengthen the experimental evidence.
- Referee: [Experiments] The central performance claims (up to 6.2 mAP / 8.0 AP75 gains and student > teacher) rest on the premise that InfoNCE contrastive distillation transfers geometric relationships better than standard KD; however, no controlled ablation holding optimizer, schedule, augmentations, and training duration fixed while swapping only the distillation objective is reported, leaving open the possibility that gains arise from training differences rather than relational transfer.
  Authors: We agree that a controlled ablation isolating the distillation objective is necessary to attribute gains specifically to the contrastive relational transfer rather than other training factors. In the revised manuscript we will add such an ablation: all training hyperparameters (optimizer, schedule, augmentations, epochs) will be held fixed while only the distillation loss is swapped (standard feature/logit KD versus the proposed InfoNCE relational objective). This will directly test whether the relational geometry transfer is the source of the observed improvements.
  Revision: yes
- Referee: [Abstract and Experiments] The abstract and experimental description supply no dataset splits, baseline implementations, statistical tests, error bars, or variance across runs, which is load-bearing for assessing whether the reported improvements on SSDD/HRSID are reliable or reproducible.
  Authors: We acknowledge that the current manuscript lacks these details. The revised version will explicitly state the train/test splits for SSDD and HRSID, provide implementation details for all baselines (including any re-implementation choices), and report mean performance with standard deviation across at least three independent runs together with statistical significance tests (e.g., paired t-test) to quantify reproducibility.
  Revision: yes
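The reporting promised in the second response can be sketched as follows, assuming one mAP score per seed for the baseline student and for the distilled student, paired by seed. The function names are illustrative. The sketch returns the paired t statistic and its degrees of freedom; converting these to a p-value requires a t-distribution table or a statistics library, which is left out to keep the example dependency-free.

```python
import math
import statistics


def summarize(scores):
    """Return (mean, sample standard deviation) of per-seed scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_t_statistic(baseline, distilled):
    """Paired t statistic and degrees of freedom for per-seed score pairs.

    baseline[i] and distilled[i] must come from runs sharing the same seed.
    """
    if len(baseline) != len(distilled) or len(baseline) < 2:
        raise ValueError("need at least two paired runs of equal count")
    diffs = [d - b for b, d in zip(baseline, distilled)]
    sd = statistics.stdev(diffs)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1
```

With only three runs the test has two degrees of freedom, so a large t statistic is needed before a gain can be called significant; more seeds would make the comparison considerably more informative.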
Circularity Check
No circularity; empirical distillation framework with benchmark validation
The paper presents an empirical knowledge-distillation method (SURGE) using a contrastive InfoNCE loss for SAR ship detection and reports performance gains on SSDD/HRSID benchmarks. No derivation chain, uniqueness theorem, fitted-parameter prediction, or self-citation load-bearing step is present; the central claims rest on experimental results rather than any reduction of outputs to inputs by construction. The architecture-agnostic framing and 'first transformer-based' claim are statements of scope, not circular logic.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: A contrastive InfoNCE objective in a shared embedding space transfers relational geometry effectively.