A Data-Centric Vision Transformer Baseline for SAR Sea Ice Classification
Pith reviewed 2026-05-13 19:48 UTC · model grok-4.3
The pith
Vision Transformers with focal loss and leakage-aware splitting set a SAR sea ice classification baseline that improves minority-class precision.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
A ViT-Large model trained with focal loss on full-resolution SAR patches, after leakage-aware stratified splitting and training-set normalization, reaches 69.6 percent held-out accuracy, 68.8 percent weighted F1 score, and 83.9 percent precision on the minority multi-year ice class. It also delivers a cleaner precision-recall balance for rare ice types than the ViT-Base models trained with cross-entropy or weighted cross-entropy on the same data.
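To make the three headline metrics concrete, the following is a minimal pure-Python sketch of how accuracy, per-class precision, and support-weighted F1 are conventionally computed from label lists. The function names, the class names "MYI" and "other", and the toy labels are all hypothetical illustrations; none of them come from the paper or its code.

```python
from collections import Counter


def per_class_counts(y_true, y_pred):
    """Count true positives, false positives and false negatives per class."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    return tp, fp, fn


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def class_precision(cls, y_true, y_pred):
    """Of all samples predicted as cls, the fraction that truly are cls."""
    tp, fp, _ = per_class_counts(y_true, y_pred)
    denom = tp[cls] + fp[cls]
    return tp[cls] / denom if denom else 0.0


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    tp, fp, fn = per_class_counts(y_true, y_pred)
    total = 0.0
    for cls, support in Counter(y_true).items():
        p_den, r_den = tp[cls] + fp[cls], tp[cls] + fn[cls]
        p = tp[cls] / p_den if p_den else 0.0
        r = tp[cls] / r_den if r_den else 0.0
        f1 = 2 * p * r / (p + r) if (p + r) else 0.0
        total += support * f1
    return total / len(y_true)


# Toy labels (hypothetical, not data from the paper).
y_true = ["other", "other", "other", "MYI", "MYI", "other"]
y_pred = ["other", "other", "MYI", "MYI", "other", "other"]

print(accuracy(y_true, y_pred))
print(class_precision("MYI", y_true, y_pred))
print(weighted_f1(y_true, y_pred))
```

Because weighted F1 averages by support, a majority class dominates it, which is one reason the per-class precision on the rare class is reported separately.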
What carries the argument
Leakage-aware stratified patch splitting of full-resolution Sentinel-1 scenes together with training-set normalization and focal-loss training of the ViT-Large model.
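As a rough illustration of these two data-handling steps, the sketch below assigns whole scenes to either the training or the test set and then fits normalization statistics on training pixels only. It is a simplified sketch under stated assumptions: the patch dictionary layout, the function names, and the 20 percent test fraction are hypothetical, and the class-distribution stratification used in the paper is omitted for brevity.

```python
import random
from statistics import mean, pstdev


def split_by_scene(patches, test_fraction=0.2, seed=0):
    """Assign whole scenes to train or test so no scene feeds both sets.

    Simplification: scenes are shuffled at random; stratifying scenes by
    their ice-class distribution is NOT implemented here.
    """
    scene_ids = sorted({p["scene_id"] for p in patches})
    rng = random.Random(seed)
    rng.shuffle(scene_ids)
    n_test = max(1, round(len(scene_ids) * test_fraction))
    test_scenes = set(scene_ids[:n_test])
    train = [p for p in patches if p["scene_id"] not in test_scenes]
    test = [p for p in patches if p["scene_id"] in test_scenes]
    return train, test


def fit_normalizer(train):
    """Compute mean and standard deviation from training pixels only."""
    values = [v for p in train for v in p["pixels"]]
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # guard against a zero spread
    return mu, sigma


def normalize(patches, mu, sigma):
    """Apply the given statistics; never refit them on validation or test."""
    return [
        {**p, "pixels": [(v - mu) / sigma for v in p["pixels"]]}
        for p in patches
    ]


# Toy data (hypothetical): 5 scenes with 2 tiny "patches" each.
patches = [
    {"scene_id": f"scene_{s}", "pixels": [float(s), float(s + k)]}
    for s in range(5)
    for k in range(2)
]

train, test = split_by_scene(patches)
mu, sigma = fit_normalizer(train)
train_n = normalize(train, mu, sigma)
test_n = normalize(test, mu, sigma)  # same training statistics reused
```

Splitting at the scene level rather than the patch level is what prevents neighbouring patches from the same acquisition landing on both sides of the split.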
If this is right
- Focal loss produces a more useful precision-recall trade-off for rare ice classes than weighting the cross-entropy loss by class frequency.
- The data-handling pipeline of full-resolution inputs, leakage-aware splitting, and training-set normalization can be reused as a reference point for later fusion of SAR with optical, thermal, or meteorological inputs.
- The 83.9 percent precision on multi-year ice supplies a concrete numeric target that any improved method must beat on the same held-out split.
- Because the baseline avoids down-sampling, fine morphological details that distinguish ice types remain available to the model.
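The difference between the three losses in the first point can be seen on single examples. The sketch below uses the standard focal-loss form from Lin et al., FL(p_t) = -α (1 - p_t)^γ log(p_t); the default γ = 2, α = 1, the class weight of 5, and the two example probabilities are assumptions chosen for illustration and are not the paper's settings.

```python
import math


def cross_entropy(p_true):
    """Standard cross-entropy for the probability given to the true class."""
    return -math.log(p_true)


def weighted_cross_entropy(p_true, class_weight):
    """Cross-entropy scaled by a fixed per-class weight."""
    return -class_weight * math.log(p_true)


def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss: cross-entropy scaled down by (1 - p_true) ** gamma."""
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)


# Hypothetical probabilities assigned to the true class.
easy, hard = 0.9, 0.1

# How much more a hard example costs than an easy one under each loss.
ce_ratio = cross_entropy(hard) / cross_entropy(easy)
wce_ratio = weighted_cross_entropy(hard, 5.0) / weighted_cross_entropy(easy, 5.0)
focal_ratio = focal_loss(hard) / focal_loss(easy)

print(ce_ratio, wce_ratio, focal_ratio)
```

A class weight rescales every example of a class by the same factor, so the hard-to-easy ratio is unchanged from plain cross-entropy. Focal loss instead scales each example by how poorly it is currently classified, which concentrates the gradient on hard examples regardless of class.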
Where Pith is reading between the lines
- If the test distribution does match operational conditions, routine sea-ice maps could be generated with less dependence on new expert ice charts each season.
- The same emphasis on careful patch splitting and normalization could be tested on other remote-sensing classification tasks that also suffer from severe class imbalance.
- Higher precision on multi-year ice might allow climate models to track the persistence of older ice more reliably when the baseline is run over long time series.
Load-bearing premise
The held-out test patches created by the leakage-aware stratified splitting have a distribution close enough to future operational SAR scenes that the reported accuracy and precision numbers will translate to real-world performance.
What would settle it
Applying the same trained ViT-Large model to a fresh collection of SAR scenes from different seasons, sensors, or geographic regions and obtaining markedly lower accuracy or precision on the multi-year ice class than 83.9 percent.
Accurate and automated sea ice classification is important for climate monitoring and maritime safety in the Arctic. While Synthetic Aperture Radar (SAR) is the operational standard because of its all-weather capability, it remains challenging to distinguish morphologically similar ice classes under severe class imbalance. Rather than claiming a fully validated multimodal system, this paper establishes a trustworthy SAR-only baseline that future fusion work can build upon. Using the AI4Arctic/ASIP Sea Ice Dataset (v2), which contains 461 Sentinel-1 scenes matched with expert ice charts, we combine full-resolution Sentinel-1 Extra Wide inputs, leakage-aware stratified patch splitting, SIGRID-3 stage-of-development labels, and training-set normalization to evaluate Vision Transformer baselines. We compare ViT-Base models trained with cross-entropy and weighted cross-entropy against a ViT-Large model trained with focal loss. Among the tested configurations, ViT-Large with focal loss achieves 69.6% held-out accuracy, 68.8% weighted F1, and 83.9% precision on the minority Multi-Year Ice class. These results show that focal-loss training offers a more useful precision-recall trade-off than weighted cross-entropy for rare ice classes and establishes a cleaner baseline for future multimodal fusion with optical, thermal, or meteorological data.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper establishes a data-centric SAR-only baseline for sea ice classification on the AI4Arctic/ASIP v2 dataset (461 Sentinel-1 scenes). It evaluates ViT-Base models with cross-entropy and weighted cross-entropy loss against a ViT-Large model with focal loss, using full-resolution Extra Wide inputs, leakage-aware stratified patch splitting, SIGRID-3 labels, and training-set normalization. The central empirical result is that ViT-Large + focal loss reaches 69.6% held-out accuracy, 68.8% weighted F1, and 83.9% precision on the minority Multi-Year Ice class, positioning the work as a reference for future multimodal fusion.
Significance. If the held-out metrics prove reliable, the paper supplies a concrete, reproducible baseline that quantifies the precision-recall trade-off achievable with focal loss on imbalanced SAR ice classes. Its strengths include the use of an external public dataset, explicit handling of leakage in patch splitting, and direct reporting of per-class precision on the rare Multi-Year Ice category, which future work can cite when adding optical or meteorological modalities.
major comments (2)
- §3.2 (Data Preparation): The leakage-aware stratified patch splitting is load-bearing for the claim that the 69.6% accuracy reflects generalization to new operational scenes, yet the manuscript provides no quantitative check (e.g., intra-scene incidence-angle variance or temporal adjacency statistics) on residual correlations that may remain after splitting. Without this, the held-out set may still permit interpolation rather than extrapolation.
- Table 2 (Results): The reported accuracy, F1, and per-class precision values are given as single point estimates with no error bars, standard deviations from repeated runs, or sensitivity analysis to hyperparameter choices, weakening the ability to judge whether the focal-loss advantage over weighted cross-entropy is statistically stable.
minor comments (2)
- §4.1: The description of training-set normalization could explicitly state whether the same statistics are applied to the validation and test patches or recomputed per split.
- Figure 3: The confusion-matrix visualization would benefit from normalized row sums to make the per-class recall trade-offs immediately visible.
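The row normalization suggested for Figure 3 can be sketched as follows. Dividing each row of a confusion matrix (rows as true classes, columns as predicted classes) by its row sum puts per-class recall on the diagonal. The function names, the labels "A" and "B", and the toy predictions are hypothetical and are not drawn from the paper.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows index the true class, columns index the predicted class."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def normalize_rows(matrix):
    """Divide each row by its sum so that every non-empty row sums to 1."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([c / total if total else 0.0 for c in row])
    return normalized


# Toy example (hypothetical): class "A" is common, class "B" is rare.
labels = ["A", "B"]
y_true = ["A", "A", "A", "A", "B", "B"]
y_pred = ["A", "A", "A", "B", "B", "A"]

counts = confusion_matrix(y_true, y_pred, labels)
recall_view = normalize_rows(counts)

for label, row in zip(labels, recall_view):
    print(label, [round(v, 2) for v in row])
```

In raw counts the majority class visually dominates; after row normalization the diagonal entries are directly comparable across classes of very different size.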
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help strengthen the presentation of our data-centric baseline. We address each major comment below and have revised the manuscript to incorporate additional analyses where feasible.
- Referee: §3.2 (Data Preparation): The leakage-aware stratified patch splitting is load-bearing for the claim that the 69.6% accuracy reflects generalization to new operational scenes, yet the manuscript provides no quantitative check (e.g., intra-scene incidence-angle variance or temporal adjacency statistics) on residual correlations that may remain after splitting. Without this, the held-out set may still permit interpolation rather than extrapolation.
  Authors: We agree that explicit quantitative checks on residual correlations would further support the generalization claim. Our leakage-aware split ensures no patches from the same Sentinel-1 scene appear in both training and test sets, with stratification by scene-level ice-class distribution. The original manuscript did not report incidence-angle variance or temporal adjacency statistics. In the revision we add a new paragraph in §3.2 together with a supplementary table showing the mean and standard deviation of incidence angles per split and the temporal separation (in days) between training and test scenes, confirming that test scenes are drawn from distinct acquisition periods.
  Revision: yes
- Referee: Table 2 (Results): The reported accuracy, F1, and per-class precision values are given as single point estimates with no error bars, standard deviations from repeated runs, or sensitivity analysis to hyperparameter choices, weakening the ability to judge whether the focal-loss advantage over weighted cross-entropy is statistically stable.
  Authors: We acknowledge that single-point estimates limit assessment of stability. The reported numbers reflect a single training run with a fixed random seed chosen for reproducibility. For the revision we have re-trained all three model configurations (ViT-Base CE, ViT-Base WCE, ViT-Large focal) with three independent seeds and replace the point estimates in Table 2 with mean ± standard deviation. We also add a short paragraph in §4.2 reporting a one-dimensional sensitivity sweep over the focal-loss γ parameter (γ ∈ {1,2,3}) to demonstrate that the precision advantage on Multi-Year Ice remains consistent.
  Revision: yes
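The mean ± standard deviation reporting described in the second response could be produced along the following lines. This is only a sketch: the helper names are hypothetical, and the three scores are made-up placeholder values chosen for easy arithmetic, not results from the paper or its revision.

```python
from statistics import mean, stdev


def summarize(values):
    """Return the mean and the sample standard deviation of per-seed scores."""
    return mean(values), stdev(values)


def format_mean_std(values, digits=1):
    """Format per-seed scores as 'mean ± std' for a results table."""
    m, s = summarize(values)
    return f"{m:.{digits}f} ± {s:.{digits}f}"


# Placeholder per-seed scores in percent (made up, NOT from the paper).
placeholder_scores = [70.0, 72.0, 68.0]

print(format_mean_std(placeholder_scores))
```

With only three seeds the sample standard deviation is itself a noisy estimate, so it indicates the rough scale of run-to-run variation rather than a tight confidence interval.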
Circularity Check
No significant circularity in empirical ViT baseline evaluation
The paper conducts a purely empirical evaluation of Vision Transformer models (ViT-Base and ViT-Large) on the external public AI4Arctic/ASIP Sea Ice Dataset v2. It applies leakage-aware stratified patch splitting, SIGRID-3 labels, and training-set normalization, then reports direct held-out metrics such as 69.6% accuracy, 68.8% weighted F1, and 83.9% precision on the minority class for the ViT-Large + focal loss configuration. No equations, derivations, or self-citations reduce these metrics to quantities defined by fitted parameters within the paper. The central claims rest on standard ML training and testing procedures against an independent dataset split, with no self-definitional loops, fitted-input predictions, or load-bearing self-citations. The work is self-contained against external benchmarks.