arxiv: 2604.03094 · v1 · submitted 2026-04-03 · 💻 cs.CV · cs.AI

A Data-Centric Vision Transformer Baseline for SAR Sea Ice Classification

David Mike-Ewewie, Panhapiseth Lim, Priyanka Kumar

Pith reviewed 2026-05-13 19:48 UTC · model grok-4.3


Keywords: sea ice classification, vision transformer, SAR, focal loss, class imbalance, data-centric baseline, Sentinel-1, multi-year ice


The pith

Vision Transformers with focal loss and leakage-aware splitting set a SAR sea ice classification baseline that improves minority-class precision.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to create a trustworthy SAR-only baseline for automated sea ice classification that future multimodal systems can reference. Accurate maps of ice types matter for tracking Arctic climate change and for safe ship routing, yet class imbalance makes rare types like multi-year ice hard to separate from more common ones. The work applies full-resolution Sentinel-1 Extra Wide imagery, stratified patch splitting that avoids data leakage between train and test sets, and training-set-only normalization before feeding the patches into Vision Transformer models. It directly compares a ViT-Base model trained with cross-entropy or weighted cross-entropy against a ViT-Large model trained with focal loss, reporting that the latter reaches 69.6 percent held-out accuracy, 68.8 percent weighted F1, and 83.9 percent precision on the scarce multi-year ice class. The central demonstration is that focal loss yields a more balanced precision-recall trade-off for the minority classes than re-weighting the loss.

Core claim

A ViT-Large model trained with focal loss on full-resolution SAR patches, after leakage-aware stratified splitting and training-set normalization, reaches 69.6 percent held-out accuracy, 68.8 percent weighted F1 score, and 83.9 percent precision on the minority multi-year ice class, while delivering a cleaner precision-recall balance for rare ice types than either cross-entropy or weighted cross-entropy training on the same data.

What carries the argument

Leakage-aware stratified patch splitting of full-resolution Sentinel-1 scenes together with training-set normalization and focal-loss training of the ViT-Large model.
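The following is a minimal Python sketch of scene-level, leakage-aware stratified splitting. It is not the authors' code. It assumes that every patch carries the identifier of its source Sentinel-1 scene and a single patch-level class label. It also uses each scene's dominant label as a stand-in for the scene-level ice-class distribution. The function name `scene_level_stratified_split` and the 20% test fraction are illustrative choices.

```python
import random
from collections import Counter, defaultdict


def scene_level_stratified_split(scene_ids, labels, test_frac=0.2, seed=0):
    """Split patch indices so that no scene contributes to both train and test.

    Hypothetical sketch: scenes are stratified by their dominant patch label
    (an assumed proxy for the scene-level ice-class distribution), and a
    fraction of the scenes in each stratum is held out.
    """
    # Collect the labels of every patch belonging to each scene.
    scene_labels = defaultdict(list)
    for sid, lab in zip(scene_ids, labels):
        scene_labels[sid].append(lab)

    # Group scenes into strata keyed by their most common patch label.
    strata = defaultdict(list)
    for sid, labs in scene_labels.items():
        dominant = Counter(labs).most_common(1)[0][0]
        strata[dominant].append(sid)

    # Hold out a fixed fraction of whole scenes from each stratum.
    rng = random.Random(seed)
    test_scenes = set()
    for key in sorted(strata):
        scenes = sorted(strata[key])
        rng.shuffle(scenes)
        n_test = round(len(scenes) * test_frac)
        test_scenes.update(scenes[:n_test])

    # Map the scene assignment back to patch indices.
    train_idx = [i for i, s in enumerate(scene_ids) if s not in test_scenes]
    test_idx = [i for i, s in enumerate(scene_ids) if s in test_scenes]
    return train_idx, test_idx
```

This sketch assigns whole scenes, not individual patches, to a split. As a result, neighboring patches cut from the same image can never sit on both sides of the train/test boundary. Correlations between different scenes, such as temporal adjacency, are not addressed by this step.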

If this is right

Focal loss produces a more useful precision-recall trade-off for rare ice classes than class-weighted cross-entropy.
The data-handling pipeline of full-resolution inputs, leakage-aware splitting, and training-set normalization can be reused as a reference point for later fusion of SAR with optical, thermal, or meteorological inputs.
The 83.9 percent precision on multi-year ice supplies a concrete numeric target that any improved method must beat on the same held-out split.
Because the baseline avoids down-sampling, fine morphological details that distinguish ice types remain available to the model.
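The two losses can be written out in NumPy to make the comparison concrete. This is an illustrative sketch rather than the paper's training code. The class weights for weighted cross-entropy and the focal-loss γ and optional per-class α are placeholders, because the exact values used are not stated here. The focal term follows Lin et al. [19], FL(p_t) = -α_t (1 - p_t)^γ log p_t, where p_t is the softmax probability of the true class.

```python
import numpy as np


def softmax(logits):
    # Subtract the row max for numerical stability before exponentiating.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def weighted_cross_entropy(logits, targets, class_weights):
    """Weighted mean of -w[y] * log p_y, normalized by the sum of the weights used."""
    p = softmax(logits)
    p_t = p[np.arange(len(targets)), targets]
    w = class_weights[targets]
    return float(np.sum(-w * np.log(p_t)) / np.sum(w))


def focal_loss(logits, targets, gamma=2.0, alpha=None):
    """Multi-class focal loss: mean of -alpha[y] * (1 - p_y)**gamma * log p_y."""
    p = softmax(logits)
    p_t = p[np.arange(len(targets)), targets]
    # Without alpha, every class gets weight 1 and only the (1 - p_t)**gamma factor acts.
    a = np.ones_like(p_t) if alpha is None else alpha[targets]
    return float(np.mean(-a * (1.0 - p_t) ** gamma * np.log(p_t)))
```

With γ = 0 and no α, `focal_loss` reduces to plain cross-entropy. With γ = 2, a confidently correct patch (p_t = 0.9) keeps only (1 - 0.9)² = 1% of its cross-entropy term, while a hard patch (p_t = 0.3) keeps 49%. Focal loss therefore shifts training effort toward ambiguous examples rather than toward whole classes, which is how it differs from per-class re-weighting. The weighted cross-entropy here divides by the summed weights of the batch's targets. PyTorch's `CrossEntropyLoss` uses the same convention when given `weight` and `reduction='mean'`.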

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the test distribution does match operational conditions, routine sea-ice maps could be generated with less dependence on new expert ice charts each season.
The same emphasis on careful patch splitting and normalization could be tested on other remote-sensing classification tasks that also suffer from severe class imbalance.
Higher precision on multi-year ice might allow climate models to track the persistence of older ice more reliably when the baseline is run over long time series.

Load-bearing premise

The held-out test patches created by the leakage-aware stratified splitting have a distribution close enough to future operational SAR scenes that the reported accuracy and precision numbers will translate to real-world performance.

What would settle it

Applying the same trained ViT-Large model to a fresh collection of SAR scenes from different seasons, sensors, or geographic regions and obtaining markedly lower overall accuracy than 69.6 percent, or markedly lower multi-year ice precision than 83.9 percent.

Figures

Figures reproduced from arXiv: 2604.03094 by David Mike-Ewewie, Panhapiseth Lim, Priyanka Kumar.

**Figure 2.** Confusion Matrix for the Champion ViT-Large Model.
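The headline numbers (accuracy, weighted F1, and per-class precision) can all be derived from a confusion matrix like the one in Figure 2. The sketch below shows that arithmetic. It assumes that rows index the true class and columns index the predicted class. It uses the support-weighted F1 definition, which scikit-learn calls `average="weighted"`. The function name is hypothetical and the code is not taken from the paper.

```python
import numpy as np


def metrics_from_confusion(cm):
    """Summary metrics from a confusion matrix (rows = true class, columns = predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    support = cm.sum(axis=1)    # true patches per class (row sums)
    predicted = cm.sum(axis=0)  # predicted patches per class (column sums)

    # Guard against division by zero for classes never predicted or never present.
    precision = np.divide(tp, predicted, out=np.zeros_like(tp), where=predicted > 0)
    recall = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=np.zeros_like(tp), where=denom > 0)

    # Row-normalized matrix: each row sums to 1, so its diagonal is per-class recall.
    row_norm = np.divide(cm, support[:, None], out=np.zeros_like(cm),
                         where=support[:, None] > 0)
    return {
        "accuracy": float(tp.sum() / cm.sum()),
        "precision": precision,
        "recall": recall,
        # Weighted F1: per-class F1 averaged with weights proportional to support.
        "weighted_f1": float((f1 * support).sum() / support.sum()),
        "row_normalized": row_norm,
    }
```

Precision for a class divides its diagonal entry by its column sum. A high multi-year ice precision therefore means that few patches of other classes are predicted as multi-year ice. On its own, it says nothing about how many multi-year ice patches are missed; the recall, which is the diagonal of the row-normalized matrix, captures that.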

Original abstract

Accurate and automated sea ice classification is important for climate monitoring and maritime safety in the Arctic. While Synthetic Aperture Radar (SAR) is the operational standard because of its all-weather capability, it remains challenging to distinguish morphologically similar ice classes under severe class imbalance. Rather than claiming a fully validated multimodal system, this paper establishes a trustworthy SAR-only baseline that future fusion work can build upon. Using the AI4Arctic/ASIP Sea Ice Dataset (v2), which contains 461 Sentinel-1 scenes matched with expert ice charts, we combine full-resolution Sentinel-1 Extra Wide inputs, leakage-aware stratified patch splitting, SIGRID-3 stage-of-development labels, and training-set normalization to evaluate Vision Transformer baselines. We compare ViT-Base models trained with cross-entropy and weighted cross-entropy against a ViT-Large model trained with focal loss. Among the tested configurations, ViT-Large with focal loss achieves 69.6% held-out accuracy, 68.8% weighted F1, and 83.9% precision on the minority Multi-Year Ice class. These results show that focal-loss training offers a more useful precision-recall trade-off than weighted cross-entropy for rare ice classes and establish a cleaner baseline for future multimodal fusion with optical, thermal, or meteorological data.
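Keeping full resolution means tiling each scene rather than resampling it. The following is a hypothetical sketch of such tiling, not the authors' pipeline. It rests on several assumptions:

- The SIGRID-3 stage-of-development chart has already been rasterized to per-pixel, non-negative integer class ids aligned with the SAR grid.
- Masked or unlabeled pixels carry an ignore value (255 here).
- Each patch takes the majority label of its valid pixels.

The 224-pixel patch size is also an assumption, chosen to match a common ViT input size.

```python
import numpy as np


def tile_scene(sar, label_map, patch=224, ignore=255, min_valid=0.9):
    """Cut a full-resolution scene into non-overlapping patches without resampling.

    sar: (H, W, C) backscatter array, e.g. HH and HV channels (assumed layout).
    label_map: (H, W) per-pixel class ids; `ignore` marks masked pixels (assumed).
    Each kept patch is labeled with the majority class of its valid pixels.
    """
    patches, labels = [], []
    h, w = label_map.shape
    for r in range(0, h - patch + 1, patch):
        for c in range(0, w - patch + 1, patch):
            lab = label_map[r:r + patch, c:c + patch]
            valid = lab[lab != ignore]
            # Skip patches dominated by land, no-data, or unlabeled pixels.
            if valid.size < min_valid * lab.size:
                continue
            labels.append(int(np.bincount(valid).argmax()))
            patches.append(sar[r:r + patch, c:c + patch])
    if not patches:
        return np.empty((0, patch, patch, sar.shape[2]), dtype=sar.dtype), labels
    return np.stack(patches), labels
```

Any partial tile at the right or bottom edge is simply dropped here. A real pipeline might pad or overlap tiles instead.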

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper sets a practical SAR sea ice baseline with ViT-Large and focal loss at 69.6% accuracy and strong minority-class precision, but the patch split may leave some scene-level leakage untested.


The main thing to know is that this paper gives a usable empirical baseline for SAR sea ice classification by running Vision Transformers on the AI4Arctic/ASIP v2 dataset of 461 Sentinel-1 scenes. They use full-resolution Extra Wide inputs, a leakage-aware stratified patch split, SIGRID-3 labels, and training-set normalization, then compare ViT-Base with cross-entropy or weighted cross-entropy against ViT-Large trained with focal loss. The focal-loss run reaches 69.6% held-out accuracy, 68.8% weighted F1, and 83.9% precision on the minority multi-year ice class, which is the clearest practical takeaway for anyone needing better handling of rare ice types in climate or maritime work.

Referee Report

2 major / 2 minor

Summary. The paper establishes a data-centric SAR-only baseline for sea ice classification on the AI4Arctic/ASIP v2 dataset (461 Sentinel-1 scenes). It evaluates ViT-Base models with cross-entropy and weighted cross-entropy loss against a ViT-Large model with focal loss, using full-resolution Extra Wide inputs, leakage-aware stratified patch splitting, SIGRID-3 labels, and training-set normalization. The central empirical result is that ViT-Large + focal loss reaches 69.6% held-out accuracy, 68.8% weighted F1, and 83.9% precision on the minority Multi-Year Ice class, positioning the work as a reference for future multimodal fusion.

Significance. If the held-out metrics prove reliable, the paper supplies a concrete, reproducible baseline that quantifies the precision-recall trade-off achievable with focal loss on imbalanced SAR ice classes. Its strengths include the use of an external public dataset, explicit handling of leakage in patch splitting, and direct reporting of per-class precision on the rare Multi-Year Ice category, which future work can cite when adding optical or meteorological modalities.

major comments (2)

§3.2 (Data Preparation): The leakage-aware stratified patch splitting is load-bearing for the claim that the 69.6% accuracy reflects generalization to new operational scenes, yet the manuscript provides no quantitative check (e.g., intra-scene incidence-angle variance or temporal adjacency statistics) on residual correlations that may remain after splitting. Without this, the held-out set may still permit interpolation rather than extrapolation.
Table 2 (Results): The reported accuracy, F1, and per-class precision values are given as single point estimates with no error bars, standard deviations from repeated runs, or sensitivity analysis to hyperparameter choices, weakening the ability to judge whether the focal-loss advantage over weighted cross-entropy is statistically stable.

minor comments (2)

§4.1: The description of training-set normalization could explicitly state whether the same statistics are applied to the validation and test patches or recomputed per split.
Figure 3: The confusion-matrix visualization would benefit from row normalization (each row summing to 1) to make the per-class recall trade-offs immediately visible.
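The convention the §4.1 comment asks about can be sketched as follows: statistics are fitted on the training split and then reused unchanged for validation and test patches. The sketch assumes per-channel standardization of patches stored as (N, H, W, C) arrays. The paper's exact normalization (per channel or global, in linear units or dB) is not specified here, and the function names are hypothetical.

```python
import numpy as np


def fit_channel_stats(train_patches):
    """Per-channel mean and std computed from training patches only (N, H, W, C)."""
    mean = train_patches.mean(axis=(0, 1, 2))
    std = train_patches.std(axis=(0, 1, 2))
    return mean, np.maximum(std, 1e-8)  # avoid division by zero on flat channels


def apply_channel_stats(patches, mean, std):
    # The training statistics are reused as-is for validation and test patches;
    # they are never recomputed on held-out data.
    return (patches - mean) / std
```

Only `fit_channel_stats(train)` is ever called. Recomputing the statistics on the test split would let information about the held-out distribution leak into preprocessing.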

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the presentation of our data-centric baseline. We address each major comment below and have revised the manuscript to incorporate additional analyses where feasible.


Referee: §3.2 (Data Preparation): The leakage-aware stratified patch splitting is load-bearing for the claim that the 69.6% accuracy reflects generalization to new operational scenes, yet the manuscript provides no quantitative check (e.g., intra-scene incidence-angle variance or temporal adjacency statistics) on residual correlations that may remain after splitting. Without this, the held-out set may still permit interpolation rather than extrapolation.

Authors: We agree that explicit quantitative checks on residual correlations would further support the generalization claim. Our leakage-aware split ensures no patches from the same Sentinel-1 scene appear in both training and test sets, with stratification by scene-level ice-class distribution. The original manuscript did not report incidence-angle variance or temporal adjacency statistics. In the revision we add a new paragraph in §3.2 together with a supplementary table showing the mean and standard deviation of incidence angles per split and the temporal separation (in days) between training and test scenes, confirming that test scenes are drawn from distinct acquisition periods. revision: yes
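The two statistics promised in this response are simple to compute once each scene has an acquisition date and an incidence angle. The sketch below is illustrative only. It assumes one representative incidence angle per scene (for example, its mean across the swath) and date-level timestamps. The helper names are hypothetical.

```python
from datetime import date
from statistics import mean, stdev


def min_gap_days(test_dates, train_dates):
    """For each test scene, the number of days to the nearest training-scene acquisition."""
    return [min(abs((t - d).days) for d in train_dates) for t in test_dates]


def angle_summary(angles):
    """Mean and sample standard deviation of per-scene incidence angles (degrees)."""
    return mean(angles), stdev(angles)
```

A test scene acquired only a few days after a training scene over the same region would still be strongly correlated with it. The minimum gap per test scene is therefore more informative than an average gap.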
Referee: Table 2 (Results): The reported accuracy, F1, and per-class precision values are given as single point estimates with no error bars, standard deviations from repeated runs, or sensitivity analysis to hyperparameter choices, weakening the ability to judge whether the focal-loss advantage over weighted cross-entropy is statistically stable.

Authors: We acknowledge that single-point estimates limit assessment of stability. The reported numbers reflect a single training run with a fixed random seed chosen for reproducibility. For the revision we re-train all three model configurations (ViT-Base CE, ViT-Base WCE, ViT-Large focal) with three independent seeds and replace the point estimates in Table 2 with mean ± standard deviation. We also add a short paragraph in §4.2 reporting a one-dimensional sensitivity sweep over the focal-loss γ parameter (γ ∈ {1, 2, 3}) to demonstrate that the precision advantage on Multi-Year Ice remains consistent. revision: yes
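Reporting mean ± standard deviation over the three seeds amounts to the small helper below. Its name and output format are assumptions. It uses the sample standard deviation, the usual choice for so few runs.

```python
from statistics import mean, stdev


def summarize_runs(values):
    """Format one metric from independent seeds as 'mean ± sample std' (1 decimal)."""
    m = mean(values)
    # The sample standard deviation needs at least two runs.
    s = stdev(values) if len(values) > 1 else 0.0
    return f"{m:.1f} ± {s:.1f}"
```

With only three seeds, the standard deviation is itself a noisy estimate. Overlapping intervals between the focal-loss and weighted cross-entropy configurations would therefore not settle the comparison either way.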

Circularity Check

0 steps flagged

No significant circularity in empirical ViT baseline evaluation


The paper conducts a purely empirical evaluation of Vision Transformer models (ViT-Base and ViT-Large) on the external public AI4Arctic/ASIP Sea Ice Dataset v2. It applies leakage-aware stratified patch splitting, SIGRID-3 labels, and training-set normalization, then reports direct held-out metrics such as 69.6% accuracy, 68.8% weighted F1, and 83.9% precision on the minority class for the ViT-Large + focal loss configuration. No equations, derivations, or self-citations reduce these metrics to quantities defined by fitted parameters within the paper. The central claims rest on standard ML training and testing procedures against an independent dataset split, with no self-definitional loops, fitted-input predictions, or load-bearing self-citations. The work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Relies on standard supervised-learning assumptions (i.i.d. patches after splitting, expert label quality, and that focal-loss hyperparameters transfer reasonably). No new physical entities or ad-hoc constants are introduced beyond typical neural-network training choices.



Reference graph

Works this paper leans on


[1] N. Zakhvatkina, V. Smirnov, and I. Bychkova, “Satellite SAR data-based sea ice classification: An overview,” Geosciences, vol. 9, no. 4, p. 152, 2019.
[2] A. Dosovitskiy et al., “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint arXiv:2010.11929, 2020.
[3] A. A. Aleissaee et al., “Transformers in remote sensing: A survey,” Remote Sensing, vol. 15, no. 7, p. 1860, 2023.
[4] J. Zhang, W. Zhang, X. Zhou, Q. Chu, X. Yin, G. Li, X. Dai, S. Hu, and F. Jin, “CNN and transformer fusion network for sea ice classification using Gaofen-3 polarimetric SAR images,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 17, 2024.
[5] J. Xia, R. Romeiser, W. Zhang, and T. Özgökmen, “Use of vision transformer to classify sea surface phenomena in SAR imagery,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 18, pp. 10937–10950, 2025.
[6] X. Chen, K. A. Scott, M. Jiang, Y. Fang, L. Xu, and D. A. Clausi, “Sea ice classification with dual-polarized SAR imagery: A hierarchical pipeline,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), 2023, pp. 224–232.
[7] H. Sun, C. Li, and Y. Cheng, “Monitoring polar sea ice using optical and SAR data,” Marine Technology Society Journal, vol. 53, no. 6, pp. 35–41, 2019.
[8] W. Li, L. Liu, and J. Zhang, “Fusion of SAR and optical image for sea ice extraction,” Journal of Ocean University of China, vol. 20, no. 6, pp. 1440–1450, 2021.
[9] S. Wiehle, D. Murashkin, A. Frost, C. König, and T. König, “Sea ice classification using combined Sentinel-1 and Sentinel-3 data,” in 2024 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), IEEE, 2024, pp. 102–106.
[10] L. de Loë, D. A. Clausi, and K. A. Scott, “Fusing ice surface temperature with the AI4Arctic dataset for improved deep learning-based sea ice mapping,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 18, 2025.
[11] N.-C. Ristea, A. Anghel, and M. Datcu, “Sea ice segmentation from SAR data by convolutional transformer networks,” in IGARSS 2023 - 2023 IEEE International Geoscience and Remote Sensing Symposium, 2023, pp. 168–171.
[12] N.-C. Ristea, A. Anghel, A. Mouche, F. Nouguier, A. Grouazel, and M. Datcu, “Multi-head transposed attention transformer for sea ice segmentation in SAR imagery,” in IGARSS 2024 - 2024 IEEE International Geoscience and Remote Sensing Symposium, 2024, pp. 183–187.
[13] T. R. Andersson et al., “Deep learning in sea ice remote sensing: Challenges and opportunities,” IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 18, pp. 234–250, 2025.
[14] W. Li, C.-Y. Hsu, and M. Tedesco, “Advancing Arctic sea ice remote sensing with AI and deep learning: Opportunities and challenges,” Remote Sensing, vol. 16, no. 20, p. 3764, 2024.
[15] T. Wulf, J. Buus-Hinkler, S. Singha, H. Shi, and M. B. Kreiner, “Pan-Arctic sea ice concentration from SAR and passive microwave,” The Cryosphere, vol. 18, pp. 5277–5300, 2024.
[16] X. Chen, F. J. Cantu, M. Patel, L. Xu, N. Brubacher, K. A. Scott, and D. A. Clausi, “A comparative study of data input selection for deep learning-based automated sea ice mapping,” International Journal of Applied Earth Observation and Geoinformation, vol. 131, p. 103986, 2024.
[17] W. Chen, M. Tsamados, R. Willatt, S. Takao et al., “Co-located OLCI optical imagery and SAR altimetry from Sentinel-3 for enhanced Arctic spring sea ice surface classification,” Frontiers in Remote Sensing, vol. 5, p. 1401653, 2024.
[18] R. Saldo, M. Brandt Kreiner, J. Buus-Hinkler, L. T. Pedersen, D. Malmgren-Hansen, A. A. Nielsen et al., “AI4Arctic / ASIP sea ice dataset - version 2,” 2021, dataset.
[19] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 42, no. 2, pp. 318–327, 2020.