
arxiv: 2604.08171 · v1 · submitted 2026-04-09 · 💻 cs.CV · cs.AI

OceanMAE: A Foundation Model for Ocean Remote Sensing

Pith reviewed 2026-05-10 17:55 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords ocean remote sensing · masked autoencoder · self-supervised learning · marine segmentation · bathymetry estimation · foundation model · Sentinel-2 · domain adaptation

The pith

Integrating physically meaningful ocean descriptors into masked autoencoder pre-training improves downstream marine segmentation quality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that an ocean-specific masked autoencoder called OceanMAE can learn more useful representations by adding auxiliary ocean descriptors to standard MAE pre-training on large unlabeled Sentinel-2 data. A sympathetic reader would care because ocean remote sensing suffers from scarce labels and from models pre-trained mostly on land imagery, which limits accuracy on tasks such as debris detection and bathymetry. The work shows that this domain-aligned approach yields its clearest gains on segmentation benchmarks while remaining competitive on regression. An ablation against a standard MAE further indicates that the added descriptors, rather than generic self-supervision alone, account for the better transfer performance.

Core claim

OceanMAE extends standard MAE pre-training by jointly encoding multispectral Sentinel-2 observations and physically meaningful ocean descriptors on the Hydro dataset, producing latent representations that transfer to a modified UNet framework and deliver stronger marine pollutant and debris segmentation on MADOS and MARIDA together with competitive bathymetry results on MagicBathyNet.

What carries the argument

The auxiliary ocean descriptors integrated into masked autoencoder pre-training, which guide the model toward ocean-aware latent representations learned from unlabeled multispectral imagery.

If this is right

  • OceanMAE produces its largest accuracy gains on marine debris and pollutant segmentation tasks.
  • Bathymetry estimation benefits remain competitive and vary with the specific regression setup.
  • An ablation against a standard MAE on MARIDA indicates that the ocean descriptors themselves drive a measurable downstream improvement over the plain MAE baseline.
  • The resulting representations support transfer to both segmentation and regression heads via a shared UNet-style decoder.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same descriptor-injection pattern could be tested on other remote-sensing domains that possess domain-specific physical variables.
  • If the descriptors remain effective across different sensor resolutions, the method offers a route to build more general ocean foundation models without task-specific labels.
  • Public release of code and weights allows direct replication and extension on additional ocean datasets.

Load-bearing premise

The selected ocean descriptors are physically meaningful and sufficiently independent of the downstream task labels that their use in pre-training genuinely aids generalization rather than introducing dataset-specific leakage.

What would settle it

Retraining the same architecture on the Hydro dataset without the auxiliary ocean descriptors and comparing it with the full OceanMAE model on the MARIDA test set. If the descriptor-free model matches or exceeds the full model on segmentation metrics, the central claim fails; a consistent drop would support it.
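One way to score that comparison is sketched below. It is a minimal illustration, not the paper's evaluation code: the choice of mean intersection-over-union as the segmentation metric, the function name `mean_iou`, and the toy label arrays are all assumptions made here.

```python
def mean_iou(pred, target, n_classes):
    """Mean intersection-over-union across classes present in pred or target.

    pred and target are flat lists of integer class labels, one per pixel.
    Using mIoU here is an assumption; the page does not name the metric.
    """
    ious = []
    for c in range(n_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both prediction and target
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy labels only; these are not MARIDA results.
target = [0, 0, 1, 1, 2, 2]
pred_with_descriptors = [0, 0, 1, 1, 2, 1]     # hypothetical full-model output
pred_without_descriptors = [0, 1, 1, 0, 2, 1]  # hypothetical plain-MAE output

gap = (mean_iou(pred_with_descriptors, target, 3)
       - mean_iou(pred_without_descriptors, target, 3))
print(f"mIoU gap (with minus without descriptors): {gap:.3f}")
```

A gap near zero or below it, repeated across seeds, would count against the claim; a single run on its own would not settle the question either way.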

Figures

Figures reproduced from arXiv: 2604.08171 by Begüm Demir, Behnood Rasti, Panagiotis Agrafiotis, Viola-Joanna Stamer.

Figure 1: Overview of the OceanMAE architecture adapted from [14]. Input …
Figure 2: Architecture of the modified UNet for downstream ocean tasks. The …
Figure 3: Qualitative comparison of pollutants and sea-surface segmentation …
Original abstract

Accurate ocean mapping is essential for applications such as bathymetry estimation, seabed characterization, marine litter detection, and ecosystem monitoring. However, ocean remote sensing (RS) remains constrained by limited labeled data and by the reduced transferability of models pre-trained mainly on land-dominated Earth observation imagery. In this paper, we propose OceanMAE, an ocean-specific masked autoencoder that extends standard MAE pre-training by integrating multispectral Sentinel-2 observations with physically meaningful ocean descriptors during self-supervised learning. By incorporating these auxiliary ocean features, OceanMAE is designed to learn more informative and ocean-aware latent representations from large-scale unlabeled data. To transfer these representations to downstream applications, we further employ a modified UNet-based framework for marine segmentation and bathymetry estimation. Pre-trained on the Hydro dataset, OceanMAE is evaluated on MADOS and MARIDA for marine pollutant and debris segmentation, and on MagicBathyNet for bathymetry regression. The experiments show that OceanMAE yields the strongest gains on marine segmentation, while bathymetry benefits are competitive and task-dependent. In addition, an ablation against a standard MAE on MARIDA indicates that incorporating auxiliary ocean descriptors during pre-training improves downstream segmentation quality. These findings highlight the value of physically informed and domain-aligned self-supervised pre-training for ocean RS. Code and weights are publicly available at https://git.tu-berlin.de/joanna.stamer/SSLORS2.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces OceanMAE, a masked autoencoder pre-trained on the Hydro dataset that augments standard MAE with auxiliary ocean descriptors (e.g., chlorophyll concentration, sea-surface temperature) derived from Sentinel-2 multispectral observations. The model is transferred via a modified UNet to downstream tasks: marine pollutant/debris segmentation on MADOS and MARIDA, and bathymetry regression on MagicBathyNet. The central empirical claim is that the auxiliary descriptors yield stronger gains on segmentation than a standard MAE baseline, as shown by an ablation on MARIDA.

Significance. If the performance gains are shown to arise from genuinely ocean-aware representations rather than leakage, the work would provide a useful domain-adapted foundation model for ocean remote sensing, where labeled data are scarce. Public release of code and weights supports reproducibility and is a clear strength.

major comments (2)
  1. [Ablation study] Ablation study (abstract and experiments section): The reported improvement of OceanMAE over standard MAE on MARIDA does not include any quantitative check (correlation coefficients, mutual information, or per-descriptor ablation) that the auxiliary descriptors are statistically independent of the marine debris/pollutant segmentation labels. Without this, the performance gap could be explained by implicit weak supervision during pre-training rather than improved generalization.
  2. [Methods] Methods section: The description of how auxiliary ocean descriptors are encoded, normalized, and fused into the MAE encoder/decoder (including changes to input dimensionality, positional embeddings, or the reconstruction loss) is insufficiently detailed to allow replication or to assess whether the integration is parameter-free or introduces new hyperparameters that could affect the claimed gains.
minor comments (2)
  1. [Abstract] Abstract: No quantitative metrics, dataset sizes, or error bars are provided despite the claim of 'strongest gains'; adding these would strengthen the summary.
  2. [Experiments] Evaluation protocol: Clarify the exact fine-tuning procedure, number of epochs, learning-rate schedule, and whether the same data augmentations are used for both OceanMAE and the standard MAE baseline.
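The independence check requested in major comment 1 can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names (`pearson`, `quantile_bins`, `mutual_information`), the equal-frequency binning, and the toy descriptor and label values are hypothetical and do not come from the paper or its code.

```python
import math
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def quantile_bins(xs, n_bins):
    """Discretise a continuous descriptor into equal-frequency bins by rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    bins = [0] * len(xs)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(xs)
    return bins

def mutual_information(a, b):
    """Plug-in estimate of mutual information (in bits) for discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * math.log2(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())

# Toy per-pixel values; real use would pair each descriptor with MARIDA labels.
descriptor = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
leaky_labels = [0, 0, 0, 0, 1, 1, 1, 1]      # label tracks the descriptor
unrelated_labels = [0, 1, 0, 1, 0, 1, 0, 1]  # label ignores the descriptor

bins = quantile_bins(descriptor, n_bins=2)
for name, labels in [("leaky", leaky_labels), ("unrelated", unrelated_labels)]:
    print(name, round(pearson(descriptor, labels), 3),
          round(mutual_information(bins, labels), 3))
```

High correlation or mutual information for a descriptor would suggest it acts as weak supervision for the downstream labels; values near zero would make the leakage explanation less plausible. The plug-in estimator is biased upward on small samples, so a real analysis would need many pixels or a bias correction.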

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and insightful comments. We address each major point below and will revise the manuscript accordingly to improve clarity and strengthen the empirical claims.

  1. Referee: [Ablation study] Ablation study (abstract and experiments section): The reported improvement of OceanMAE over standard MAE on MARIDA does not include any quantitative check (correlation coefficients, mutual information, or per-descriptor ablation) that the auxiliary descriptors are statistically independent of the marine debris/pollutant segmentation labels. Without this, the performance gap could be explained by implicit weak supervision during pre-training rather than improved generalization.

    Authors: We agree this is a valid concern: without explicit independence checks, the observed gains could partly reflect correlations between the auxiliary descriptors and the downstream labels rather than purely improved generalization. Although pre-training remains fully self-supervised (no segmentation labels are used), the descriptors are derived from the same Sentinel-2 observations and could carry implicit information. In the revised manuscript we will add a dedicated analysis subsection that reports Pearson correlations and mutual information between each auxiliary descriptor and the MARIDA labels, together with per-descriptor ablation results. These additions will allow readers to assess the degree of any leakage and better attribute the performance improvements. revision: yes

  2. Referee: [Methods] Methods section: The description of how auxiliary ocean descriptors are encoded, normalized, and fused into the MAE encoder/decoder (including changes to input dimensionality, positional embeddings, or the reconstruction loss) is insufficiently detailed to allow replication or to assess whether the integration is parameter-free or introduces new hyperparameters that could affect the claimed gains.

    Authors: We acknowledge that the current methods description is too high-level for full reproducibility. In the revised version we will expand the OceanMAE architecture subsection to specify: (i) the exact normalization applied to each descriptor (z-score using Hydro dataset statistics), (ii) the encoding mechanism (concatenation as additional input channels with an adjusted linear patch embedding layer), (iii) any consequent changes to positional embeddings, and (iv) confirmation that the reconstruction loss remains the standard masked MSE with no auxiliary terms. We will also state explicitly that the only new design choice is the selection of the four descriptors; no additional hyperparameters are introduced beyond the original MAE configuration. revision: yes
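The integration described in the second response (z-score normalization, concatenation as extra input channels, masked MSE) can be sketched as follows. This follows the simulated rebuttal's wording only and is not the authors' confirmed implementation. Everything concrete is an assumption: the function names, the toy 1-D strip of pixels standing in for a 2-D tile, the dataset statistics, the choice to include descriptor channels in the reconstruction target, and the three-of-four masking that mirrors the 75% ratio common in MAE work.

```python
import random

def zscore(values, mean, std):
    """Standardise one descriptor channel with dataset-level statistics."""
    return [(v - mean) / std for v in values]

def stack_channels(spectral, descriptors):
    """Concatenate spectral bands and descriptor channels along the channel axis.

    Each channel is a flat list of pixels, so a patch embedding would see
    len(spectral) + len(descriptors) input channels.
    """
    return spectral + descriptors

def patchify(channels, patch_size):
    """Cut the pixel strip into patches; each patch holds all channels."""
    n_pixels = len(channels[0])
    patches = []
    for start in range(0, n_pixels, patch_size):
        patch = []
        for ch in channels:
            patch.extend(ch[start:start + patch_size])
        patches.append(patch)
    return patches

def masked_mse(target_patches, predicted_patches, masked_idx):
    """Mean squared error over masked patches only, as in standard MAE."""
    total, count = 0.0, 0
    for i in masked_idx:
        for t, p in zip(target_patches[i], predicted_patches[i]):
            total += (t - p) ** 2
            count += 1
    return total / count

rng = random.Random(0)
spectral = [[rng.random() for _ in range(8)] for _ in range(2)]  # 2 toy bands
raw_descriptor = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0]
descriptor = zscore(raw_descriptor, mean=17.0, std=4.0)  # hypothetical stats

channels = stack_channels(spectral, [descriptor])
patches = patchify(channels, patch_size=2)
masked_idx = sorted(rng.sample(range(len(patches)), k=3))  # mask 3 of 4 patches
prediction = [[0.0] * len(p) for p in patches]  # stand-in for decoder output
print("masked MSE:", masked_mse(patches, prediction, masked_idx))
```

In this reading the only structural change to a plain MAE is the wider input to the patch embedding, which is why the rebuttal can claim that no new hyperparameters are introduced beyond the choice of descriptors.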

Circularity Check

0 steps flagged

No circularity: purely empirical ablation with no derivation chain


The paper presents no mathematical derivation, first-principles result, or predictive claim that reduces to its own inputs by construction. All load-bearing evidence consists of empirical ablations (OceanMAE vs. standard MAE on MARIDA) and downstream evaluations on public datasets (MADOS, MARIDA, MagicBathyNet). Pre-training incorporates auxiliary ocean descriptors by design, but the performance gap is measured externally rather than being tautological. No self-citation load-bearing steps, uniqueness theorems, or fitted parameters renamed as predictions appear. The work is self-contained as an empirical study with public code.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard assumptions of masked autoencoder pre-training (reconstruction of masked patches yields useful representations) and transfer learning (representations learned on unlabeled data transfer to labeled downstream tasks). No free parameters, axioms, or invented entities are explicitly introduced in the abstract beyond the model itself.

axioms (1)
  • Domain assumption: Masked reconstruction on multispectral imagery plus auxiliary descriptors produces ocean-aware latent features that transfer to segmentation and regression.
    Invoked implicitly when claiming that the pre-trained representations improve downstream performance.

pith-pipeline@v0.9.0 · 5566 in / 1430 out tokens · 21713 ms · 2026-05-10T17:55:55.872750+00:00 · methodology


Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 1 internal anchor

  1. [1]

    Hydro foundation model,

    I. Corley and C. Robinson, “Hydro foundation model,” 2024. [Online]. Available: https://github.com/isaaccorley/hydro-foundation-model

  2. [2]

    Detecting marine pollutants and sea surface features with deep learning in sentinel-2 imagery,

    K. Kikaki, I. Kakogeorgiou, I. Hoteit, and K. Karantzalos, “Detecting marine pollutants and sea surface features with deep learning in sentinel-2 imagery,” vol. 210, pp. 39–54. [Online]. Available: https://www.sciencedirect.com/science/article/pii/S0924271624000625

  3. [3]

    MARIDA: A benchmark for marine debris detection from sentinel-2 remote sensing data,

    K. Kikaki, I. Kakogeorgiou, P. Mikeli, D. E. Raitsos, and K. Karantzalos, “MARIDA: A benchmark for marine debris detection from sentinel-2 remote sensing data,” vol. 17, no. 1, p. e0262247. [Online]. Available: https://dx.plos.org/10.1371/journal.pone.0262247

  4. [4]

    MAGIC-BATHYNET: A multimodal remote sensing dataset for bathymetry prediction and pixel-based classification in shallow waters,

    P. Agrafiotis, Ł. Janowski, D. Skarlatos, and B. Demir, “MAGIC-BATHYNET: A multimodal remote sensing dataset for bathymetry prediction and pixel-based classification in shallow waters,” in IGARSS 2024 - 2024 IEEE International Geoscience and Remote Sensing Symposium. IEEE, pp. 249–253. [Online]. Available: https://ieeexplore.ieee.org/document/10641355/

  5. [5]

    A review of active and passive optical methods in hydrography,

    G. Mandlburger, “A review of active and passive optical methods in hydrography,” The International Hydrographic Review, vol. 28, pp. 8–52, 11 2022

  6. [6]

    Deepblue: Advanced convolutional neural network applications for ocean remote sensing,

    H. Wang and X. Li, “Deepblue: Advanced convolutional neural network applications for ocean remote sensing,” IEEE Geoscience and Remote Sensing Magazine, vol. 12, no. 1, pp. 138–161, 2023

  7. [7]

    Satellite remote sensing and bathymetry co-driven deep neural network for coral reef shallow water benthic habitat classification,

    H. Chen, J. Cheng, X. Ruan, J. Li, L. Ye, S. Chu, L. Cheng, and K. Zhang, “Satellite remote sensing and bathymetry co-driven deep neural network for coral reef shallow water benthic habitat classification,” International Journal of Applied Earth Observation and Geoinformation, vol. 132, p. 104054, 2024

  8. [8]

    Developments in deep learning algorithms for coastline extraction from remote sensing imagery: a systematic review,

    S. Khurram, A. B. Pour, M. Bagheri, E. H. Ariffin, M. F. Akhir, and S. B. Hamzah, “Developments in deep learning algorithms for coastline extraction from remote sensing imagery: a systematic review,” Earth Science Informatics, vol. 18, no. 3, p. 292, 2025

  9. [9]

    Seabed-net: A multi-task network for joint bathymetry estimation and seabed classification from remote sensing imagery in shallow waters,

    P. Agrafiotis and B. Demir, “Seabed-net: A multi-task network for joint bathymetry estimation and seabed classification from remote sensing imagery in shallow waters,” ISPRS Journal of Photogrammetry and Remote Sensing, vol. 231, pp. 414–430, 2026

  10. [10]

    Deep learning for ocean forecasting: A comprehensive review of methods, applications, and datasets,

    R. Hao, Y. Zhao, S. Zhang, and X. Deng, “Deep learning for ocean forecasting: A comprehensive review of methods, applications, and datasets,” IEEE Transactions on Cybernetics, 2025

  11. [11]

    Seasonal contrast: Unsupervised pre-training from uncurated remote sensing data,

    O. Manas, A. Lacoste, X. Giró-i-Nieto, D. Vazquez, and P. Rodriguez, “Seasonal contrast: Unsupervised pre-training from uncurated remote sensing data,” in 2021 IEEE/CVF International Conference on Computer Vision (ICCV). IEEE, pp. 9394–9403. [Online]. Available: https://ieeexplore.ieee.org/document/9710545/

  12. [12]

    Spectralgpt: Spectral remote sensing foundation model,

    D. Hong, B. Zhang, X. Li, Y. Li, C. Li, J. Yao, N. Yokoya, H. Li, P. Ghamisi, X. Jia et al., “Spectralgpt: Spectral remote sensing foundation model,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 8, pp. 5227–5244, 2024

  13. [13]

    U-net: Convolutional networks for biomedical image segmentation,

    O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical image computing and computer-assisted intervention. Springer, 2015, pp. 234–241

  14. [14]

    Masked autoencoders are scalable vision learners

    K. He, X. Chen, S. Xie, Y. Li, P. Dollar, and R. Girshick, “Masked autoencoders are scalable vision learners,” in 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, pp. 15979–15988. [Online]. Available: https://ieeexplore.ieee.org/document/9879206/

  15. [15]

    An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

    A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” 2021. [Online]. Available: https://arxiv.org/abs/2010.11929

  16. [16]

    Feature guided masked autoencoder for self-supervised learning in remote sensing,

    Y. Wang, H. H. Hernández, C. M. Albrecht, and X. X. Zhu, “Feature guided masked autoencoder for self-supervised learning in remote sensing,” vol. 18, pp. 321–336. [Online]. Available: https://ieeexplore.ieee.org/document/10766851/

  17. [17]

    SatMAE: Pre-training transformers for temporal and multi-spectral satellite imagery,

    Y. Cong, S. Khanna, C. Meng, P. Liu, E. Rozi, Y. He, M. Burke, D. B. Lobell, and S. Ermon, “SatMAE: Pre-training transformers for temporal and multi-spectral satellite imagery,” version Number: 3. [Online]. Available: https://arxiv.org/abs/2207.08051

  18. [18]

    SSL4eo-s12: A large-scale multimodal, multitemporal dataset for self-supervised learning in earth observation [software and data sets],

    Y. Wang, N. A. A. Braham, Z. Xiong, C. Liu, C. M. Albrecht, and X. X. Zhu, “SSL4eo-s12: A large-scale multimodal, multitemporal dataset for self-supervised learning in earth observation [software and data sets],” pp. 98–106. [Online]. Available: https://ieeexplore.ieee.org/document/10261879/