Pith · machine review for the scientific record

arxiv: 2602.10137 · v1 · submitted 2026-02-08 · 💻 cs.CV · cs.AI

Recognition: no theorem link

Multi-encoder ConvNeXt Network with Smooth Attentional Feature Fusion for Multispectral Semantic Segmentation

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 06:08 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords: multispectral semantic segmentation · ConvNeXt · feature fusion · attention mechanism · land cover classification · remote sensing · encoder-decoder network

The pith

MeCSAFNet uses separate ConvNeXt encoders and attentional fusion to improve multispectral land cover segmentation accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MeCSAFNet, a multi-encoder network for semantic segmentation of multispectral images. It processes visible and non-visible channels through dual ConvNeXt encoders, reconstructs features with separate decoders, and fuses them in a dedicated decoder using CBAM attention to combine spatial and spectral information. The model supports 4-channel inputs such as RGB plus NIR and 6-channel inputs that add NDVI and NDWI indices. Experiments on the Five-Billion-Pixels and Potsdam datasets show mIoU gains of 14 to 19 percent over U-Net and SegFormer on FBP and 4 to 9 percent over DeepLabV3+ and SegFormer on Potsdam. A sympathetic reader would care because more accurate segmentation from multispectral data can support better environmental monitoring and land management decisions.
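The split-encode-fuse pipeline described above can be illustrated with a minimal numpy sketch. The `encode` and `fuse` functions below are stand-ins, not the authors' implementation: the ConvNeXt encoder is reduced to average pooling plus channel widening, and the attentional fusion decoder to a single scalar gate.

```python
import numpy as np

def encode(x):
    # Stand-in for a ConvNeXt encoder stage: 2x2 average pooling
    # (downsampling) followed by channel doubling (widening).
    h, w, c = x.shape
    pooled = x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))
    return np.concatenate([pooled, pooled], axis=-1)

def fuse(a, b):
    # Stand-in for the attentional fusion decoder: a scalar sigmoid gate
    # blends the two branches (a learned, spatially varying gate in the
    # real model).
    gate = 1.0 / (1.0 + np.exp(-(a.mean() - b.mean())))
    return gate * a + (1.0 - gate) * b

# 6-channel input: RGB (visible) plus NIR/NDVI/NDWI (non-visible)
x = np.random.rand(8, 8, 6)
visible, nonvisible = x[..., :3], x[..., 3:]

f_vis = encode(visible)      # visible-branch features
f_nir = encode(nonvisible)   # non-visible-branch features
fused = fuse(f_vis, f_nir)   # joint representation for the decoder head
print(fused.shape)  # (4, 4, 6)
```

The point of the sketch is the data flow: the two spectral groups never mix until after per-branch encoding, which is the design choice the paper's claims rest on.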

Core claim

The paper claims that its MeCSAFNet architecture, which applies dual ConvNeXt encoders to process spectral channels independently before integrating features through a multi-scale fusion decoder with CBAM attention and ASAU activation, produces higher mIoU scores than standard encoder-decoder models when segmenting multispectral land cover imagery on the FBP and Potsdam benchmarks.

What carries the argument

Dual ConvNeXt encoders that handle visible and non-visible channels separately, followed by a fusion decoder that performs multi-scale attentional feature combination with CBAM to merge fine spatial cues and high-level spectral representations.
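CBAM applies channel attention (global average and max pooling through a shared MLP) followed by spatial attention (channel-pooled maps through a convolution). The numpy sketch below simplifies both steps, which is our simplification and not the paper's: the shared MLP becomes one shared linear map `w`, and the 7×7 convolution becomes an additive gate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(f, w):
    # f: (H, W, C); w: shared (C, C) map standing in for CBAM's MLP.
    avg = f.mean(axis=(0, 1))          # global average pool -> (C,)
    mx = f.max(axis=(0, 1))            # global max pool -> (C,)
    scale = sigmoid(avg @ w + mx @ w)  # shared weights, summed, squashed
    return f * scale                   # reweight channels

def spatial_attention(f):
    # Pool across channels, then gate each spatial position.
    avg = f.mean(axis=-1, keepdims=True)  # (H, W, 1)
    mx = f.max(axis=-1, keepdims=True)    # (H, W, 1)
    gate = sigmoid(avg + mx)              # stand-in for CBAM's 7x7 conv
    return f * gate

rng = np.random.default_rng(0)
f = rng.standard_normal((8, 8, 16))
w = rng.standard_normal((16, 16)) * 0.1

out = spatial_attention(channel_attention(f, w))
print(out.shape)  # (8, 8, 16)
```

Because both gates lie in (0, 1), the module can only attenuate features, never amplify them; the fusion decoder relies on this to softly select between spatial and spectral cues.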

If this is right

  • MeCSAFNet-base with 6-channel input raises mIoU by 14.72 to 19.21 percent over U-Net and SegFormer on the FBP dataset.
  • MeCSAFNet-large with 4-channel input raises mIoU by 4.80 to 9.11 percent over DeepLabV3+ and SegFormer on the Potsdam dataset.
  • Compact variants of the model maintain strong accuracy while lowering training time and inference cost.
  • The same architecture works without modification for both 4-channel RGB+NIR inputs and 6-channel inputs that include NDVI and NDWI indices.
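The 6-channel configuration follows from the standard index definitions, NDVI = (NIR − Red)/(NIR + Red) and NDWI = (Green − NIR)/(Green + NIR). The helper below is a sketch of that stacking; the paper's exact preprocessing (normalization, clipping) may differ.

```python
import numpy as np

def build_6c_input(rgb, nir, eps=1e-6):
    # rgb: (H, W, 3) in [0, 1]; nir: (H, W). Returns an (H, W, 6) stack
    # of R, G, B, NIR, NDVI, NDWI using the standard index formulas.
    red, green = rgb[..., 0], rgb[..., 1]
    ndvi = (nir - red) / (nir + red + eps)      # vegetation index
    ndwi = (green - nir) / (green + nir + eps)  # McFeeters water index
    return np.concatenate(
        [rgb, nir[..., None], ndvi[..., None], ndwi[..., None]], axis=-1
    )

rgb = np.random.rand(4, 4, 3)
nir = np.random.rand(4, 4)
x6 = build_6c_input(rgb, nir)
print(x6.shape)  # (4, 4, 6)
```

Dropping the last two channels recovers the 4-channel RGB+NIR input, which is why the same architecture can serve both configurations with only a change to the first-layer channel count.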

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Independent encoding of spectral channels may prevent loss of band-specific information that occurs when all channels are mixed in a single early layer.
  • The attentional fusion step could prove useful for other remote sensing tasks that combine spatial detail with spectral signatures, such as change detection.
  • The design suggests a path for scaling to hyperspectral data where the number of distinct channels is much larger.

Load-bearing premise

The reported mIoU gains come from the dual-encoder and attentional fusion design rather than from dataset-specific tuning, hyperparameter choices, or unstated differences in training procedures.

What would settle it

Re-training all compared models including U-Net, SegFormer, and DeepLabV3+ under identical training protocols, data splits, and hyperparameters on the same FBP and Potsdam datasets, then checking whether the mIoU differences disappear.
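The metric at stake is unambiguous to compute: per-class IoU is intersection over union of the predicted and reference masks, and mIoU averages over classes. A minimal reference implementation (we assume the common convention of skipping classes absent from both maps, since the paper's averaging convention is not stated here):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    # pred, target: integer label maps of identical shape.
    # Per-class IoU = |pred ∩ target| / |pred ∪ target|.
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps: skip, don't average 0s
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

pred = np.array([[0, 0, 1], [1, 2, 2]])
target = np.array([[0, 1, 1], [1, 2, 2]])
print(round(mean_iou(pred, target, num_classes=3), 3))  # → 0.722
```

Holding this metric fixed while equalizing the training protocol is what would isolate the architectural contribution from everything else.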

Figures

Figures reproduced from arXiv: 2602.10137 by Angel D. Sappa, Leo Thomas Ramos.

Figure 1. Example images of the Five-Billion-Pixels dataset.
Figure 2. Example images of the Potsdam dataset.
Figure 3. Overview of the utilized architecture in this work.
Figure 4. Comparison between ResNet, Swin Transformer, and ConvNeXt blocks.
Figure 5. ConvNeXt architecture structure (base version). Stages are connected sequentially, where the output of each stage serves as the input to the subsequent stage through downsampling operations.
Figure 6. Structure of the employed decoder block.
Figure 7. Comparison of the original ASAU formulation and the modified ASAU version used in this work.
Figure 8. Structure of the employed fusion block.
Figure 9. CBAM mechanism and its components. ⊕ denotes element-wise summation, ⊗ denotes element-wise multiplication (attention gating), and AvgPool and MaxPool indicate average and max pooling operations, respectively.
Figure 10. Segmentation results of our approach compared to other baseline methods on the Five-Billion-Pixels dataset. Regions demarcated by dashed lines indicate the areas where the most substantial discrepancies between models are observed.
Figure 11. Segmentation results of our approach compared to other baseline methods on the Potsdam dataset. Regions demarcated by dashed lines indicate the areas where the most substantial discrepancies between models are observed.
read the original abstract

This work proposes MeCSAFNet, a multi-branch encoder-decoder architecture for land cover segmentation in multispectral imagery. The model separately processes visible and non-visible channels through dual ConvNeXt encoders, followed by individual decoders that reconstruct spatial information. A dedicated fusion decoder integrates intermediate features at multiple scales, combining fine spatial cues with high-level spectral representations. The feature fusion is further enhanced with CBAM attention, and the ASAU activation function contributes to stable and efficient optimization. The model is designed to process different spectral configurations, including a 4-channel (4c) input combining RGB and NIR bands, as well as a 6-channel (6c) input incorporating NDVI and NDWI indices. Experiments on the Five-Billion-Pixels (FBP) and Potsdam datasets demonstrate significant performance gains. On FBP, MeCSAFNet-base (6c) surpasses U-Net (4c) by +19.21%, U-Net (6c) by +14.72%, SegFormer (4c) by +19.62%, and SegFormer (6c) by +14.74% in mIoU. On Potsdam, MeCSAFNet-large (4c) improves over DeepLabV3+ (4c) by +6.48%, DeepLabV3+ (6c) by +5.85%, SegFormer (4c) by +9.11%, and SegFormer (6c) by +4.80% in mIoU. The model also achieves consistent gains over several recent state-of-the-art approaches. Moreover, compact variants of MeCSAFNet deliver notable performance with lower training time and reduced inference cost, supporting their deployment in resource-constrained environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes MeCSAFNet, a multi-encoder ConvNeXt-based architecture for multispectral semantic segmentation. It processes visible and non-visible channels via separate encoders, reconstructs features with individual decoders, and fuses multi-scale representations in a dedicated decoder using CBAM attention and the ASAU activation. Experiments on the FBP and Potsdam datasets report large mIoU gains for both 4-channel and 6-channel inputs over U-Net, SegFormer, and DeepLabV3+ baselines, with additional claims of efficiency for compact variants.

Significance. If the reported mIoU improvements can be shown to arise specifically from the dual-encoder design and attentional fusion rather than training differences, the approach would offer a practical advance for multispectral land-cover segmentation, especially in settings where compact models with lower inference cost are needed.

major comments (3)
  1. [Experimental results] The headline mIoU claims (e.g., MeCSAFNet-base (6c) surpassing U-Net (4c) by +19.21% on FBP and MeCSAFNet-large (4c) surpassing DeepLabV3+ (4c) by +6.48% on Potsdam) are presented without ablation tables that remove only the fusion decoder or CBAM while keeping encoder count and channel handling fixed, so the attribution to the proposed components remains unverified.
  2. [Experimental setup] No information is given on whether the baseline models were retrained with the same optimizer, augmentation, batch size, and schedule as MeCSAFNet; without this, the observed deltas cannot be isolated from protocol differences.
  3. [Results tables] The mIoU figures are reported as single-point values with no error bars, standard deviations across runs, or statistical significance tests, weakening confidence in the magnitude of the claimed gains.
minor comments (2)
  1. [Method] The description of the ASAU activation function is introduced without a precise mathematical definition or comparison to standard alternatives such as ReLU or GELU.
  2. [Figure 1] Figure captions for the network diagram could more explicitly label the visible vs. non-visible encoder branches and the multi-scale fusion paths.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below with clarifications and indicate where revisions will be made to improve the experimental rigor.

read point-by-point responses
  1. Referee: Experimental results section: the headline mIoU claims (e.g., MeCSAFNet-base (6c) surpassing U-Net (4c) by +19.21% on FBP and MeCSAFNet-large (4c) surpassing DeepLabV3+ (4c) by +6.48% on Potsdam) are presented without ablation tables that remove only the fusion decoder or CBAM while keeping encoder count and channel handling fixed, so the attribution to the proposed components remains unverified.

    Authors: We acknowledge that targeted ablations isolating the fusion decoder and CBAM (while fixing encoder count and channel handling) would strengthen attribution of the gains. Our current experiments focus on end-to-end comparisons, but we will add these specific ablation studies in the revised manuscript. revision: yes

  2. Referee: Experimental setup: no information is given on whether the baseline models were retrained under identical optimizer, augmentation, batch size, and schedule choices as MeCSAFNet; without this, the observed deltas cannot be isolated from protocol differences.

    Authors: All baselines (U-Net, SegFormer, DeepLabV3+) were retrained from scratch using identical settings: AdamW optimizer, the same augmentation pipeline, batch size of 8, and the identical learning rate schedule as MeCSAFNet. We will expand the experimental setup section to state this explicitly. revision: yes

  3. Referee: Results tables: the mIoU figures are reported as single-point values with no error bars, standard deviations across runs, or statistical significance tests, weakening confidence in the magnitude of the claimed gains.

    Authors: We agree that reporting variability would increase confidence. Due to high computational cost on these large datasets, results are from single runs. In the revision we will add a limitations paragraph noting this and highlighting the consistency of gains across model sizes and datasets as supporting evidence. revision: partial
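For context, the reporting format the referee asks for is cheap once multiple runs exist. The per-seed scores below are hypothetical, purely to show the mean ± sample standard deviation form:

```python
import statistics

# Hypothetical mIoU from four training seeds of one model configuration.
miou_runs = [0.712, 0.718, 0.709, 0.721]
mean = statistics.mean(miou_runs)
std = statistics.stdev(miou_runs)  # sample standard deviation (n - 1)
print(f"mIoU = {mean:.3f} ± {std:.3f}")  # → mIoU = 0.715 ± 0.005
```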

Circularity Check

0 steps flagged

No circularity: empirical architecture proposal with external dataset benchmarks

full rationale

The manuscript proposes MeCSAFNet as a dual-ConvNeXt encoder architecture with CBAM attentional fusion and ASAU activation, then reports mIoU numbers on the public FBP and Potsdam datasets against published baselines (U-Net, SegFormer, DeepLabV3+). No equations, uniqueness theorems, or parameter-fitting steps appear in the provided text. All performance claims are direct empirical comparisons; the architecture choices are presented as design decisions rather than derived quantities that reduce to their own inputs by construction. No self-citations are invoked to close any logical loop. This is the standard non-circular pattern for an applied CV architecture paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the empirical performance of a neural network whose design choices (dual encoders, attentional fusion, specific activation) are treated as given; no explicit free parameters, mathematical axioms, or invented physical entities are stated in the abstract.

pith-pipeline@v0.9.0 · 5626 in / 1395 out tokens · 38313 ms · 2026-05-16T06:08:08.015645+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

97 extracted references · 97 canonical work pages

  1. [1]

    Enhancing food crop classification in agriculture through dipper throat optimization and deep learning with remote sensing

    Antony, A., R, G.K., 2024. Enhancing food crop classification in agriculture through dipper throat optimization and deep learning with remote sensing. e-Prime - Advances in Electrical Engineering, Electronics and Energy 9, 100732. doi:10.1016/j.prime.2024.100732

  2. [2]

    Convnext based semi-supervised approach with consistency regularization for weedsclassification

    Benchallal, F., Hafiane, A., Ragot, N., Canals, R., 2024. Convnext based semi-supervised approach with consistency regularization for weedsclassification. ExpertSystemswithApplications239,122222. doi:https://doi.org/10.1016/j.eswa.2023.122222

  3. [3]

    Biswas, K., Jha, D., Tomar, N.K., Karri, M., Reza, A., Durak, G., Medetalibeyoglu,A.,Antalek,M.,Velichko,Y.,Ladner,D.,Borhani, A., Bagci, U., 2024. Adaptive smooth activation function for im- proved organ segmentation and disease diagnosis, in: Medical Image Ramos and Sappa: Preprint submitted for review Page 18 of 21 Multi-encoder ConvNeXt Network with S...

  4. [4]

    Dual streamfusionnetworkformulti-spectralhighresolutionremotesens- ing image segmentation, in: Pattern Recognition and Computer Vi- sion, Springer International Publishing, Cham

    Cao, Y., Shi, Y., Liu, Y., Huo, C., Xiang, S., Pan, C., 2021. Dual streamfusionnetworkformulti-spectralhighresolutionremotesens- ing image segmentation, in: Pattern Recognition and Computer Vi- sion, Springer International Publishing, Cham. pp. 537–547. doi:10. 1007/978-3-030-88007-1_44

  5. [5]

    Coarse-to-finesemantic segmentation of satellite images

    Chen,H.,Yang,W.,Liu,L.,Xia,G.S.,2024. Coarse-to-finesemantic segmentation of satellite images. ISPRS Journal of Photogrammetry and Remote Sensing 217, 1–17. doi:10.1016/j.isprsjprs.2024.07. 028

  6. [6]

    Strengthen the feature distinguishability of geo-object details in the semantic segmentation of high-resolution remote sensing images

    Chen, J., Wang, H., Guo, Y., Sun, G., Zhang, Y., Deng, M., 2021. Strengthen the feature distinguishability of geo-object details in the semantic segmentation of high-resolution remote sensing images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14, 2327–2340. doi:10.1109/JSTARS.2021.3053067

  7. [7]

    A simple framework for contrastive learning of visual representations, in: Pro- ceedings of the 37th International Conference on Machine Learning, JMLR.org

    Chen, T., Kornblith, S., Norouzi, M., Hinton, G., 2020. A simple framework for contrastive learning of visual representations, in: Pro- ceedings of the 37th International Conference on Machine Learning, JMLR.org. doi:10.5555/3524938.3525087

  8. [8]

    Xception: DeepLearning with Depthwise Separa- bleConvolutions,in:2017IEEEConferenceonComputerVisionand PatternRecognition(CVPR),IEEEComputerSociety,LosAlamitos, CA,USA.pp.1800–1807

    Chollet, F., 2017. Xception: DeepLearning with Depthwise Separa- bleConvolutions,in:2017IEEEConferenceonComputerVisionand PatternRecognition(CVPR),IEEEComputerSociety,LosAlamitos, CA,USA.pp.1800–1807. URL:10.1109/CVPR.2017.195,doi: 10.1109/ CVPR.2017.195

  9. [9]

    Dong,R.,Mou,L.,Chen,M.,Li,W.,Tong,X.Y.,Yuan,S.,Zhang,L., Zheng, J., Zhu, X.X., Fu, H., 2023. Large-scale land cover mapping with fine-grained classes via class-aware semi-supervised semantic segmentation,in:2023IEEE/CVFInternationalConferenceonCom- puterVision(ICCV),pp.16737–16747. doi: 10.1109/ICCV51070.2023. 01539

  10. [10]

    Frequency-aware robust multidimensional information fusion framework for remote sensing image segmentation

    Fan, J., Li, J., Liu, Y., Zhang, F., 2024. Frequency-aware robust multidimensional information fusion framework for remote sensing image segmentation. Engineering Applications of Artificial Intelli- gence 129, 107638. doi:10.1016/j.engappai.2023.107638

  11. [11]

    Fu, G., Lin, K., Mu, S., Lu, C., Wang, X., Wang, T., 2026. Offline classification training-online regression prediction mode for spindle thermal error prediction based on convnext-resnet parallel hybrid network with vision-based thermal image measurement. Measure- ment257,119025. doi: https://doi.org/10.1016/j.measurement.2025. 119025

  12. [12]

    Gao, L., Liu, H., Yang, M., Chen, L., Wan, Y., Xiao, Z., Qian, Y.,

  13. [13]

    IEEE JournalofSelectedTopicsinAppliedEarthObservationsandRemote Sensing 14, 10990–11003

    Stransfuse: Fusing swin transformer and convolutional neural network for remote sensing image semantic segmentation. IEEE JournalofSelectedTopicsinAppliedEarthObservationsandRemote Sensing 14, 10990–11003. doi:10.1109/JSTARS.2021.3119654

  14. [14]

    Hyperspectral and multispectral classification for coastal wetland using depthwise feature interaction network

    Gao,Y.,Li,W.,Zhang,M.,Wang,J.,Sun,W.,Tao,R.,Du,Q.,2022. Hyperspectral and multispectral classification for coastal wetland using depthwise feature interaction network. IEEE Transactions on Geoscience and Remote Sensing 60, 1–15. doi:10.1109/TGRS.2021. 3097093

  15. [15]

    U-net convolutional networks for mining land cover classification based on high-resolution uav imagery

    Giang, T.L., Dang, K.B., Toan Le, Q., Nguyen, V.G., Tong, S.S., Pham, V.M., 2020. U-net convolutional networks for mining land cover classification based on high-resolution uav imagery. IEEE Access 8, 186257–186273. doi:10.1109/ACCESS.2020.3030112

  16. [16]

    Staal, M

    He, A., Li, T., Li, N., Wang, K., Fu, H., 2021. Cabnet: Category attention block for imbalanced diabetic retinopathy grading. IEEE Transactions on Medical Imaging 40, 143–153. doi:10.1109/TMI. 2020.3023463

  17. [17]

    Csit:Amultiscale vision transformer for hyperspectral image classification

    He,W.,Huang,W.,Liao,S.,Xu,Z.,Yan,J.,2022. Csit:Amultiscale vision transformer for hyperspectral image classification. IEEE JournalofSelectedTopicsinAppliedEarthObservationsandRemote Sensing 15, 9266–9277. doi:10.1109/JSTARS.2022.3216335

  18. [18]

    Spectralgpt: Spectral remote sensing foundation model

    Hong,D.,Zhang,B.,Li,X.,Li,Y.,Li,C.,Yao,J.,Yokoya,N.,Li,H., Ghamisi,P.,Jia,X.,Plaza,A.,Gamba,P.,Benediktsson,J.A.,Chanus- sot, J., 2024. Spectralgpt: Spectral remote sensing foundation model. IEEE Transactions on Pattern Analysis and Machine Intelligence 46, 5227–5244. doi:10.1109/TPAMI.2024.3362475

  19. [19]

    A commentary review on the use of normalized difference vegetation index(ndvi)intheeraofpopularremotesensing

    Huang, S., Tang, L., Hupy, J.P., Wang, Y., Shao, G., 2020. A commentary review on the use of normalized difference vegetation index(ndvi)intheeraofpopularremotesensing. JournalofForestry Research 32, 1–6. doi:https://doi.org/10.1007/s11676-020-01155-1

  20. [20]

    Multiscale semantic segmentation of remote sensing images based on edge optimization

    Huang, W., Deng, F., Liu, H., Ding, M., Yao, Q., 2025. Multiscale semantic segmentation of remote sensing images based on edge optimization. IEEETransactionsonGeoscienceandRemoteSensing 63, 1–13. doi:10.1109/TGRS.2025.3553524

  21. [21]

    Demystifyingnormalized differencevegetationindex(ndvi)forgreennessexposureassessments and policy interventions in urban greening

    delaIglesiaMartinez,A.,Labib,S.,2023. Demystifyingnormalized differencevegetationindex(ndvi)forgreennessexposureassessments and policy interventions in urban greening. Environmental Research 220, 115155. doi:https://doi.org/10.1016/j.envres.2022.115155

  22. [22]

    Application of uav-based photogrammetry and normalised water index (ndwi) to estimate the rock mass rating (rmr): A case study

    Ismail, A., A Rashid, A.S., Sa’ari, R., Rasib, A.W., Mustaffar, M., Abdullah, R.A., Kassim, A., Mohd Yusof, N., Abd Rahaman, N., Mohd Apandi, N., Kalatehjari, R., 2022. Application of uav-based photogrammetry and normalised water index (ndwi) to estimate the rock mass rating (rmr): A case study. Physics and Chemistry of the Earth,PartsA/B/C127,103161. doi...

  23. [23]

    Acomprehensivereviewofremote sensing platforms, sensors, and applications in nut crops

    Jafarbiglu,H.,Pourreza,A.,2022. Acomprehensivereviewofremote sensing platforms, sensors, and applications in nut crops. Computers and Electronics in Agriculture 197, 106844. doi:10.1016/j.compag. 2022.106844

  24. [24]

    Spatio-temporal analysis of land use land cover change and its impact on land surface temperature of sialkot city, pakistan

    Javaid, K., Ghafoor, G.Z., Sharif, F., Shahid, M.G., Shahzad, L., Ghafoor, N., Hayyat, M.U., Farhan, M., 2023. Spatio-temporal analysis of land use land cover change and its impact on land surface temperature of sialkot city, pakistan. Scientific Reports 13. doi:10. 1038/s41598-023-49608-x

  25. [25]

    IET image processing doi:10.1049/ipr2.13101

    Jiang,J.,Feng,X.,Huang,H.,2024.Semanticsegmentationofremote sensing images based on dual-channel attention mechanism. IET image processing doi:10.1049/ipr2.13101

  26. [26]

    Karmakar, P., Teng, S.W., Murshed, M., Pang, S., Li, Y., Lin, H.,

  27. [27]

    Remote Sensing Applications: Society and Environment 33, 101093

    Crop monitoring by multimodal remote sensing: A review. Remote Sensing Applications: Society and Environment 33, 101093. doi:10.1016/j.rsase.2023.101093

  28. [28]

    Review on convolutional neural networks (cnn) in vegetation remote sensing

    Kattenborn, T., Leitloff, J., Schiefer, F., Hinz, S., 2021. Review on convolutional neural networks (cnn) in vegetation remote sensing. ISPRS Journal of Photogrammetry and Remote Sensing 173, 24–49. doi:10.1016/j.isprsjprs.2020.12.010

  29. [29]

    Afusedmulti-subfrequencybandsandcbamssvep- bci classification method based on convolutional neural network

    Lei, D., Dong, C., Guo, H., Ma, P., Liu, H., Bao, N., Kang, H., Chen, X.,Wu,Y.,2024. Afusedmulti-subfrequencybandsandcbamssvep- bci classification method based on convolutional neural network. Scientific Reports 14, 8616. doi:10.1038/s41598-024-59348-1

  30. [30]

    Dual-path feature fusion network for semantic segmentation of remote sensing images

    Li, B., Zhang, Y., Zhang, Y., Li, B., Li, Z., 2024a. Dual-path feature fusion network for semantic segmentation of remote sensing images. IEEE Geoscience and Remote Sensing Letters 21, 1–5. doi:10.1109/ LGRS.2024.3402690

  31. [31]

    Csnet:Aremote sensing image semantic segmentation network based on coordinate attention and skip connections

    Li,J.,Zhang,H.,Chen,L.,He,B.,Chen,H.,2025a. Csnet:Aremote sensing image semantic segmentation network based on coordinate attention and skip connections. Remote Sensing 17. doi:10.3390/ rs17122048

  32. [32]

    Mscr-hrnetv2: High- resolution remote sensing image segmentation for railway scenes, in: 2023 China Automation Congress (CAC), pp

    Li, L., Yang, Q., Shi, R., Teng, J., 2023a. Mscr-hrnetv2: High- resolution remote sensing image segmentation for railway scenes, in: 2023 China Automation Congress (CAC), pp. 5809–5814. doi:10. 1109/CAC59555.2023.10451482

  33. [33]

    Li, M.J., Zhu, M.C., Ma, Z., Li, P.S., Zhang, X.B., Hou, A.K., Shi, J.B., He, Y., Chen, K., Weng, T., He, Z.Y., Zheng, Z.Z., Jiang, L.,

  34. [34]

    Classification of surface natural resources based on u-net and gf-1 satellite images, in: 2020 17th International Computer Confer- ence on Wavelet Active Media Technology and Information Process- ing (ICCWAMTIP), pp. 179–182. doi:10.1109/ICCWAMTIP51612.2020. 9317315

  35. [35]

    Cmpf-unet: a convnext multi-scale pyramid fusion u-shaped network for multi-category segmentation of remote sensing images

    Li, N., Yu, X., Yu, M., 2024b. Cmpf-unet: a convnext multi-scale pyramid fusion u-shaped network for multi-category segmentation of remote sensing images. Geocarto International 39, 2311217. doi:10.1080/10106049.2024.2311217

  36. [36]

    Li, P., Lin, Y., Schultz-Fellenz, E., 2024c. Contextual hourglass networkforsemanticsegmentationofhighresolutionaerialimagery, Ramos and Sappa: Preprint submitted for review Page 19 of 21 Multi-encoder ConvNeXt Network with Smooth Attentional Feature Fusion for Multispectral Semantic Segmentation in: 2024 5th International Conference on Electronic Communi...

  37. [37]

    Multistage attentionresu-netforsemanticsegmentationoffine-resolutionremote sensing images

    Li, R., Zheng, S., Duan, C., Su, J., Zhang, C., 2022. Multistage attentionresu-netforsemanticsegmentationoffine-resolutionremote sensing images. IEEE Geoscience and Remote Sensing Letters 19, 1–5. doi:10.1109/LGRS.2021.3063381

  38. [38]

    Adaptingcross-sensorhigh-resolution remote sensing imagery for land use classification

    Li,W.,Sun,K.,Wei,J.,2025b. Adaptingcross-sensorhigh-resolution remote sensing imagery for land use classification. Remote Sensing

  39. [39]

    doi:10.3390/rs17050927

  40. [40]

    Soil carbon content prediction using multi-source data feature fusion of deep learning based on spectral and hyperspectral images

    Li, X., Li, Z., Qiu, H., Chen, G., Fan, P., 2023b. Soil carbon content prediction using multi-source data feature fusion of deep learning based on spectral and hyperspectral images. Chemosphere 336, 139161. doi:10.1016/j.chemosphere.2023.139161

  41. [41]

    Dspcanet: Dual- channelscale-awaresegmentationnetworkwithpositionandchannel attentionsforhigh-resolutionaerialimages

    Li, Y.C., Li, H.C., Hu, W.S., Yu, H.L., 2021. Dspcanet: Dual- channelscale-awaresegmentationnetworkwithpositionandchannel attentionsforhigh-resolutionaerialimages. IEEEJournalofSelected TopicsinAppliedEarthObservationsandRemoteSensing14,8552–

  42. [42]

    doi:10.1109/JSTARS.2021.3102137

  43. [43]

    Liang, Z., Wang, X., 2022. Semantic segmentation network with band-locationadaptiveselectionmechanismformultispectralremote sensing images, in: IGARSS 2022 - 2022 IEEE International Geo- science and Remote Sensing Symposium, pp. 3488–3491. doi:10. 1109/IGARSS46834.2022.9884212

  44. [44]

    Hslabeling: Toward efficient labeling for large-scale remote sensing image segmentation with hybrid sparse labeling

    Lin,J.,Yang,Z.,Liu,Q.,Yan,Y.,Ghamisi,P.,Xie,W.,Fang,L.,2025. Hslabeling: Toward efficient labeling for large-scale remote sensing image segmentation with hybrid sparse labeling. IEEE Transactions on Image Processing 34, 1864–1878. doi:10.1109/TIP.2025.3550039

  45. [45]

    Lin,T.Y.,Dollár,P.,Girshick,R.,He,K.,Hariharan,B.,Belongie,S.,

  46. [46]

    Feature pyramid networks for object detection, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936–944. doi:10.1109/CVPR.2017.106

  47. [47]

    Enhanced swin transformer and edge spatial attention for remote sensing image semantic segmentation

    Liu, F., Hu, Z., Li, L., Li, H., Liu, X., 2025a. Enhanced swin transformer and edge spatial attention for remote sensing image semantic segmentation. IEEE Signal Processing Letters 32, 1296–

  48. [48]

    doi:10.1109/LSP.2025.3550858

  49. [49]

    Comparison of multi-source satellite images for classifying marsh vegetation using deeplabv3 plus deep learning algorithm

    Liu,M.,Fu,B.,Xie,S.,He,H.,Lan,F.,Li,Y.,Lou,P.,Fan,D.,2021. Comparison of multi-source satellite images for classifying marsh vegetation using deeplabv3 plus deep learning algorithm. Ecological Indicators 125, 107562. doi:10.1016/j.ecolind.2021.107562

  50. [50]

    Liu,W.,Duan,P.,Xie,Z.,Kang,X.,Li,S.,2024. Learnfromsegment anythingmodel:Localregionhomogenizingforcross-domainremote sensing image segmentation, in: IGARSS 2024 - 2024 IEEE Interna- tional Geoscience and Remote Sensing Symposium, pp. 8351–8354. doi:10.1109/IGARSS53475.2024.10642007

  51. [51]

    Liu, Y., Cao, W., Xu, H., Xie, Y., Miao, C., 2025b. Scsnet: Semantic segmentation of carbon source and sink in remote sensing images based on multi-scale transformer and local feature fusion, in: 2025 InternationalJointConferenceonNeuralNetworks(IJCNN),pp.1–8. doi:10.1109/IJCNN64981.2025.11229288

  52. [52]

    Liu, Y., Shi, S., Wang, J., Zhong, Y., 2023. Seeing beyond the patch: Scale-adaptive semantic segmentation of high-resolution re- mote sensing imagery based on reinforcement learning, in: Proceed- ings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 16868–16878

  53. [53]

    Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., Xie, S., 2022. A convnet for the 2020s, in: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11966–11976. doi:10.1109/CVPR52688.2022.01167

  55. [55]

    Ma, T., Ma, J., Yu, K., Zhang, J., Fu, W., 2021. Multispectral remote sensing image matching via image transfer by regularized conditional generative adversarial networks and local feature. IEEE Geoscience and Remote Sensing Letters 18, 351–355. doi:10.1109/LGRS.2020.2972361

  56. [56]

    Ma, X., Xu, X., Zhang, X., Pun, M.O., 2024a. Adjacent-scale multimodal fusion networks for semantic segmentation of remote sensing data. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 17, 20116–20128. doi:10.1109/JSTARS.2024.3486906

  57. [57]

    Ma, Y., Wang, Y., Liu, X., Wang, H., 2024b. Swint-resnet: An improved remote sensing image segmentation model based on transformer. IEEE Geoscience and Remote Sensing Letters 21, 1–5. doi:10.1109/LGRS.2024.3433034

  58. [58]

    Madasa, A., Orimoloye, I.R., Ololade, O.O., 2021. Application of geospatial indices for mapping land cover/use change detection in a mining area. Journal of African Earth Sciences 175, 104108. doi:10.1016/j.jafrearsci.2021.104108

  59. [61]

    Nong, Z., Su, X., Liu, Y., Zhan, Z., Yuan, Q., 2021. Boundary-aware dual-stream network for vhr remote sensing images semantic segmentation. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14, 5260–5268. doi:10.1109/JSTARS.2021.3076035

  60. [62]

    Qiao, L., Gao, D., Zhao, R., Tang, W., An, L., Li, M., Sun, H., 2022. Improving estimation of lai dynamic by fusion of morphological and vegetation indices based on uav imagery. Computers and Electronics in Agriculture 192, 106603. doi:10.1016/j.compag.2021.106603

  61. [65]

    Ramos, L.T., Sappa, A.D., 2024. Multispectral semantic segmentation for land cover classification: An overview. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 17, 14295–14336. doi:10.1109/JSTARS.2024.3438620

  62. [66]

    Ramos, L.T., Sappa, A.D., 2025a. Dual-branch convnext-based network with attentional fusion decoding for land cover classification using multispectral imagery, in: SoutheastCon 2025, pp. 187–194. doi:10.1109/SoutheastCon56624.2025.10971457

  63. [67]

    Ramos, L.T., Sappa, A.D., 2025b. Leveraging u-net and selective feature extraction for land cover classification using remote sensing imagery. Scientific Reports 15, 784. doi:10.1038/s41598-024-84795-1

  64. [68]

    Rendana, M., Razi Idris, W.M., Alia, F., Effendi Rahim, S., Yamin, M., Izzudin, M., 2024. Relationship between drought and soil erosion based on the normalized differential water index (ndwi) and revised universal soil loss equation (rusle) model. Regional Sustainability 5, 100183. doi:10.1016/j.regsus.2024.100183

  65. [69]

    Ruiz, N., Bargal, S., Xie, C., Saenko, K., Sclaroff, S., 2022. Finding differences between transformers and convnets using counterfactual simulation testing, in: Advances in Neural Information Processing Systems, pp. 14403–14418

  66. [70]

    Sajol, M.S.I., Alvi, S.T., Era, C.A.A., 2024. Performance assessment of advanced cnn and transformer architectures in skin cancer detection, in: 2024 11th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI), pp. 1–8. doi:10.1109/EECSI63442.2024.10776508

  67. [71]

    Saxena, N., N., K.B., Raman, B., 2020. Semantic segmentation of multispectral images using res-seg-net model, in: 2020 IEEE 14th International Conference on Semantic Computing (ICSC), pp. 154–. doi:10.1109/ICSC.2020.00030

  69. [73]

    Shi, C., Han, P., Zhao, M., Fang, L., Miao, Q., Pun, C.M., 2025. Adaptive multitype contrastive views generation for remote sensing image semantic segmentation. IEEE Transactions on Geoscience and Remote Sensing.

  70. [74]

    Shi, W., Caballero, J., Huszár, F., Totz, J., Aitken, A.P., Bishop, R., Rueckert, D., Wang, Z., 2016. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1874–1883. doi:10.1109/CVPR.2016.207

  71. [75]

    Shirmard, H., Farahbakhsh, E., Müller, R.D., Chandra, R., 2022. A review of machine learning in processing remote sensing data for mineral exploration. Remote Sensing of Environment 268, 112750. doi:10.1016/j.rse.2021.112750

  72. [76]

    Tao, C., Qi, J., Guo, M., Zhu, Q., Li, H., 2023. Self-supervised remote sensing feature learning: Learning paradigms, challenges, and future works. IEEE Transactions on Geoscience and Remote Sensing 61, 1–26. doi:10.1109/TGRS.2023.3276853

  73. [77]

    Tong, X.Y., Xia, G.S., Lu, Q., Shen, H., Li, S., You, S., Zhang, L., 2020. Land-cover classification with high-resolution remote sensing images using transferable deep models. Remote Sensing of Environment 237, 111322. doi:10.1016/j.rse.2019.111322

  74. [78]

    Tong, X.Y., Xia, G.S., Zhu, X.X., 2023. Enabling country-scale land cover mapping with meter-resolution satellite imagery. ISPRS Journal of Photogrammetry and Remote Sensing 196, 178–196. doi:10.1016/j.isprsjprs.2022.12.011

  75. [79]

    Vasanthi, A., Joshitha, K.L., 2024. Water body detection utilizing ndwi, ndvi and nmdwi indices in sen-12 spectral imagery, in: 2024 First International Conference on Electronics, Communication and Signal Processing (ICECSP), pp. 1–5. doi:10.1109/ICECSP61809.2024.10698263

  76. [80]

    Vivone, G., 2023. Multispectral and hyperspectral image fusion in remote sensing: A survey. Information Fusion 89, 405–417. doi:10.1016/j.inffus.2022.08.032

  77. [81]

    Vu, T.H., Jain, H., Bucher, M., Cord, M., Pérez, P., 2019. Advent: Adversarial entropy minimization for domain adaptation in semantic segmentation, in: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2512–2521. doi:10.1109/CVPR.2019.00262

  78. [82]

    Wang, F., Yi, Q., Hu, J., Xie, L., Yao, X., Xu, T., Zheng, J., 2021. Combining spectral and textural information in uav hyperspectral images to estimate rice grain yield. International Journal of Applied Earth Observation and Geoinformation 102, 102397. doi:10.1016/j.jag.2021.102397

  79. [83]

    Wang, Q., Moreno-Martínez, Á., Muñoz-Marí, J., Campos-Taberner, M., Camps-Valls, G., 2023. Estimation of vegetation traits with kernel ndvi. ISPRS Journal of Photogrammetry and Remote Sensing 195, 408–417. doi:10.1016/j.isprsjprs.2022.12.019

  80. [84]

    Woo, S., Park, J., Lee, J.Y., Kweon, I.S., 2018. Cbam: Convolutional block attention module, in: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (Eds.), Computer Vision – ECCV 2018, Springer International Publishing, Cham. pp. 3–19

Showing first 80 references.