Pith · machine review for the scientific record

arxiv: 2602.10137 · v1 · submitted 2026-02-08 · 💻 cs.CV · cs.AI

Recognition: no theorem link

Multi-encoder ConvNeXt Network with Smooth Attentional Feature Fusion for Multispectral Semantic Segmentation

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 06:08 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords: multispectral semantic segmentation · ConvNeXt · feature fusion · attention mechanism · land cover classification · remote sensing · encoder-decoder network

The pith

MeCSAFNet uses separate ConvNeXt encoders and attentional fusion to improve multispectral land cover segmentation accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MeCSAFNet, a multi-encoder network for semantic segmentation of multispectral images. It processes visible and non-visible channels through dual ConvNeXt encoders, reconstructs features with separate decoders, and fuses them in a dedicated decoder using CBAM attention to combine spatial and spectral information. The model supports 4-channel inputs such as RGB plus NIR and 6-channel inputs that add NDVI and NDWI indices. Experiments on the Five-Billion-Pixels and Potsdam datasets show mIoU gains of 14 to 19 percent over U-Net and SegFormer on FBP and 4 to 9 percent over DeepLabV3+ and SegFormer on Potsdam. A sympathetic reader would care because more accurate segmentation from multispectral data can support better environmental monitoring and land management decisions.
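The split-encode-fuse pipeline described above can be illustrated with a minimal numpy sketch. The `encode` and `fuse` functions below are stand-ins, not the authors' implementation: the ConvNeXt encoder is reduced to average pooling plus channel widening, and the attentional fusion decoder to a single scalar gate.

```python
import numpy as np

def encode(x):
    # Stand-in for a ConvNeXt encoder stage: 2x2 average pooling
    # (downsampling) followed by channel doubling (widening).
    h, w, c = x.shape
    pooled = x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))
    return np.concatenate([pooled, pooled], axis=-1)

def fuse(a, b):
    # Stand-in for the attentional fusion decoder: a scalar sigmoid gate
    # blends the two branches (a learned, spatially varying gate in the
    # real model).
    gate = 1.0 / (1.0 + np.exp(-(a.mean() - b.mean())))
    return gate * a + (1.0 - gate) * b

# 6-channel input: RGB (visible) plus NIR/NDVI/NDWI (non-visible)
x = np.random.rand(8, 8, 6)
visible, nonvisible = x[..., :3], x[..., 3:]

f_vis = encode(visible)      # visible-branch features
f_nir = encode(nonvisible)   # non-visible-branch features
fused = fuse(f_vis, f_nir)   # joint representation for the decoder head
print(fused.shape)  # (4, 4, 6)
```

The point of the sketch is the data flow: the two spectral groups never mix until after per-branch encoding, which is the design choice the paper's claims rest on.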

Core claim

The paper claims that its MeCSAFNet architecture, which applies dual ConvNeXt encoders to process spectral channels independently before integrating features through a multi-scale fusion decoder with CBAM attention and ASAU activation, produces higher mIoU scores than standard encoder-decoder models when segmenting multispectral land cover imagery on the FBP and Potsdam benchmarks.

What carries the argument

Dual ConvNeXt encoders that handle visible and non-visible channels separately, followed by a fusion decoder that performs multi-scale attentional feature combination with CBAM to merge fine spatial cues and high-level spectral representations.
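CBAM applies channel attention (global average and max pooling through a shared MLP) followed by spatial attention (channel-pooled maps through a convolution). The numpy sketch below simplifies both steps, which is our simplification and not the paper's: the shared MLP becomes one shared linear map `w`, and the 7×7 convolution becomes an additive gate.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(f, w):
    # f: (H, W, C); w: shared (C, C) map standing in for CBAM's MLP.
    avg = f.mean(axis=(0, 1))          # global average pool -> (C,)
    mx = f.max(axis=(0, 1))            # global max pool -> (C,)
    scale = sigmoid(avg @ w + mx @ w)  # shared weights, summed, squashed
    return f * scale                   # reweight channels

def spatial_attention(f):
    # Pool across channels, then gate each spatial position.
    avg = f.mean(axis=-1, keepdims=True)  # (H, W, 1)
    mx = f.max(axis=-1, keepdims=True)    # (H, W, 1)
    gate = sigmoid(avg + mx)              # stand-in for CBAM's 7x7 conv
    return f * gate

rng = np.random.default_rng(0)
f = rng.standard_normal((8, 8, 16))
w = rng.standard_normal((16, 16)) * 0.1

out = spatial_attention(channel_attention(f, w))
print(out.shape)  # (8, 8, 16)
```

Because both gates lie in (0, 1), the module can only attenuate features, never amplify them; the fusion decoder relies on this to softly select between spatial and spectral cues.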

If this is right

  • MeCSAFNet-base with 6-channel input raises mIoU by 14.72 to 19.21 percent over U-Net and SegFormer on the FBP dataset.
  • MeCSAFNet-large with 4-channel input raises mIoU by 4.80 to 9.11 percent over DeepLabV3+ and SegFormer on the Potsdam dataset.
  • Compact variants of the model maintain strong accuracy while lowering training time and inference cost.
  • The same architecture works without modification for both 4-channel RGB+NIR inputs and 6-channel inputs that include NDVI and NDWI indices.
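The 6-channel configuration follows from the standard index definitions, NDVI = (NIR − Red)/(NIR + Red) and NDWI = (Green − NIR)/(Green + NIR). The helper below is a sketch of that stacking; the paper's exact preprocessing (normalization, clipping) may differ.

```python
import numpy as np

def build_6c_input(rgb, nir, eps=1e-6):
    # rgb: (H, W, 3) in [0, 1]; nir: (H, W). Returns an (H, W, 6) stack
    # of R, G, B, NIR, NDVI, NDWI using the standard index formulas.
    red, green = rgb[..., 0], rgb[..., 1]
    ndvi = (nir - red) / (nir + red + eps)      # vegetation index
    ndwi = (green - nir) / (green + nir + eps)  # McFeeters water index
    return np.concatenate(
        [rgb, nir[..., None], ndvi[..., None], ndwi[..., None]], axis=-1
    )

rgb = np.random.rand(4, 4, 3)
nir = np.random.rand(4, 4)
x6 = build_6c_input(rgb, nir)
print(x6.shape)  # (4, 4, 6)
```

Dropping the last two channels recovers the 4-channel RGB+NIR input, which is why the same architecture can serve both configurations with only a change to the first-layer channel count.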

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Independent encoding of spectral channels may prevent loss of band-specific information that occurs when all channels are mixed in a single early layer.
  • The attentional fusion step could prove useful for other remote sensing tasks that combine spatial detail with spectral signatures, such as change detection.
  • The design suggests a path for scaling to hyperspectral data where the number of distinct channels is much larger.

Load-bearing premise

The reported mIoU gains come from the dual-encoder and attentional fusion design rather than from dataset-specific tuning, hyperparameter choices, or unstated differences in training procedures.

What would settle it

Re-training all compared models including U-Net, SegFormer, and DeepLabV3+ under identical training protocols, data splits, and hyperparameters on the same FBP and Potsdam datasets, then checking whether the mIoU differences disappear.
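The metric at stake is unambiguous to compute: per-class IoU is intersection over union of the predicted and reference masks, and mIoU averages over classes. A minimal reference implementation (we assume the common convention of skipping classes absent from both maps, since the paper's averaging convention is not stated here):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    # pred, target: integer label maps of identical shape.
    # Per-class IoU = |pred ∩ target| / |pred ∪ target|.
    ious = []
    for c in range(num_classes):
        p, t = pred == c, target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps: skip, don't average 0s
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))

pred = np.array([[0, 0, 1], [1, 2, 2]])
target = np.array([[0, 1, 1], [1, 2, 2]])
print(round(mean_iou(pred, target, num_classes=3), 3))  # → 0.722
```

Holding this metric fixed while equalizing the training protocol is what would isolate the architectural contribution from everything else.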

Figures

Figures reproduced from arXiv: 2602.10137 by Angel D. Sappa, Leo Thomas Ramos.

Figure 1. Example images of the Five-Billion-Pixels dataset.
Figure 2. Example images of the Potsdam dataset.
Figure 3. Overview of the utilized architecture in this work.
Figure 4. Comparison between ResNet, Swin Transformer, and ConvNeXt blocks.
Figure 5. ConvNeXt architecture structure (base version). Stages are connected sequentially, where the output of each stage serves as the input to the subsequent stage through downsampling operations.
Figure 6. Structure of the employed decoder block.
Figure 7. Comparison of the original ASAU formulation and the modified ASAU version used in this work.
Figure 8. Structure of the employed fusion block.
Figure 9. CBAM mechanism and its components. ⊕ denotes element-wise summation, ⊗ denotes element-wise multiplication (attention gating), and AvgPool and MaxPool indicate average and max pooling operations, respectively.
Figure 10. Segmentation results of our approach compared to other baseline methods on the Five-Billion-Pixels dataset. Regions demarcated by dashed lines indicate the areas where the most substantial discrepancies between models are observed.
Figure 11. Segmentation results of our approach compared to other baseline methods on the Potsdam dataset. Regions demarcated by dashed lines indicate the areas where the most substantial discrepancies between models are observed.
read the original abstract

This work proposes MeCSAFNet, a multi-branch encoder-decoder architecture for land cover segmentation in multispectral imagery. The model separately processes visible and non-visible channels through dual ConvNeXt encoders, followed by individual decoders that reconstruct spatial information. A dedicated fusion decoder integrates intermediate features at multiple scales, combining fine spatial cues with high-level spectral representations. The feature fusion is further enhanced with CBAM attention, and the ASAU activation function contributes to stable and efficient optimization. The model is designed to process different spectral configurations, including a 4-channel (4c) input combining RGB and NIR bands, as well as a 6-channel (6c) input incorporating NDVI and NDWI indices. Experiments on the Five-Billion-Pixels (FBP) and Potsdam datasets demonstrate significant performance gains. On FBP, MeCSAFNet-base (6c) surpasses U-Net (4c) by +19.21%, U-Net (6c) by +14.72%, SegFormer (4c) by +19.62%, and SegFormer (6c) by +14.74% in mIoU. On Potsdam, MeCSAFNet-large (4c) improves over DeepLabV3+ (4c) by +6.48%, DeepLabV3+ (6c) by +5.85%, SegFormer (4c) by +9.11%, and SegFormer (6c) by +4.80% in mIoU. The model also achieves consistent gains over several recent state-of-the-art approaches. Moreover, compact variants of MeCSAFNet deliver notable performance with lower training time and reduced inference cost, supporting their deployment in resource-constrained environments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proposes MeCSAFNet, a multi-encoder ConvNeXt-based architecture for multispectral semantic segmentation. It processes visible and non-visible channels via separate encoders, reconstructs features with individual decoders, and fuses multi-scale representations in a dedicated decoder using CBAM attention and the ASAU activation. Experiments on the FBP and Potsdam datasets report large mIoU gains for both 4-channel and 6-channel inputs over U-Net, SegFormer, and DeepLabV3+ baselines, with additional claims of efficiency for compact variants.

Significance. If the reported mIoU improvements can be shown to arise specifically from the dual-encoder design and attentional fusion rather than training differences, the approach would offer a practical advance for multispectral land-cover segmentation, especially in settings where compact models with lower inference cost are needed.

major comments (3)
  1. [Experimental results] The headline mIoU claims (e.g., MeCSAFNet-base (6c) surpassing U-Net (4c) by +19.21% on FBP and MeCSAFNet-large (4c) surpassing DeepLabV3+ (4c) by +6.48% on Potsdam) are presented without ablation tables that remove only the fusion decoder or CBAM while keeping encoder count and channel handling fixed, so the attribution to the proposed components remains unverified.
  2. [Experimental setup] No information is given on whether the baseline models were retrained with the same optimizer, augmentation, batch size, and schedule as MeCSAFNet; without this, the observed deltas cannot be isolated from protocol differences.
  3. [Results tables] The mIoU figures are reported as single-point values with no error bars, standard deviations across runs, or statistical significance tests, weakening confidence in the magnitude of the claimed gains.
minor comments (2)
  1. [Method] The description of the ASAU activation function is introduced without a precise mathematical definition or comparison to standard alternatives such as ReLU or GELU.
  2. [Figure 1] Figure captions for the network diagram could more explicitly label the visible vs. non-visible encoder branches and the multi-scale fusion paths.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major point below with clarifications and indicate where revisions will be made to improve the experimental rigor.

read point-by-point responses
  1. Referee: Experimental results section: the headline mIoU claims (e.g., MeCSAFNet-base (6c) surpassing U-Net (4c) by +19.21% on FBP and MeCSAFNet-large (4c) surpassing DeepLabV3+ (4c) by +6.48% on Potsdam) are presented without ablation tables that remove only the fusion decoder or CBAM while keeping encoder count and channel handling fixed, so the attribution to the proposed components remains unverified.

    Authors: We acknowledge that targeted ablations isolating the fusion decoder and CBAM (while fixing encoder count and channel handling) would strengthen attribution of the gains. Our current experiments focus on end-to-end comparisons, but we will add these specific ablation studies in the revised manuscript. revision: yes

  2. Referee: Experimental setup: no information is given on whether the baseline models were retrained under identical optimizer, augmentation, batch size, and schedule choices as MeCSAFNet; without this, the observed deltas cannot be isolated from protocol differences.

    Authors: All baselines (U-Net, SegFormer, DeepLabV3+) were retrained from scratch using identical settings: AdamW optimizer, the same augmentation pipeline, batch size of 8, and the identical learning rate schedule as MeCSAFNet. We will expand the experimental setup section to state this explicitly. revision: yes

  3. Referee: Results tables: the mIoU figures are reported as single-point values with no error bars, standard deviations across runs, or statistical significance tests, weakening confidence in the magnitude of the claimed gains.

    Authors: We agree that reporting variability would increase confidence. Due to high computational cost on these large datasets, results are from single runs. In the revision we will add a limitations paragraph noting this and highlighting the consistency of gains across model sizes and datasets as supporting evidence. revision: partial
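For context, the reporting format the referee asks for is cheap once multiple runs exist. The per-seed scores below are hypothetical, purely to show the mean ± sample standard deviation form:

```python
import statistics

# Hypothetical mIoU from four training seeds of one model configuration.
miou_runs = [0.712, 0.718, 0.709, 0.721]
mean = statistics.mean(miou_runs)
std = statistics.stdev(miou_runs)  # sample standard deviation (n - 1)
print(f"mIoU = {mean:.3f} ± {std:.3f}")  # → mIoU = 0.715 ± 0.005
```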

Circularity Check

0 steps flagged

No circularity: empirical architecture proposal with external dataset benchmarks

full rationale

The manuscript proposes MeCSAFNet as a dual-ConvNeXt encoder architecture with CBAM attentional fusion and ASAU activation, then reports mIoU numbers on the public FBP and Potsdam datasets against published baselines (U-Net, SegFormer, DeepLabV3+). No equations, uniqueness theorems, or parameter-fitting steps appear in the provided text. All performance claims are direct empirical comparisons; the architecture choices are presented as design decisions rather than derived quantities that reduce to their own inputs by construction. No self-citations are invoked to close any logical loop. This is the standard non-circular pattern for an applied CV architecture paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the empirical performance of a neural network whose design choices (dual encoders, attentional fusion, specific activation) are treated as given; no explicit free parameters, mathematical axioms, or invented physical entities are stated in the abstract.

pith-pipeline@v0.9.0 · 5626 in / 1395 out tokens · 38313 ms · 2026-05-16T06:08:08.015645+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

97 extracted references · 97 canonical work pages

  1. [1]

    Enhancing food crop classification in agriculture through dipper throat optimization and deep learning with remote sensing

    Antony, A., R, G.K., 2024. Enhancing food crop classification in agriculture through dipper throat optimization and deep learning with remote sensing. e-Prime - Advances in Electrical Engineering, Electronics and Energy 9, 100732. doi:10.1016/j.prime.2024.100732

  2. [2]

    Convnext based semi-supervised approach with consistency regularization for weedsclassification

    Benchallal, F., Hafiane, A., Ragot, N., Canals, R., 2024. Convnext based semi-supervised approach with consistency regularization for weedsclassification. ExpertSystemswithApplications239,122222. doi:https://doi.org/10.1016/j.eswa.2023.122222

  3. [3]

    Biswas, K., Jha, D., Tomar, N.K., Karri, M., Reza, A., Durak, G., Medetalibeyoglu,A.,Antalek,M.,Velichko,Y.,Ladner,D.,Borhani, A., Bagci, U., 2024. Adaptive smooth activation function for im- proved organ segmentation and disease diagnosis, in: Medical Image Ramos and Sappa: Preprint submitted for review Page 18 of 21 Multi-encoder ConvNeXt Network with S...

  4. [4]

    Dual streamfusionnetworkformulti-spectralhighresolutionremotesens- ing image segmentation, in: Pattern Recognition and Computer Vi- sion, Springer International Publishing, Cham

    Cao, Y., Shi, Y., Liu, Y., Huo, C., Xiang, S., Pan, C., 2021. Dual streamfusionnetworkformulti-spectralhighresolutionremotesens- ing image segmentation, in: Pattern Recognition and Computer Vi- sion, Springer International Publishing, Cham. pp. 537–547. doi:10. 1007/978-3-030-88007-1_44

  5. [5]

    Coarse-to-finesemantic segmentation of satellite images

    Chen,H.,Yang,W.,Liu,L.,Xia,G.S.,2024. Coarse-to-finesemantic segmentation of satellite images. ISPRS Journal of Photogrammetry and Remote Sensing 217, 1–17. doi:10.1016/j.isprsjprs.2024.07. 028

  6. [6]

    Strengthen the feature distinguishability of geo-object details in the semantic segmentation of high-resolution remote sensing images

    Chen, J., Wang, H., Guo, Y., Sun, G., Zhang, Y., Deng, M., 2021. Strengthen the feature distinguishability of geo-object details in the semantic segmentation of high-resolution remote sensing images. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14, 2327–2340. doi:10.1109/JSTARS.2021.3053067

  7. [7]

    A simple framework for contrastive learning of visual representations, in: Pro- ceedings of the 37th International Conference on Machine Learning, JMLR.org

    Chen, T., Kornblith, S., Norouzi, M., Hinton, G., 2020. A simple framework for contrastive learning of visual representations, in: Pro- ceedings of the 37th International Conference on Machine Learning, JMLR.org. doi:10.5555/3524938.3525087

  8. [8]

    Xception: DeepLearning with Depthwise Separa- bleConvolutions,in:2017IEEEConferenceonComputerVisionand PatternRecognition(CVPR),IEEEComputerSociety,LosAlamitos, CA,USA.pp.1800–1807

    Chollet, F., 2017. Xception: DeepLearning with Depthwise Separa- bleConvolutions,in:2017IEEEConferenceonComputerVisionand PatternRecognition(CVPR),IEEEComputerSociety,LosAlamitos, CA,USA.pp.1800–1807. URL:10.1109/CVPR.2017.195,doi: 10.1109/ CVPR.2017.195

  9. [9]

    Dong,R.,Mou,L.,Chen,M.,Li,W.,Tong,X.Y.,Yuan,S.,Zhang,L., Zheng, J., Zhu, X.X., Fu, H., 2023. Large-scale land cover mapping with fine-grained classes via class-aware semi-supervised semantic segmentation,in:2023IEEE/CVFInternationalConferenceonCom- puterVision(ICCV),pp.16737–16747. doi: 10.1109/ICCV51070.2023. 01539

  10. [10]

    Frequency-aware robust multidimensional information fusion framework for remote sensing image segmentation

    Fan, J., Li, J., Liu, Y., Zhang, F., 2024. Frequency-aware robust multidimensional information fusion framework for remote sensing image segmentation. Engineering Applications of Artificial Intelli- gence 129, 107638. doi:10.1016/j.engappai.2023.107638

  11. [11]

    Fu, G., Lin, K., Mu, S., Lu, C., Wang, X., Wang, T., 2026. Offline classification training-online regression prediction mode for spindle thermal error prediction based on convnext-resnet parallel hybrid network with vision-based thermal image measurement. Measure- ment257,119025. doi: https://doi.org/10.1016/j.measurement.2025. 119025

  12. [12]

    Gao, L., Liu, H., Yang, M., Chen, L., Wan, Y., Xiao, Z., Qian, Y.,

  13. [13]

    IEEE JournalofSelectedTopicsinAppliedEarthObservationsandRemote Sensing 14, 10990–11003

    Stransfuse: Fusing swin transformer and convolutional neural network for remote sensing image semantic segmentation. IEEE JournalofSelectedTopicsinAppliedEarthObservationsandRemote Sensing 14, 10990–11003. doi:10.1109/JSTARS.2021.3119654

  14. [14]

    Hyperspectral and multispectral classification for coastal wetland using depthwise feature interaction network

    Gao,Y.,Li,W.,Zhang,M.,Wang,J.,Sun,W.,Tao,R.,Du,Q.,2022. Hyperspectral and multispectral classification for coastal wetland using depthwise feature interaction network. IEEE Transactions on Geoscience and Remote Sensing 60, 1–15. doi:10.1109/TGRS.2021. 3097093

  15. [15]

    U-net convolutional networks for mining land cover classification based on high-resolution uav imagery

    Giang, T.L., Dang, K.B., Toan Le, Q., Nguyen, V.G., Tong, S.S., Pham, V.M., 2020. U-net convolutional networks for mining land cover classification based on high-resolution uav imagery. IEEE Access 8, 186257–186273. doi:10.1109/ACCESS.2020.3030112

  16. [16]

    Staal, M

    He, A., Li, T., Li, N., Wang, K., Fu, H., 2021. Cabnet: Category attention block for imbalanced diabetic retinopathy grading. IEEE Transactions on Medical Imaging 40, 143–153. doi:10.1109/TMI. 2020.3023463

  17. [17]

    Csit:Amultiscale vision transformer for hyperspectral image classification

    He,W.,Huang,W.,Liao,S.,Xu,Z.,Yan,J.,2022. Csit:Amultiscale vision transformer for hyperspectral image classification. IEEE JournalofSelectedTopicsinAppliedEarthObservationsandRemote Sensing 15, 9266–9277. doi:10.1109/JSTARS.2022.3216335

  18. [18]

    Spectralgpt: Spectral remote sensing foundation model

    Hong,D.,Zhang,B.,Li,X.,Li,Y.,Li,C.,Yao,J.,Yokoya,N.,Li,H., Ghamisi,P.,Jia,X.,Plaza,A.,Gamba,P.,Benediktsson,J.A.,Chanus- sot, J., 2024. Spectralgpt: Spectral remote sensing foundation model. IEEE Transactions on Pattern Analysis and Machine Intelligence 46, 5227–5244. doi:10.1109/TPAMI.2024.3362475

  19. [19]

    A commentary review on the use of normalized difference vegetation index(ndvi)intheeraofpopularremotesensing

    Huang, S., Tang, L., Hupy, J.P., Wang, Y., Shao, G., 2020. A commentary review on the use of normalized difference vegetation index(ndvi)intheeraofpopularremotesensing. JournalofForestry Research 32, 1–6. doi:https://doi.org/10.1007/s11676-020-01155-1

  20. [20]

    Multiscale semantic segmentation of remote sensing images based on edge optimization

    Huang, W., Deng, F., Liu, H., Ding, M., Yao, Q., 2025. Multiscale semantic segmentation of remote sensing images based on edge optimization. IEEETransactionsonGeoscienceandRemoteSensing 63, 1–13. doi:10.1109/TGRS.2025.3553524

  21. [21]

    Demystifyingnormalized differencevegetationindex(ndvi)forgreennessexposureassessments and policy interventions in urban greening

    delaIglesiaMartinez,A.,Labib,S.,2023. Demystifyingnormalized differencevegetationindex(ndvi)forgreennessexposureassessments and policy interventions in urban greening. Environmental Research 220, 115155. doi:https://doi.org/10.1016/j.envres.2022.115155

  22. [22]

    Application of uav-based photogrammetry and normalised water index (ndwi) to estimate the rock mass rating (rmr): A case study

    Ismail, A., A Rashid, A.S., Sa’ari, R., Rasib, A.W., Mustaffar, M., Abdullah, R.A., Kassim, A., Mohd Yusof, N., Abd Rahaman, N., Mohd Apandi, N., Kalatehjari, R., 2022. Application of uav-based photogrammetry and normalised water index (ndwi) to estimate the rock mass rating (rmr): A case study. Physics and Chemistry of the Earth,PartsA/B/C127,103161. doi...

  23. [23]

    Acomprehensivereviewofremote sensing platforms, sensors, and applications in nut crops

    Jafarbiglu,H.,Pourreza,A.,2022. Acomprehensivereviewofremote sensing platforms, sensors, and applications in nut crops. Computers and Electronics in Agriculture 197, 106844. doi:10.1016/j.compag. 2022.106844

  24. [24]

    Spatio-temporal analysis of land use land cover change and its impact on land surface temperature of sialkot city, pakistan

    Javaid, K., Ghafoor, G.Z., Sharif, F., Shahid, M.G., Shahzad, L., Ghafoor, N., Hayyat, M.U., Farhan, M., 2023. Spatio-temporal analysis of land use land cover change and its impact on land surface temperature of sialkot city, pakistan. Scientific Reports 13. doi:10. 1038/s41598-023-49608-x

  25. [25]

    IET image processing doi:10.1049/ipr2.13101

    Jiang,J.,Feng,X.,Huang,H.,2024.Semanticsegmentationofremote sensing images based on dual-channel attention mechanism. IET image processing doi:10.1049/ipr2.13101

  26. [26]

    Karmakar, P., Teng, S.W., Murshed, M., Pang, S., Li, Y., Lin, H.,

  27. [27]

    Remote Sensing Applications: Society and Environment 33, 101093

    Crop monitoring by multimodal remote sensing: A review. Remote Sensing Applications: Society and Environment 33, 101093. doi:10.1016/j.rsase.2023.101093

  28. [28]

    Review on convolutional neural networks (cnn) in vegetation remote sensing

    Kattenborn, T., Leitloff, J., Schiefer, F., Hinz, S., 2021. Review on convolutional neural networks (cnn) in vegetation remote sensing. ISPRS Journal of Photogrammetry and Remote Sensing 173, 24–49. doi:10.1016/j.isprsjprs.2020.12.010

  29. [29]

    Afusedmulti-subfrequencybandsandcbamssvep- bci classification method based on convolutional neural network

    Lei, D., Dong, C., Guo, H., Ma, P., Liu, H., Bao, N., Kang, H., Chen, X.,Wu,Y.,2024. Afusedmulti-subfrequencybandsandcbamssvep- bci classification method based on convolutional neural network. Scientific Reports 14, 8616. doi:10.1038/s41598-024-59348-1

  30. [30]

    Dual-path feature fusion network for semantic segmentation of remote sensing images

    Li, B., Zhang, Y., Zhang, Y., Li, B., Li, Z., 2024a. Dual-path feature fusion network for semantic segmentation of remote sensing images. IEEE Geoscience and Remote Sensing Letters 21, 1–5. doi:10.1109/ LGRS.2024.3402690

  31. [31]

    Csnet:Aremote sensing image semantic segmentation network based on coordinate attention and skip connections

    Li,J.,Zhang,H.,Chen,L.,He,B.,Chen,H.,2025a. Csnet:Aremote sensing image semantic segmentation network based on coordinate attention and skip connections. Remote Sensing 17. doi:10.3390/ rs17122048

  32. [32]

    Mscr-hrnetv2: High- resolution remote sensing image segmentation for railway scenes, in: 2023 China Automation Congress (CAC), pp

    Li, L., Yang, Q., Shi, R., Teng, J., 2023a. Mscr-hrnetv2: High- resolution remote sensing image segmentation for railway scenes, in: 2023 China Automation Congress (CAC), pp. 5809–5814. doi:10. 1109/CAC59555.2023.10451482

  33. [33]

    Li, M.J., Zhu, M.C., Ma, Z., Li, P.S., Zhang, X.B., Hou, A.K., Shi, J.B., He, Y., Chen, K., Weng, T., He, Z.Y., Zheng, Z.Z., Jiang, L.,

  34. [34]

    Classification of surface natural resources based on u-net and gf-1 satellite images, in: 2020 17th International Computer Confer- ence on Wavelet Active Media Technology and Information Process- ing (ICCWAMTIP), pp. 179–182. doi:10.1109/ICCWAMTIP51612.2020. 9317315

  35. [35]

    Cmpf-unet: a convnext multi-scale pyramid fusion u-shaped network for multi-category segmentation of remote sensing images

    Li, N., Yu, X., Yu, M., 2024b. Cmpf-unet: a convnext multi-scale pyramid fusion u-shaped network for multi-category segmentation of remote sensing images. Geocarto International 39, 2311217. doi:10.1080/10106049.2024.2311217

  36. [36]

    Li, P., Lin, Y., Schultz-Fellenz, E., 2024c. Contextual hourglass networkforsemanticsegmentationofhighresolutionaerialimagery, Ramos and Sappa: Preprint submitted for review Page 19 of 21 Multi-encoder ConvNeXt Network with Smooth Attentional Feature Fusion for Multispectral Semantic Segmentation in: 2024 5th International Conference on Electronic Communi...

  37. [37]

    Multistage attentionresu-netforsemanticsegmentationoffine-resolutionremote sensing images

    Li, R., Zheng, S., Duan, C., Su, J., Zhang, C., 2022. Multistage attentionresu-netforsemanticsegmentationoffine-resolutionremote sensing images. IEEE Geoscience and Remote Sensing Letters 19, 1–5. doi:10.1109/LGRS.2021.3063381

  38. [38]

    Adaptingcross-sensorhigh-resolution remote sensing imagery for land use classification

    Li,W.,Sun,K.,Wei,J.,2025b. Adaptingcross-sensorhigh-resolution remote sensing imagery for land use classification. Remote Sensing

  39. [39]

    doi:10.3390/rs17050927

  40. [40]

    Soil carbon content prediction using multi-source data feature fusion of deep learning based on spectral and hyperspectral images

    Li, X., Li, Z., Qiu, H., Chen, G., Fan, P., 2023b. Soil carbon content prediction using multi-source data feature fusion of deep learning based on spectral and hyperspectral images. Chemosphere 336, 139161. doi:10.1016/j.chemosphere.2023.139161

  41. [41]

    Dspcanet: Dual- channelscale-awaresegmentationnetworkwithpositionandchannel attentionsforhigh-resolutionaerialimages

    Li, Y.C., Li, H.C., Hu, W.S., Yu, H.L., 2021. Dspcanet: Dual- channelscale-awaresegmentationnetworkwithpositionandchannel attentionsforhigh-resolutionaerialimages. IEEEJournalofSelected TopicsinAppliedEarthObservationsandRemoteSensing14,8552–

  42. [42]

    doi:10.1109/JSTARS.2021.3102137

  43. [43]

    Liang, Z., Wang, X., 2022. Semantic segmentation network with band-locationadaptiveselectionmechanismformultispectralremote sensing images, in: IGARSS 2022 - 2022 IEEE International Geo- science and Remote Sensing Symposium, pp. 3488–3491. doi:10. 1109/IGARSS46834.2022.9884212

  44. [44]

    Hslabeling: Toward efficient labeling for large-scale remote sensing image segmentation with hybrid sparse labeling

    Lin,J.,Yang,Z.,Liu,Q.,Yan,Y.,Ghamisi,P.,Xie,W.,Fang,L.,2025. Hslabeling: Toward efficient labeling for large-scale remote sensing image segmentation with hybrid sparse labeling. IEEE Transactions on Image Processing 34, 1864–1878. doi:10.1109/TIP.2025.3550039

  45. [45]

    Lin,T.Y.,Dollár,P.,Girshick,R.,He,K.,Hariharan,B.,Belongie,S.,

  46. [46]

    Feature pyramid networks for object detection, in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 936–944. doi:10.1109/CVPR.2017.106

  47. [47]

    Enhanced swin transformer and edge spatial attention for remote sensing image semantic segmentation

    Liu, F., Hu, Z., Li, L., Li, H., Liu, X., 2025a. Enhanced swin transformer and edge spatial attention for remote sensing image semantic segmentation. IEEE Signal Processing Letters 32, 1296–

  48. [48]

    doi:10.1109/LSP.2025.3550858

  49. [49]

    Comparison of multi-source satellite images for classifying marsh vegetation using deeplabv3 plus deep learning algorithm

    Liu,M.,Fu,B.,Xie,S.,He,H.,Lan,F.,Li,Y.,Lou,P.,Fan,D.,2021. Comparison of multi-source satellite images for classifying marsh vegetation using deeplabv3 plus deep learning algorithm. Ecological Indicators 125, 107562. doi:10.1016/j.ecolind.2021.107562

  50. [50]

    Liu,W.,Duan,P.,Xie,Z.,Kang,X.,Li,S.,2024. Learnfromsegment anythingmodel:Localregionhomogenizingforcross-domainremote sensing image segmentation, in: IGARSS 2024 - 2024 IEEE Interna- tional Geoscience and Remote Sensing Symposium, pp. 8351–8354. doi:10.1109/IGARSS53475.2024.10642007

  51. [51]

    Liu, Y., Cao, W., Xu, H., Xie, Y., Miao, C., 2025b. Scsnet: Semantic segmentation of carbon source and sink in remote sensing images based on multi-scale transformer and local feature fusion, in: 2025 InternationalJointConferenceonNeuralNetworks(IJCNN),pp.1–8. doi:10.1109/IJCNN64981.2025.11229288

  52. [52]

    Liu, Y., Shi, S., Wang, J., Zhong, Y., 2023. Seeing beyond the patch: Scale-adaptive semantic segmentation of high-resolution re- mote sensing imagery based on reinforcement learning, in: Proceed- ings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 16868–16878

  53. [53]

    Liu, Z., Mao, H., Wu, C.Y., Feichtenhofer, C., Darrell, T., Xie, S., 2022. A convnet for the 2020s, in: 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11966–11976. doi:10.1109/CVPR52688.2022.01167

  55. [55]

    Ma, T., Ma, J., Yu, K., Zhang, J., Fu, W., 2021. Multispectral remote sensing image matching via image transfer by regularized conditional generative adversarial networks and local feature. IEEE Geoscience and Remote Sensing Letters 18, 351–355. doi:10.1109/LGRS.2020.2972361

  56. [56]

    Ma, X., Xu, X., Zhang, X., Pun, M.O., 2024a. Adjacent-scale multimodal fusion networks for semantic segmentation of remote sensing data. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 17, 20116–20128. doi:10.1109/JSTARS.2024.3486906

  57. [57]

    Ma, Y., Wang, Y., Liu, X., Wang, H., 2024b. Swint-resnet: An improved remote sensing image segmentation model based on transformer. IEEE Geoscience and Remote Sensing Letters 21, 1–5. doi:10.1109/LGRS.2024.3433034

  58. [58]

    Madasa, A., Orimoloye, I.R., Ololade, O.O., 2021. Application of geospatial indices for mapping land cover/use change detection in a mining area. Journal of African Earth Sciences 175, 104108. doi:10.1016/j.jafrearsci.2021.104108

  59. [61]

    Nong, Z., Su, X., Liu, Y., Zhan, Z., Yuan, Q., 2021. Boundary-aware dual-stream network for vhr remote sensing images semantic segmentation. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14, 5260–5268. doi:10.1109/JSTARS.2021.3076035

  60. [62]

    Qiao, L., Gao, D., Zhao, R., Tang, W., An, L., Li, M., Sun, H., 2022. Improving estimation of lai dynamic by fusion of morphological and vegetation indices based on uav imagery. Computers and Electronics in Agriculture 192, 106603. doi:10.1016/j.compag.2021.106603

  61. [65]

    Ramos, L.T., Sappa, A.D., 2024. Multispectral semantic segmentation for land cover classification: An overview. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 17, 14295–14336. doi:10.1109/JSTARS.2024.3438620

  62. [66]

    Ramos, L.T., Sappa, A.D., 2025a. Dual-branch convnext-based network with attentional fusion decoding for land cover classification using multispectral imagery, in: SoutheastCon 2025, pp. 187–194. doi:10.1109/SoutheastCon56624.2025.10971457

  63. [67]

    Ramos, L.T., Sappa, A.D., 2025b. Leveraging u-net and selective feature extraction for land cover classification using remote sensing imagery. Scientific Reports 15, 784. doi:10.1038/s41598-024-84795-1

  64. [68]

    Rendana, M., Razi Idris, W.M., Alia, F., Effendi Rahim, S., Yamin, M., Izzudin, M., 2024. Relationship between drought and soil erosion based on the normalized differential water index (ndwi) and revised universal soil loss equation (rusle) model. Regional Sustainability 5, 100183. doi:10.1016/j.regsus.2024.100183

  65. [69]

    Ruiz, N., Bargal, S., Xie, C., Saenko, K., Sclaroff, S., 2022. Finding differences between transformers and convnets using counterfactual simulation testing, in: Advances in Neural Information Processing Systems, pp. 14403–14418

  66. [70]

    Sajol, M.S.I., Alvi, S.T., Era, C.A.A., 2024. Performance assessment of advanced cnn and transformer architectures in skin cancer detection, in: 2024 11th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI), pp. 1–8. doi:10.1109/EECSI63442.2024.10776508

  67. [71]

    Saxena, N., N., K.B., Raman, B., 2020. Semantic segmentation of multispectral images using res-seg-net model, in: 2020 IEEE 14th International Conference on Semantic Computing (ICSC), pp. 154–. doi:10.1109/ICSC.2020.00030

  69. [73]

    Shi, C., Han, P., Zhao, M., Fang, L., Miao, Q., Pun, C.M., 2025. Adaptive multitype contrastive views generation for remote sensing image semantic segmentation. IEEE Transactions on Geoscience and Remote Sensing.

  70. [74]

    Shi, W., Caballero, J., Huszár, F., Totz, J., Aitken, A.P., Bishop, R., Rueckert, D., Wang, Z., 2016. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network, in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1874–1883. doi:10.1109/CVPR.2016.207

  71. [75]

    Shirmard, H., Farahbakhsh, E., Müller, R.D., Chandra, R., 2022. A review of machine learning in processing remote sensing data for mineral exploration. Remote Sensing of Environment 268, 112750. doi:10.1016/j.rse.2021.112750

  72. [76]

    Tao, C., Qi, J., Guo, M., Zhu, Q., Li, H., 2023. Self-supervised remote sensing feature learning: Learning paradigms, challenges, and future works. IEEE Transactions on Geoscience and Remote Sensing 61, 1–26. doi:10.1109/TGRS.2023.3276853

  73. [77]

    Tong, X.Y., Xia, G.S., Lu, Q., Shen, H., Li, S., You, S., Zhang, L., 2020. Land-cover classification with high-resolution remote sensing images using transferable deep models. Remote Sensing of Environment 237, 111322. doi:10.1016/j.rse.2019.111322

  74. [78]

    Tong, X.Y., Xia, G.S., Zhu, X.X., 2023. Enabling country-scale land cover mapping with meter-resolution satellite imagery. ISPRS Journal of Photogrammetry and Remote Sensing 196, 178–196. doi:10.1016/j.isprsjprs.2022.12.011

  75. [79]

    Vasanthi, A., Joshitha, K.L., 2024. Water body detection utilizing ndwi, ndvi and nmdwi indices in sen-12 spectral imagery, in: 2024 First International Conference on Electronics, Communication and Signal Processing (ICECSP), pp. 1–5. doi:10.1109/ICECSP61809.2024.10698263

  76. [80]

    Vivone, G., 2023. Multispectral and hyperspectral image fusion in remote sensing: A survey. Information Fusion 89, 405–417. doi:10.1016/j.inffus.2022.08.032

  77. [81]

    Vu, T.H., Jain, H., Bucher, M., Cord, M., Pérez, P., 2019. Advent: Adversarial entropy minimization for domain adaptation in semantic segmentation, in: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2512–2521. doi:10.1109/CVPR.2019.00262

  78. [82]

    Wang, F., Yi, Q., Hu, J., Xie, L., Yao, X., Xu, T., Zheng, J., 2021. Combining spectral and textural information in uav hyperspectral images to estimate rice grain yield. International Journal of Applied Earth Observation and Geoinformation 102, 102397. doi:10.1016/j.jag.2021.102397

  79. [83]

    Wang, Q., Moreno-Martínez, Á., Muñoz-Marí, J., Campos-Taberner, M., Camps-Valls, G., 2023. Estimation of vegetation traits with kernel ndvi. ISPRS Journal of Photogrammetry and Remote Sensing 195, 408–417. doi:10.1016/j.isprsjprs.2022.12.019

  80. [84]

    Woo, S., Park, J., Lee, J.Y., Kweon, I.S., 2018. Cbam: Convolutional block attention module, in: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (Eds.), Computer Vision – ECCV 2018, Springer International Publishing, Cham. pp. 3–19

Showing first 80 references.