
arxiv: 2508.02384 · v2 · submitted 2025-08-04 · 💻 cs.CV

SMART-Ship: A Comprehensive Synchronized Multi-modal Aligned Remote Sensing Targets Dataset and Benchmark for Berthed Ships Analysis

Pith reviewed 2026-05-19 00:52 UTC · model grok-4.3

classification 💻 cs.CV
keywords remote sensing dataset · multi-modal alignment · berthed ships · maritime surveillance · SAR imagery · change detection · ship detection · image registration

The pith

A new dataset synchronizes five remote sensing modalities for berthed ship analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents the SMART-Ship dataset to overcome limitations of single-modality satellite data for long-term maritime observation. It supplies 1092 spatiotemporally registered image sets spanning visible-light, SAR, panchromatic, multi-spectral, and near-infrared modalities and covering 38,838 ships. Each set carries hierarchical annotations for polygonal locations, fine-grained categories, instance identifiers, and change masks. Standardized benchmarks on five core tasks let representative methods be compared directly on the data. A reader would care because reliable multi-modal alignment can support consistent surveillance despite varying satellite orbits and weather conditions.

Core claim

The SMART-Ship dataset consists of 1092 multi-modal image sets acquired within one week, each registered for spatiotemporal consistency and annotated with polygonal ship locations, fine-grained categories, instance-level identifiers, and change region masks, thereby enabling standardized benchmarks for five fundamental multi-modal remote sensing tasks on berthed ships.

What carries the argument

The SMART-Ship dataset itself, built from 1092 aligned multi-modal image sets and hierarchical annotations that organize 38,838 ship instances to serve multiple interpretation tasks.
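As a rough illustration of how such hierarchical annotations might be organized in code, the sketch below models one image set with its per-modality images, ship instances, and change mask. The class names, field names, modality keys, and category strings are all hypothetical; the paper's actual file layout and schema are not described on this page.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# The five modalities named in the abstract; the short keys are illustrative only.
MODALITIES = ("rgb", "sar", "pan", "ms", "nir")


@dataclass
class ShipInstance:
    instance_id: int                    # instance-level identifier (hypothetical name)
    category: str                       # fine-grained category label
    polygon: List[Tuple[float, float]]  # polygonal location as (x, y) vertices


@dataclass
class ImageSet:
    set_id: str
    images: Dict[str, str]              # modality key -> image path (assumed layout)
    ships: List[ShipInstance] = field(default_factory=list)
    change_mask: Optional[str] = None   # path to a change region mask, if any

    def missing_modalities(self) -> List[str]:
        """Modalities from MODALITIES that have no image in this set."""
        return [m for m in MODALITIES if m not in self.images]

    def ships_by_category(self) -> Dict[str, List[int]]:
        """Group instance identifiers under their fine-grained category."""
        grouped: Dict[str, List[int]] = {}
        for ship in self.ships:
            grouped.setdefault(ship.category, []).append(ship.instance_id)
        return grouped


# Toy example with made-up paths and categories.
demo = ImageSet(set_id="demo", images={"rgb": "demo_rgb.png", "sar": "demo_sar.png"})
demo.ships.append(ShipInstance(1, "cargo", [(0, 0), (4, 0), (4, 2), (0, 2)]))
demo.ships.append(ShipInstance(2, "tanker", [(5, 5), (9, 5), (9, 7), (5, 7)]))
demo.ships.append(ShipInstance(3, "cargo", [(1, 8), (3, 8), (3, 9), (1, 9)]))
```

Keeping the instance identifier at the set level rather than per image reflects the idea that one ship is the same object across modalities, which is what re-identification and change analysis would rely on.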

If this is right

  • Representative methods can be compared on standardized benchmarks across five multi-modal tasks.
  • The dataset supports ship detection, classification, instance identification, and change detection in maritime settings.
  • Hierarchical annotations allow flexible use for both coarse and fine-grained remote sensing problems.
  • Evaluations indicate the data can reveal directions for improving multi-modal fusion techniques.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Instance-level identifiers could enable tracking the same ships across different modalities and times.
  • The registration approach might transfer to other dynamic targets such as vehicles or aircraft in remote sensing.
  • Expanding the one-week acquisition window could test longer-term change analysis.
  • Models trained to fuse all five modalities simultaneously could be evaluated directly on the existing splits.

Load-bearing premise

Each multi-modal image set is accurately registered for spatiotemporal consistency and the hierarchical annotations are reliable enough to support the claimed range of tasks.

What would settle it

Discovery of widespread misalignment between modalities or annotation errors that cause all benchmarked methods to fail on the defined tasks would show the dataset does not support the intended multi-modal analysis.

Figures

Figures reproduced from arXiv: 2508.02384 by Chen-Chen Fan, Haolin Huang, Kehan Qi, Linping Zhang, Peiyao Guo, Yong-Qiang Mao, You He, Yu Liu, Yuxi Suo, Zhizhuo Jiang.

Figure 1. The proposed SMART-Ship dataset. This comprehensive multi-modal maritime dataset features precise polygon an…
Figure 2. Overall Composition of the SMART-Ship Dataset. (a) image distribution across modalities; (b) ground sampling…
Figure 3. Multi-modal ship annotation statistics and consistency analysis. (a-c) Distribution of width, height, and aspect ratio…
Figure 4. Task 1 Multi-Modal Ship Detection - qualitative comparison on RGB and SAR modalities. The first column shows…
Figure 5. Task 2: Cross-Modal Ship Re-Identification - qualitative results from RGB to SAR, and RGB to PAN. Each row…
Figure 6. Task 3 Cross-Modal Generation - qualitative comparison between RGB, SAR, and PAN modalities. Each row shows…
Figure 8. Task 5: Cross-Modal Change Detection - qualitative comparison. Bi-temporal RGB-SAR image pairs and change…
Original abstract

Given the limitations of satellite orbits and imaging conditions, multi-modal remote sensing (RS) data is crucial in enabling long-term earth observation. However, maritime surveillance remains challenging due to the complexity of multi-scale targets and the dynamic environments. To bridge this critical gap, we propose a Synchronized Multi-modal Aligned Remote sensing Targets dataset for berthed ships analysis (SMART-Ship), containing spatiotemporal registered images with fine-grained annotation for maritime targets from five modalities: visible-light, synthetic aperture radar (SAR), panchromatic, multi-spectral, and near-infrared. Specifically, our dataset consists of 1092 multi-modal image sets, covering 38,838 ships. Each image set is acquired within one week and registered to ensure spatiotemporal consistency. Ship instances in each set are annotated with polygonal location information, fine-grained categories, instance-level identifiers, and change region masks, organized hierarchically to support diverse multi-modal RS tasks. Furthermore, we define standardized benchmarks on five fundamental tasks and comprehensively compare representative methods across the dataset. Thorough experiment evaluations validate that the proposed SMART-Ship dataset could support various multi-modal RS interpretation tasks and reveal the promising directions for further exploration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces the SMART-Ship dataset consisting of 1092 multi-modal image sets (visible-light, SAR, panchromatic, multi-spectral, NIR) of berthed ships, totaling 38,838 instances. Each set is described as acquired within one week and registered for spatiotemporal consistency, with hierarchical annotations providing polygonal locations, fine-grained categories, instance-level identifiers, and change region masks. The authors establish standardized benchmarks for five fundamental tasks, compare representative methods, and conclude that the dataset supports diverse multi-modal remote sensing interpretation tasks while highlighting promising research directions.

Significance. If the claimed alignments and annotations can be shown to be reliable through quantitative validation, the dataset would be a meaningful addition to remote sensing resources for maritime surveillance. Its scale, five-modality coverage, change masks, and hierarchical structure could enable progress on cross-modal fusion, instance tracking, and change detection in port environments where existing datasets are limited. The inclusion of benchmark experiments is a constructive element that could help standardize evaluation in this sub-area.

major comments (2)
  1. [Abstract and Dataset Construction] The central claim that the 1092 image sets provide 'spatiotemporal consistency' via registration rests only on the statement that images were 'acquired within one week and registered.' No quantitative registration accuracy metrics (RMSE, overlap coefficients, or mutual-information scores) are reported, particularly for the geometrically and radiometrically challenging SAR-optical pairs. This directly affects whether the dataset can actually support the claimed multi-modal tasks and is therefore load-bearing.
  2. [Annotation and Benchmark sections] The manuscript describes the annotations as 'fine-grained' and 'hierarchical' but reports no inter-annotator agreement statistics, validation protocol, or reliability measures for the polygonal locations, categories, instance IDs, or change masks. Without these, the soundness of the five benchmark tasks and the experimental comparisons cannot be fully assessed.
minor comments (2)
  1. [Abstract] The abstract and introduction could more explicitly list the five benchmark tasks and the representative methods compared, to improve immediate readability.
  2. [Dataset Construction] Consider including sensor specifications, exact acquisition dates, and geographic coverage details in the dataset section to strengthen reproducibility.
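
To make the registration metrics requested in major comment 1 concrete, here is a minimal sketch of two of them: control-point RMSE and histogram-based mutual information. It uses only the Python standard library and represents images as nested lists of gray values; the function names, the bin count, and the assumption of 8-bit intensities are illustrative choices, not anything taken from the paper.

```python
import math
from collections import Counter


def registration_rmse(ref_points, moved_points):
    """RMSE (in pixels) between matched control points of two registered images."""
    if len(ref_points) != len(moved_points) or not ref_points:
        raise ValueError("need two equal-length, non-empty point lists")
    squared = [(xr - xm) ** 2 + (yr - ym) ** 2
               for (xr, yr), (xm, ym) in zip(ref_points, moved_points)]
    return math.sqrt(sum(squared) / len(squared))


def mutual_information(img_a, img_b, bins=8, max_value=255):
    """Mutual information (in bits) of two equally sized grayscale images."""
    flat_a = [v for row in img_a for v in row]
    flat_b = [v for row in img_b for v in row]
    if len(flat_a) != len(flat_b) or not flat_a:
        raise ValueError("images must be non-empty and the same size")

    def to_bin(value):
        # Quantize an intensity into one of `bins` equal-width histogram bins.
        return min(int(value * bins / (max_value + 1)), bins - 1)

    pairs = [(to_bin(a), to_bin(b)) for a, b in zip(flat_a, flat_b)]
    n = len(pairs)
    joint = Counter(pairs)
    marginal_a = Counter(a for a, _ in pairs)
    marginal_b = Counter(b for _, b in pairs)

    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p_ab / (p_a * p_b) simplifies to count * n / (count_a * count_b).
        mi += p_ab * math.log2(count * n / (marginal_a[a] * marginal_b[b]))
    return mi
```

Mutual information is often preferred for SAR-optical pairs because it rewards statistical dependence rather than equal intensities: in this sketch an image and its contrast-inverted copy score as high as an image compared with itself, while a constant image scores zero.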

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and insightful comments on our manuscript. We address each of the major comments below and outline the revisions we plan to make to strengthen the paper.

  1. Referee: [Abstract and Dataset Construction] The central claim that the 1092 image sets provide 'spatiotemporal consistency' via registration rests only on the statement that images were 'acquired within one week and registered.' No quantitative registration accuracy metrics (RMSE, overlap coefficients, or mutual-information scores) are reported, particularly for the geometrically and radiometrically challenging SAR-optical pairs. This directly affects whether the dataset can actually support the claimed multi-modal tasks and is therefore load-bearing.

    Authors: We agree with the referee that providing quantitative metrics for the registration accuracy is essential to substantiate the spatiotemporal consistency of the dataset. In the revised manuscript, we will include a detailed account of the registration procedure along with quantitative validation metrics, such as RMSE for geometric alignment and overlap coefficients or mutual information scores for the multi-modal image sets. Special emphasis will be placed on the SAR-optical pairs to address the challenges mentioned.
    Revision: yes

  2. Referee: [Annotation and Benchmark sections] The manuscript describes the annotations as 'fine-grained' and 'hierarchical' but reports no inter-annotator agreement statistics, validation protocol, or reliability measures for the polygonal locations, categories, instance IDs, or change masks. Without these, the soundness of the five benchmark tasks and the experimental comparisons cannot be fully assessed.

    Authors: We recognize the value of reporting inter-annotator agreement and validation details to ensure the quality and reliability of the annotations. Accordingly, we will revise the Annotation section to describe the annotation process in greater detail, including the involvement of multiple annotators, the protocol used for quality control and disagreement resolution, and quantitative reliability measures such as inter-annotator agreement scores for the various annotation types (polygonal locations, categories, instance identifiers, and change region masks).
    Revision: yes
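
The inter-annotator agreement scores discussed in the second exchange could take several forms. The sketch below shows two common ones, under the assumption that two annotators labeled the same ships and the same change regions: Cohen's kappa for category labels and intersection-over-union for binary change masks. The function names and the toy labels are hypothetical, and the paper does not state which measures the authors will report.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators' category labels on the same instances."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("need two equal-length, non-empty label lists")
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both annotators pick the same label at random.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def mask_iou(mask_a, mask_b):
    """Intersection-over-union of two binary masks given as nested lists of 0/1."""
    intersection = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    return intersection / union if union else 1.0  # two empty masks agree fully
```

Kappa corrects raw agreement for chance, which matters when a few ship categories dominate the label distribution; mask IoU is a simpler overlap measure that applies directly to change region masks.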

Circularity Check

0 steps flagged

No significant circularity; dataset contribution is independent


The paper introduces a new multi-modal remote sensing dataset (SMART-Ship) consisting of 1092 image sets with annotations and defines five benchmark tasks with empirical comparisons. No mathematical derivations, predictions, or equations are present that reduce claims to fitted parameters or self-citations by construction. The central premise rests on data collection, registration, and hierarchical annotations rather than any self-definitional or load-bearing circular steps. This is a standard self-contained dataset paper with no reduction of results to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper contributes a new empirical dataset rather than a derivation from axioms or parameters. Main premises concern data registration accuracy and annotation quality.

axioms (1)
  • Domain assumption: Standard remote sensing registration methods can achieve spatiotemporal consistency across modalities when images are acquired within one week.
    Invoked in the description of how each image set is prepared for alignment.




Reference graph

Works this paper leans on

87 extracted references · 87 canonical work pages · 1 internal anchor

  1. [1]

    STAR: A first-ever dataset and a large-scale benchmark for scene graph generation in large-size satellite imagery,

    Y. Li, L. Wang, T. Wang, X. Yang, J. Luo, Q. Wang, Y. Deng, W. Wang, X. Sun, H. Li, B. Dang, Y. Zhang, Y. Yu, and J. Yan, “STAR: A first-ever dataset and a large-scale benchmark for scene graph generation in large-size satellite imagery,” IEEE T rans. Pat- tern Anal. Mach. Intell. , vol. 47, no. 3, pp. 1832–1849, 2025

  2. [2]

    Learning to holistically detect bridges from large-size VHR remote sensing imagery,

    Y. Li, J. Luo, Y. Zhang, Y. Tan, J.-G. Yu, and S. Bai, “Learning to holistically detect bridges from large-size VHR remote sensing imagery,” IEEE T rans. Pattern Anal. Mach. Intell., vol. 46, no. 12, pp. 11 507–11 523, 2024

  3. [3]

    FarSeg++: Foreground-aware relation network for geospatial object segmen- tation in high spatial resolution remote sensing imagery,

    Z. Zheng, Y. Zhong, J. Wang, A. Ma, and L. Zhang, “FarSeg++: Foreground-aware relation network for geospatial object segmen- tation in high spatial resolution remote sensing imagery,” IEEE T rans. Pattern Anal. Mach. Intell. , vol. 45, no. 11, pp. 13 715–13 729, 2023

  4. [4]

    Frequency- adaptive learning for SAR ship detection in clutter scenes,

    L. Zhang, Y. Liu, W. Zhao, X. Wang, G. Li, and Y. He, “Frequency- adaptive learning for SAR ship detection in clutter scenes,” IEEE T rans. Geosci. Remote Sens., vol. 61, pp. 1–14, 2023

  5. [5]

    MetaEarth: A generative foundation model for global-scale remote sensing image gener- ation,

    Z. Yu, C. Liu, L. Liu, Z. Shi, and Z. Zou, “MetaEarth: A generative foundation model for global-scale remote sensing image gener- ation,” IEEE T rans. Pattern Anal. Mach. Intell. , vol. 47, no. 3, pp. 1764–1781, 2025

  6. [6]

    A semi- supervised deep rule-based approach for complex satellite sensor image analysis,

    X. Gu, P . P . Angelov, C. Zhang, and P . M. Atkinson, “A semi- supervised deep rule-based approach for complex satellite sensor image analysis,” IEEE T rans. Pattern Anal. Mach. Intell. , vol. 44, no. 5, pp. 2281–2292, 2020

  7. [7]

    LAG- Conv: Local-context adaptive convolution kernels with global harmonic bias for pansharpening,

    Z.-R. Jin, T.-J. Zhang, T.-X. Jiang, G. Vivone, and L.-J. Deng, “LAG- Conv: Local-context adaptive convolution kernels with global harmonic bias for pansharpening,” in Proc. AAAI Conf. Artif. Intell., vol. 36, no. 1, 2022, pp. 1113–1121

  8. [8]

    Pan- sharpening by convolutional neural networks in the full resolution framework,

    M. Ciotola, S. Vitale, A. Mazza, G. Poggi, and G. Scarpa, “Pan- sharpening by convolutional neural networks in the full resolution framework,” IEEE T rans. Geosci. Remote Sens. , vol. 60, pp. 1–17, 2022

  9. [9]

    Detail injection- based deep convolutional neural networks for pansharpening,

    L.-J. Deng, G. Vivone, C. Jin, and J. Chanussot, “Detail injection- based deep convolutional neural networks for pansharpening,” IEEE T rans. Geosci. Remote Sens., vol. 59, no. 8, pp. 6995–7010, 2020

  10. [10]

    Vehicle detection in aerial imagery: A small target detection benchmark,

    S. Razakarivony and F. Jurie, “Vehicle detection in aerial imagery: A small target detection benchmark,” J. Vis. Commun. Image Repre- sent., vol. 34, pp. 187–203, 2016

  11. [11]

    Drone-based RGB-infrared cross-modality vehicle detection via uncertainty-aware learning,

    Y. Sun, B. Cao, P . Zhu, and Q. Hu, “Drone-based RGB-infrared cross-modality vehicle detection via uncertainty-aware learning,” IEEE T rans. Circuits Syst. Video T echnol., vol. 32, no. 10, pp. 6700– 6713, 2022

  12. [12]

    Cross-modality attentive feature fusion for object detection in multispectral remote sensing imagery,

    Q. Feng and Z. Wang, “Cross-modality attentive feature fusion for object detection in multispectral remote sensing imagery,” Pattern Recognit., vol. 130, p. 108786, 2022

  13. [13]

    A new learning paradigm for foundation model-based remote-sensing change detection,

    K. Li, X. Cao, and D. Meng, “A new learning paradigm for foundation model-based remote-sensing change detection,” IEEE T rans. Geosci. Remote Sens., vol. 62, pp. 1–12, 2024

  14. [14]

    Remote sensing image change detec- tion with transformers,

    H. Chen, Z. Qi, and Z. Shi, “Remote sensing image change detec- tion with transformers,” IEEE T rans. Geosci. Remote Sens. , vol. 60, pp. 1–14, 2021

  15. [15]

    A transformer-based siamese network for change detection,

    W. G. C. Bandara and V . M. Patel, “A transformer-based siamese network for change detection,” in Proc. IEEE Int. Geosci. Remote Sens. Symp., 2022, pp. 207–210

  16. [16]

    Asymmetric feature fusion network for hyperspectral and SAR image classification,

    W. Li, Y. Gao, M. Zhang, R. Tao, and Q. Du, “Asymmetric feature fusion network for hyperspectral and SAR image classification,” IEEE T rans. Neural Netw. Learn. Syst., vol. 34, no. 10, pp. 8057–8070, 2022

  17. [17]

    ICAFusion: Iterative cross-attention guided feature fusion for multispectral object detection,

    J. Shen, Y. Chen, Y. Liu, X. Zuo, H. Fan, and W. Yang, “ICAFusion: Iterative cross-attention guided feature fusion for multispectral object detection,” Pattern Recognit., vol. 145, p. 109913, 2024

  18. [18]

    Cooperative ship detection in optical and SAR remote sensing images based on neighborhood saliency,

    Q. Zhang, Z. Wang, X. Wang, G. Li, L. Huang, H. Song, and Z. Song, “Cooperative ship detection in optical and SAR remote sensing images based on neighborhood saliency,”J. Radars, vol. 13, no. R24037, p. 885, 2024

  19. [19]

    Multiscale generative adversarial network based on wavelet feature learning for SAR-to-optical image translation,

    H. Li, C. Gu, D. Wu, G. Cheng, L. Guo, and H. Liu, “Multiscale generative adversarial network based on wavelet feature learning for SAR-to-optical image translation,” IEEE T rans. Geosci. Remote Sens., vol. 60, pp. 1–15, 2022

  20. [20]

    A semi-supervised image-to-image translation framework for SAR– optical image matching,

    W.-L. Du, Y. Zhou, H. Zhu, J. Zhao, Z. Shao, and X. Tian, “A semi-supervised image-to-image translation framework for SAR– optical image matching,” IEEE Geosci. Remote Sens. Lett. , vol. 19, pp. 1–5, 2022

  21. [21]

    Proxy-based rotation invariant deep metric learning for remote sensing image retrieval,

    W. Cai, H. Zhang, J. Li, and M. Yu, “Proxy-based rotation invariant deep metric learning for remote sensing image retrieval,” IEEE T rans. Geosci. Remote Sens., vol. 62, pp. 1319–1335, 2024

  22. [22]

    Robust cross-modal remote sensing image retrieval via maximal correlation augmentation,

    Z. Wang, X. Wang, G. Li, and C. Li, “Robust cross-modal remote sensing image retrieval via maximal correlation augmentation,” IEEE T rans. Geosci. Remote Sens., vol. 62, pp. 1–17, 2024. IEEE TRANSACTIONS ON PATTERN ANAL YSIS AND MACHINE INTELLIGENCE 17

  23. [23]

    Cross-modal ship re-identification via optical and sar imagery: A novel dataset and method,

    H. Wang, S. Li, J. Yang, Y. Liu, Y. Lv, and Z. Zhou, “Cross-modal ship re-identification via optical and sar imagery: A novel dataset and method,” arXiv preprint arXiv:2506.22027 , 2025

  24. [24]

    MMShip: A medium- resolution multispectral satellite imagery dataset for ship detec- tion,

    L. Chen, L. Li, S. Wang, S. Gao, X. Ye et al., “MMShip: A medium- resolution multispectral satellite imagery dataset for ship detec- tion,” Opt. Precision Eng. , vol. 31, no. 13, pp. 1962–1972, 2023

  25. [25]

    A cross-modal fusion method for multispectral small ship detection,

    Y. Liu, Y. Liu, X. Wang, L. Zhang, Z. Jiang, Y. Li, C. Yan, Y. Fu, and T. Zhang, “A cross-modal fusion method for multispectral small ship detection,” in Proc. Int. Conf. Inf. Fusion , 2024, pp. 1–6

  26. [26]

    Spacenet 6: Multi-sensor all weather mapping dataset,

    J. Shermeyer, D. Hogan, J. Brown, A. Van Etten, N. Weir et al. , “Spacenet 6: Multi-sensor all weather mapping dataset,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops , 2020, pp. 196–197

  27. [27]

    The qxs-saropt dataset for deep learning in sar-optical data fusion. arxiv 2021,

    M. Huang, Y. Xu, L. Qian, W. Shi, Y. Zhang, W. Bao, N. Wang, X. Liu, and X. Xiang, “The QXS-SAROPT dataset for deep learning in SAR-optical data fusion,” arXiv preprint arXiv:2103.08259 , 2021

  28. [28]

    SAR-to- optical image translation using supervised cycle-consistent adver- sarial networks,

    L. Wang, X. Xu, Y. Yu, R. Yang, R. Gui, Z. Xu, and F. Pu, “SAR-to- optical image translation using supervised cycle-consistent adver- sarial networks,” IEEE Access, vol. 7, pp. 129 136–129 149, 2019

  29. [29]

    Pan-sharpening of quickbird satellite images using multiresolution techniques: Wavelets, contourlets, and curvelets,

    J. F. Reinoso, “Pan-sharpening of quickbird satellite images using multiresolution techniques: Wavelets, contourlets, and curvelets,” Imaging Sci. J. , vol. 58, no. 3, pp. 125–135, 2010

  30. [30]

    Worldview-2 pan-sharpening,

    C. Padwick, M. Deskevich, F. Pacifici, and S. Smallwood, “Worldview-2 pan-sharpening,” in Proc. ASPRS Annu. Conf. , vol. 2630, 2010, pp. 1–14

  31. [31]

    Application of different pan-sharpening methods on worldview- 3 images,

    O. R. Belfiore, C. Meneghini, C. Parente, R. Santamaria et al. , “Application of different pan-sharpening methods on worldview- 3 images,” J. Eng. Appl. Sci. , vol. 11, no. 1, pp. 490–496, 2016

  32. [32]

    Un- supervised image regression for heterogeneous change detection,

    L. T. Luppino, F. M. Bianchi, G. Moser, and S. N. Anfinsen, “Un- supervised image regression for heterogeneous change detection,” arXiv preprint arXiv:1909.05948 , 2019

  33. [33]

    A fractal projection and Markovian segmentation- based approach for multimodal change detection,

    M. Mignotte, “A fractal projection and Markovian segmentation- based approach for multimodal change detection,” IEEE T rans. Geosci. Remote Sens. , vol. 58, no. 11, pp. 8046–8058, 2020

  34. [34]

    An a-contrario approach for subpixel change detection in satellite imagery,

    A. Robin, L. Moisan, and S. Le Hegarat-Mascle, “An a-contrario approach for subpixel change detection in satellite imagery,” IEEE T rans. Pattern Anal. Mach. Intell. , vol. 32, no. 11, pp. 1977–1993, 2010

  35. [35]

    The SEN1-2 Dataset for Deep Learning in SAR-Optical Data Fusion

    M. Schmitt, L. H. Hughes, and X. X. Zhu, “The SEN1-2 dataset for deep learning in SAR-optical data fusion,” arXiv preprint arXiv:1807.01569, 2018

  36. [36]

    A comparative analysis of GAN-based methods for SAR-to-optical image translation,

    Y. Zhao, T. Celik, N. Liu, and H.-C. Li, “A comparative analysis of GAN-based methods for SAR-to-optical image translation,” IEEE Geosci. Remote Sens. Lett. , vol. 19, pp. 1–5, 2022

  37. [37]

    SOSSF: Landsat-8 image synthesis on the blending of sentinel- 1 and MODIS data,

    Y. Xia, W. He, Q. Huang, H. Chen, H. Huang, and H. Zhang, “SOSSF: Landsat-8 image synthesis on the blending of sentinel- 1 and MODIS data,” IEEE T rans. Geosci. Remote Sens. , vol. 62, pp. 1–19, 2024

  38. [38]

    GF2 PMS remote sensing imagery,

    China Center for Resources Satellite Data and Application, “GF2 PMS remote sensing imagery,” https://www.chinageoss. cn/, 2017

  39. [39]

    A large-scale benchmark data set for evaluating pansharpening performance: Overview and implemen- tation,

    X. Meng, Y. Xiong, F. Shao, H. Shen, W. Sun, G. Yang, Q. Yuan, R. Fu, and H. Zhang, “A large-scale benchmark data set for evaluating pansharpening performance: Overview and implemen- tation,” IEEE Geosci. Remote Sens. Mag. , vol. 9, no. 1, pp. 18–52, 2020

  40. [40]

    Change detection from synthetic aperture radar images based on neighborhood-based ratio and extreme learning machine,

    F. Gao, J. Dong, B. Li, Q. Xu, and C. Xie, “Change detection from synthetic aperture radar images based on neighborhood-based ratio and extreme learning machine,” J. Appl. Remote Sens. , vol. 10, no. 4, p. 046019, 2016

  41. [41]

    A domain adaptation neural network for change detection with heterogeneous optical and SAR remote sensing images,

    C. Zhang, Y. Feng, L. Hu, D. Tapete, L. Pan, Z. Liang, F. Cigna, and P . Yue, “A domain adaptation neural network for change detection with heterogeneous optical and SAR remote sensing images,” Int. J. Appl. Earth Observ. Geoinf. , vol. 109, p. 102769, 2022

  42. [42]

    Cross-modality fusion transformer for multispectral object detection,

    Q. Feng, D. Han, and Z. Wang, “Cross-modality fusion transformer for multispectral object detection,” arXiv preprint arXiv:2111.00273, 2021

  43. [43]

    Multimodal object detection via probabilistic ensembling,

    Y.-T. Chen, J. Shi, Z. Ye, C. Mertz, D. Ramanan, and S. Kong, “Multimodal object detection via probabilistic ensembling,” in Proc. Comput. Vis. ECCV , 2022, pp. 139–158

  44. [44]

    Learning source-invariant deep hashing convolutional neural networks for cross-source re- mote sensing image retrieval,

    Y. Li, Y. Zhang, X. Huang, and J. Ma, “Learning source-invariant deep hashing convolutional neural networks for cross-source re- mote sensing image retrieval,” IEEE T rans. Geosci. Remote Sens. , vol. 56, no. 11, pp. 6521–6536, 2018

  45. [45]

    CMIR-NET: A deep learning based model for cross-modal re- trieval in remote sensing,

    U. Chaudhuri, B. Banerjee, A. Bhattacharya, and M. Datcu, “CMIR-NET: A deep learning based model for cross-modal re- trieval in remote sensing,” Pattern Recognit. Lett., vol. 131, pp. 456– 462, 2020

  46. [46]

    A discriminative distillation network for cross-source remote sensing image retrieval,

    W. Xiong, Z. Xiong, Y. Cui, and Y. Lv, “A discriminative distillation network for cross-source remote sensing image retrieval,” IEEE J. Sel. T opics Appl. Earth Observ. Remote Sens. , vol. 13, pp. 1234–1247, 2020

  47. [47]

    Proxy-based rotation invariant deep metric learning for remote sensing image retrieval,

    Z. Cai, Y. Pan, and W. Jin, “Proxy-based rotation invariant deep metric learning for remote sensing image retrieval,” IEEE J. Sel. T opics Appl. Earth Observ. Remote Sens., vol. 17, pp. 7759–7772, 2024

  48. [48]

    Image-to-image translation with conditional adversarial networks,

    P . Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. , 2017, pp. 1125–1134

  49. [49]

    Unpaired image-to- image translation using cycle-consistent adversarial networks,

    J.-Y. Zhu, T. Park, P . Isola, and A. A. Efros, “Unpaired image-to- image translation using cycle-consistent adversarial networks,” in Proc. IEEE Int. Conf. Comput. Vis. , 2017, pp. 2223–2232

  50. [50]

    Contrastive learning for unpaired image-to-image translation,

    T. Park, A. A. Efros, R. Zhang, and J.-Y. Zhu, “Contrastive learning for unpaired image-to-image translation,” in Comput. Vis. ECCV , 2020, pp. 319–345

  51. [51]

    Multi- channel attention selection GAN with cascaded semantic guidance for cross-view image translation,

    H. Tang, D. Xu, N. Sebe, Y. Wang, J. J. Corso, and Y. Yan, “Multi- channel attention selection GAN with cascaded semantic guidance for cross-view image translation,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 2417–2426

  52. [52]

    The dawn of KAN in image-to-image (I2I) translation: Integrating Kolmogorov-Arnold networks with GANs for unpaired I2I translation,

    A. Mahara, N. D. Rishe, and L. Deng, “The dawn of KAN in image-to-image (I2I) translation: Integrating Kolmogorov-Arnold networks with GANs for unpaired I2I translation,” arXiv preprint arXiv:2408.08216, 2024

  53. [53]

    A novel it- erative PCA–based pansharpening method,

    M. Ghadjati, A. Moussaoui, and A. Boukharouba, “A novel it- erative PCA–based pansharpening method,” Remote Sens. Lett. , vol. 10, no. 3, pp. 264–273, 2019

  54. [54]

    PanNet: A deep network architecture for pan-sharpening,

    J. Yang, X. Fu, Y. Hu, Y. Huang, X. Ding, and J. Paisley, “PanNet: A deep network architecture for pan-sharpening,” in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 5449–5457

  55. [55]

    SAR- to-optical image translation using supervised cycle-consistent ad- versarial networks,

    L. He, Y. Rao, J. Li, J. Chanussot, A. Plaza, J. Zhu, and B. Li, “SAR- to-optical image translation using supervised cycle-consistent ad- versarial networks,” IEEE J. Sel. T opics Appl. Earth Observ. Remote Sens., vol. 12, no. 4, pp. 1188–1204, 2019

  56. [56]

    Pan-sharpening using an efficient bidirectional pyramid network,

    Y. Zhang, C. Liu, M. Sun, and Y. Ou, “Pan-sharpening using an efficient bidirectional pyramid network,” IEEE T rans. Geosci. Remote Sens., vol. 57, no. 8, pp. 5549–5563, 2019

  57. [57]

    A multiscale and multidepth convolutional neural network for remote sensing imagery pan-sharpening,

    Q. Yuan, Y. Wei, X. Meng, H. Shen, and L. Zhang, “A multiscale and multidepth convolutional neural network for remote sensing imagery pan-sharpening,” IEEE J. Sel. T opics Appl. Earth Observ. Remote Sens., vol. 11, no. 3, pp. 978–989, 2018

  58. [58]

    A deep convolutional coupling network for change detection based on heterogeneous optical and radar images,

    J. Liu, M. Gong, K. Qin, and P . Zhang, “A deep convolutional coupling network for change detection based on heterogeneous optical and radar images,” IEEE T rans. Neural Netw. Learn. Syst. , vol. 29, no. 3, pp. 545–559, 2016

  59. [59]

    Novel enhanced UNet for change detection using multimodal remote sensing image,

    Z. Lv, H. Huang, W. Sun, T. Lei, J. A. Benediktsson, and J. Li, “Novel enhanced UNet for change detection using multimodal remote sensing image,” IEEE Geosci. Remote Sens. Lett., vol. 20, pp. 1–5, 2023

  60. [60]

    SegFormer: Simple and efficient design for semantic segmentation with transformers,

    E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, “SegFormer: Simple and efficient design for semantic segmentation with transformers,” Adv. Neural Inf. Process. Syst., vol. 34, pp. 12 077–12 090, 2021

  61. [61]

    YOLOv5,

    G. Jocher, A. Stoken, J. Borovec et al., “YOLOv5,” https://github.com/ultralytics/yolov5, 2021

  62. [62]

    CSPNet: A new backbone that can enhance learning capability of CNN,

    C.-Y. Wang, H.-Y. M. Liao, Y.-H. Wu, P.-Y. Chen, J.-W. Hsieh, and I.-H. Yeh, “CSPNet: A new backbone that can enhance learning capability of CNN,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops, 2020, pp. 390–391

  63. [63]

    YOLOrs: Object detection in multimodal remote sensing imagery,

    M. Sharma et al., “YOLOrs: Object detection in multimodal remote sensing imagery,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 1497–1508, 2021

  64. [64]

    Deep residual learning for image recognition,

    K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778

  65. [65]

    CADDN: A content-aware downsampling-based detection method for small objects in remote sensing images,

    L. Zhang, Y. Liu, X. Wang, Y. He, G. Li, Y. Zhang, C. Liu, Z. Jiang, and Y. Liu, “CADDN: A content-aware downsampling-based detection method for small objects in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 63, pp. 1–17, 2025

  66. [66]

    Towards large-scale small object detection: Survey and benchmarks,

    G. Cheng, X. Yuan, X. Yao, K. Yan, Q. Zeng, X. Xie, and J. Han, “Towards large-scale small object detection: Survey and benchmarks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 11, pp. 13 467–13 488, 2023

  67. [67]

    Detection and tracking meet drones challenge,

    P. Zhu, L. Wen, D. Du, X. Bian, H. Fan, Q. Hu, and H. Ling, “Detection and tracking meet drones challenge,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 11, pp. 7380–7399, 2022

  68. [68]

    Object detection in aerial images: A large-scale benchmark and challenges,

    J. Ding, N. Xue, G.-S. Xia, X. Bai, W. Yang, M. Y. Yang, S. Belongie, J. Luo, M. Datcu, M. Pelillo et al., “Object detection in aerial images: A large-scale benchmark and challenges,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 11, pp. 7778–7796, 2021

  69. [69]

    Scrdet++: Detecting small, cluttered and rotated objects via instance-level feature denoising and rotation loss smoothing,

    X. Yang, J. Yan, W. Liao, X. Yang, J. Tang, and T. He, “Scrdet++: Detecting small, cluttered and rotated objects via instance-level feature denoising and rotation loss smoothing,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 2, pp. 2384–2399, 2022

  70. [70]

    Highly efficient and unsupervised framework for moving object detection in satellite videos,

    C. Xiao, W. An, Y. Zhang, Z. Su, M. Li, W. Sheng, M. Pietikäinen, and L. Liu, “Highly efficient and unsupervised framework for moving object detection in satellite videos,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 46, no. 12, pp. 11 532–11 539, 2024

  71. [71]

    Hybrid Gaussian deformation for efficient remote sensing object detection,

    W. Zhao, X. Zhang, H. Wang, and H. Lu, “Hybrid Gaussian deformation for efficient remote sensing object detection,” IEEE Trans. Pattern Anal. Mach. Intell., pp. 1–17, 2025

  72. [72]

    Circle loss: A unified perspective of pair similarity optimization,

    Y. Sun, C. Cheng, Y. Zhang et al., “Circle loss: A unified perspective of pair similarity optimization,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 6398–6407

  73. [73]

    Person re-identification in the wild,

    L. Zheng, H. Zhang, S. Sun et al., “Person re-identification in the wild,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 1367–1376

  74. [74]

    Sphereface: Deep hypersphere embedding for face recognition,

    W. Liu, Y. Wen, Z. Yu et al., “Sphereface: Deep hypersphere embedding for face recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 212–220

  75. [75]

    Deep metric learning via lifted structured feature embedding,

    H. Oh Song, Y. Xiang, S. Jegelka, and S. Savarese, “Deep metric learning via lifted structured feature embedding,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 4004–4012

  76. [76]

    Dual-path convolutional image-text embeddings with instance loss,

    Z. Zheng, L. Zheng, M. Garrett et al., “Dual-path convolutional image-text embeddings with instance loss,” ACM Trans. Multimedia Comput. Commun. Appl., vol. 16, no. 2, pp. 1–23, 2020

  77. [77]

    Deep graph metric learning for weakly supervised person re-identification,

    J. Meng, W.-S. Zheng, J.-H. Lai, and L. Wang, “Deep graph metric learning for weakly supervised person re-identification,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 10, pp. 6074–6093, 2021

  78. [78]

    Adaptive sparse pairwise loss for object re-identification,

    X. Zhou, Y. Zhong, Z. Cheng, F. Liang, and L. Ma, “Adaptive sparse pairwise loss for object re-identification,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2023, pp. 19 691–19 701

  79. [79]

    Deep learning for person re-identification: A survey and outlook,

    M. Ye, J. Shen, G. Lin, T. Xiang, L. Shao, and S. C. H. Hoi, “Deep learning for person re-identification: A survey and outlook,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 6, pp. 2872–2893, 2021

  80. [80]

    Weakly supervised tracklet association learning with video labels for person re-identification,

    M. Liu, Y. Bian, Q. Liu, X. Wang, and Y. Wang, “Weakly supervised tracklet association learning with video labels for person re-identification,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 46, no. 5, pp. 3595–3607, 2024
