SMART-Ship: A Comprehensive Synchronized Multi-modal Aligned Remote Sensing Targets Dataset and Benchmark for Berthed Ships Analysis

Chen-Chen Fan; Haolin Huang; Kehan Qi; Linping Zhang; Peiyao Guo; Yong-Qiang Mao; You He; Yu Liu; Yuxi Suo; Zhizhuo Jiang

arxiv: 2508.02384 · v2 · submitted 2025-08-04 · 💻 cs.CV

Pith reviewed 2026-05-19 00:52 UTC · model grok-4.3

keywords: remote sensing dataset · multi-modal alignment · berthed ships · maritime surveillance · SAR imagery · change detection · ship detection · image registration

The pith

A new dataset synchronizes five remote sensing modalities for berthed ship analysis.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents the SMART-Ship dataset to overcome limitations of single-modality satellite data for long-term maritime observation. It supplies 1092 spatiotemporally registered image sets spanning visible-light, SAR, panchromatic, multi-spectral, and near-infrared modalities and covering 38,838 ships. Each set carries hierarchical annotations for polygonal locations, fine-grained categories, instance identifiers, and change masks. Standardized benchmarks on five core tasks let representative methods be compared directly on the data. A reader would care because reliable multi-modal alignment can support consistent surveillance despite varying satellite orbits and weather conditions.

Core claim

The SMART-Ship dataset consists of 1092 multi-modal image sets acquired within one week, each registered for spatiotemporal consistency and annotated with polygonal ship locations, fine-grained categories, instance-level identifiers, and change region masks, thereby enabling standardized benchmarks for five fundamental multi-modal remote sensing tasks on berthed ships.

What carries the argument

The SMART-Ship dataset itself, built from 1092 aligned multi-modal image sets and hierarchical annotations that organize 38,838 ship instances to serve multiple interpretation tasks.
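
As a rough illustration of how such hierarchical annotations might be organized in code, the sketch below models one image set with per-modality ship records. All class names, field names, and modality keys here are hypothetical and chosen for illustration; the paper's actual file format is not described on this page.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates


@dataclass
class ShipAnnotation:
    # Hypothetical record for one ship in one modality.
    instance_id: int       # identifier assumed to be shared across modalities
    category: str          # fine-grained class label (label names are made up)
    polygon: List[Point]   # polygonal location of the ship


@dataclass
class ImageSet:
    # Hypothetical container for one registered multi-modal image set.
    set_id: str
    # modality name (e.g. "rgb", "sar") -> ships annotated in that modality
    ships: Dict[str, List[ShipAnnotation]] = field(default_factory=dict)
    change_mask_path: Optional[str] = None  # assumed location of a change mask

    def instances_in_all(self, modalities: List[str]) -> List[int]:
        """Return the instance ids annotated in every listed modality."""
        id_sets = [
            {ship.instance_id for ship in self.ships.get(m, [])}
            for m in modalities
        ]
        return sorted(set.intersection(*id_sets)) if id_sets else []
```

A structure like this would let a cross-modal task select only the ships visible in, say, both the visible-light and SAR images of a set by calling `instances_in_all(["rgb", "sar"])`.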

If this is right

Representative methods can be compared on standardized benchmarks across five multi-modal tasks.
The dataset supports ship detection, classification, instance identification, and change detection in maritime settings.
Hierarchical annotations allow flexible use for both coarse and fine-grained remote sensing problems.
Evaluations indicate the data can reveal directions for improving multi-modal fusion techniques.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Instance-level identifiers could enable tracking the same ships across different modalities and times.
The registration approach might transfer to other dynamic targets such as vehicles or aircraft in remote sensing.
Expanding the one-week acquisition window could test longer-term change analysis.
Models trained to fuse all five modalities simultaneously could be evaluated directly on the existing splits.

Load-bearing premise

Each multi-modal image set is accurately registered for spatiotemporal consistency and the hierarchical annotations are reliable enough to support the claimed range of tasks.

What would settle it

Discovery of widespread misalignment between modalities or annotation errors that cause all benchmarked methods to fail on the defined tasks would show the dataset does not support the intended multi-modal analysis.

Figures

Figures reproduced from arXiv: 2508.02384 by Chen-Chen Fan, Haolin Huang, Kehan Qi, Linping Zhang, Peiyao Guo, Yong-Qiang Mao, You He, Yu Liu, Yuxi Suo, Zhizhuo Jiang.

**Figure 1.** The proposed SMART-Ship dataset. This comprehensive multi-modal maritime dataset features precise polygon an…

**Figure 2.** Overall Composition of the SMART-Ship Dataset. (a) image distribution across modalities; (b) ground sampling…

**Figure 3.** Multi-modal ship annotation statistics and consistency analysis. (a-c) Distribution of width, height, and aspect ratio…

**Figure 4.** Task 1: Multi-Modal Ship Detection - qualitative comparison on RGB and SAR modalities. The first column shows…

**Figure 5.** Task 2: Cross-Modal Ship Re-Identification - qualitative results from RGB to SAR, and RGB to PAN. Each row…

**Figure 6.** Task 3: Cross-Modal Generation - qualitative comparison between RGB, SAR, and PAN modalities. Each row shows…

**Figure 8.** Task 5: Cross-Modal Change Detection - qualitative comparison. Bi-temporal RGB-SAR image pairs and change…

Original abstract

Given the limitations of satellite orbits and imaging conditions, multi-modal remote sensing (RS) data is crucial in enabling long-term earth observation. However, maritime surveillance remains challenging due to the complexity of multi-scale targets and the dynamic environments. To bridge this critical gap, we propose a Synchronized Multi-modal Aligned Remote sensing Targets dataset for berthed ships analysis (SMART-Ship), containing spatiotemporal registered images with fine-grained annotation for maritime targets from five modalities: visible-light, synthetic aperture radar (SAR), panchromatic, multi-spectral, and near-infrared. Specifically, our dataset consists of 1092 multi-modal image sets, covering 38,838 ships. Each image set is acquired within one week and registered to ensure spatiotemporal consistency. Ship instances in each set are annotated with polygonal location information, fine-grained categories, instance-level identifiers, and change region masks, organized hierarchically to support diverse multi-modal RS tasks. Furthermore, we define standardized benchmarks on five fundamental tasks and comprehensively compare representative methods across the dataset. Thorough experiment evaluations validate that the proposed SMART-Ship dataset could support various multi-modal RS interpretation tasks and reveal the promising directions for further exploration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SMART-Ship is a new multi-modal dataset for berthed ships that sets up useful benchmarks but skips quantitative checks on alignment and annotation quality.

This paper's real contribution is a new dataset of 1092 multi-modal image sets for berthed ships, drawn from visible-light, SAR, panchromatic, multi-spectral, and near-infrared sources. It comes with polygonal annotations, fine-grained categories, instance IDs, and change masks, all organized to support different analysis tasks. The authors did a good job scaling it up to nearly 39,000 ships and defining five benchmark tasks with some method comparisons. That setup lets people test multi-modal approaches in a maritime context, which is useful given the challenges of dynamic environments and multi-scale targets.

The weaker part is the description of how the data was put together. The paper says the images were acquired within one week and registered for consistency, but it gives no numbers on registration accuracy and does not say how the annotations were checked. For a dataset that relies on cross-modal alignment, especially with SAR, that missing detail makes it harder to judge whether the data is as ready for those tasks as claimed.

This is aimed at remote sensing researchers working on ship detection or surveillance. Readers who need a benchmark for multi-modal fusion or change detection would get value from the experiments and the data structure. The paper engages honestly with the literature on maritime RS and focuses on filling a practical gap. It deserves a serious referee to dig into the construction details. I would send it to peer review. The dataset could be a solid resource once the quality aspects are clarified.

Referee Report

2 major / 2 minor

Summary. The paper introduces the SMART-Ship dataset consisting of 1092 multi-modal image sets (visible-light, SAR, panchromatic, multi-spectral, NIR) of berthed ships, totaling 38,838 instances. Each set is described as acquired within one week and registered for spatiotemporal consistency, with hierarchical annotations providing polygonal locations, fine-grained categories, instance-level identifiers, and change region masks. The authors establish standardized benchmarks for five fundamental tasks, compare representative methods, and conclude that the dataset supports diverse multi-modal remote sensing interpretation tasks while highlighting promising research directions.

Significance. If the claimed alignments and annotations can be shown to be reliable through quantitative validation, the dataset would be a meaningful addition to remote sensing resources for maritime surveillance. Its scale, five-modality coverage, change masks, and hierarchical structure could enable progress on cross-modal fusion, instance tracking, and change detection in port environments where existing datasets are limited. The inclusion of benchmark experiments is a constructive element that could help standardize evaluation in this sub-area.

major comments (2)

[Abstract and Dataset Construction] Abstract and Dataset Construction section: The central claim that the 1092 image sets provide 'spatiotemporal consistency' via registration rests only on the statement that images were 'acquired within one week and registered.' No quantitative registration accuracy metrics (RMSE, overlap coefficients, or mutual-information scores) are reported, particularly for the geometrically and radiometrically challenging SAR-optical pairs. This directly affects whether the dataset can actually support the claimed multi-modal tasks and is therefore load-bearing.
[Annotation and Benchmark sections] Annotation description and Benchmark sections: The manuscript describes the annotations as 'fine-grained' and 'hierarchical' but reports no inter-annotator agreement statistics, validation protocol, or reliability measures for the polygonal locations, categories, instance IDs, or change masks. Without these, the soundness of the five benchmark tasks and the experimental comparisons cannot be fully assessed.
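
To make the first major comment concrete, the following is a minimal sketch of two of the registration checks the referee names: control-point RMSE and a histogram-based mutual-information score. It is not the authors' procedure. The function names, the assumption that intensities are scaled to [0, 1), and the choice of bin count are all illustrative.

```python
import math
from collections import Counter


def control_point_rmse(ref_pts, reg_pts):
    """RMSE (in pixels) between matched control points of two registered images."""
    if len(ref_pts) != len(reg_pts) or not ref_pts:
        raise ValueError("need two non-empty point lists of equal length")
    sq_errors = [
        (xr - xg) ** 2 + (yr - yg) ** 2
        for (xr, yr), (xg, yg) in zip(ref_pts, reg_pts)
    ]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def mutual_information(a, b, bins=16):
    """Mutual information (nats) of two equal-length flat intensity sequences.

    Assumes intensities lie in [0, 1); values are quantized into `bins` levels.
    """
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(a)
    qa = [min(int(v * bins), bins - 1) for v in a]
    qb = [min(int(v * bins), bins - 1) for v in b]
    joint, pa, pb = Counter(zip(qa, qb)), Counter(qa), Counter(qb)
    # p_ij * log(p_ij / (p_i * p_j)), written in terms of raw counts
    return sum(
        (c / n) * math.log(c * n / (pa[i] * pb[j]))
        for (i, j), c in joint.items()
    )
```

A lower RMSE indicates tighter geometric alignment, while higher mutual information between co-located patches suggests the two modalities share more structure. For SAR-optical pairs the intensity relationship is not linear, which is one reason mutual information is often preferred over plain correlation.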

minor comments (2)

[Abstract] The abstract and introduction could more explicitly list the five benchmark tasks and the representative methods compared, to improve immediate readability.
[Dataset Construction] Consider including sensor specifications, exact acquisition dates, and geographic coverage details in the dataset section to strengthen reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thorough review and insightful comments on our manuscript. We address each of the major comments below and outline the revisions we plan to make to strengthen the paper.

Referee: [Abstract and Dataset Construction] Abstract and Dataset Construction section: The central claim that the 1092 image sets provide 'spatiotemporal consistency' via registration rests only on the statement that images were 'acquired within one week and registered.' No quantitative registration accuracy metrics (RMSE, overlap coefficients, or mutual-information scores) are reported, particularly for the geometrically and radiometrically challenging SAR-optical pairs. This directly affects whether the dataset can actually support the claimed multi-modal tasks and is therefore load-bearing.

Authors: We agree with the referee that providing quantitative metrics for the registration accuracy is essential to substantiate the spatiotemporal consistency of the dataset. In the revised manuscript, we will include a detailed account of the registration procedure along with quantitative validation metrics, such as RMSE for geometric alignment and overlap coefficients or mutual information scores for the multi-modal image sets. Special emphasis will be placed on the SAR-optical pairs to address the challenges mentioned.

Revision: yes

Referee: [Annotation and Benchmark sections] Annotation description and Benchmark sections: The manuscript describes the annotations as 'fine-grained' and 'hierarchical' but reports no inter-annotator agreement statistics, validation protocol, or reliability measures for the polygonal locations, categories, instance IDs, or change masks. Without these, the soundness of the five benchmark tasks and the experimental comparisons cannot be fully assessed.

Authors: We recognize the value of reporting inter-annotator agreement and validation details to ensure the quality and reliability of the annotations. Accordingly, we will revise the Annotation section to describe the annotation process in greater detail, including the involvement of multiple annotators, the protocol used for quality control and disagreement resolution, and quantitative reliability measures such as inter-annotator agreement scores for the various annotation types (polygonal locations, categories, instance identifiers, and change region masks).

Revision: yes
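
For readers unfamiliar with the reliability measures under discussion, here is a minimal sketch of two common agreement scores: Cohen's kappa for category labels from two annotators, and intersection-over-union for two annotators' region masks. These are standard measures swapped in for illustration, not figures or methods from the paper; the function names and the representation of a mask as a set of pixel coordinates are assumptions.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(x == y for x, y in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # chance agreement from each annotator's marginal label frequencies
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


def mask_iou(pixels_a, pixels_b):
    """IoU of two masks, each given as an iterable of (row, col) pixels."""
    a, b = set(pixels_a), set(pixels_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

Kappa corrects raw agreement for chance, so it suits the fine-grained category labels, while mask IoU applies to rasterized polygons or change region masks.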

Circularity Check

0 steps flagged

No significant circularity; dataset contribution is independent

The paper introduces a new multi-modal remote sensing dataset (SMART-Ship) consisting of 1092 image sets with annotations and defines five benchmark tasks with empirical comparisons. No mathematical derivations, predictions, or equations are present that reduce claims to fitted parameters or self-citations by construction. The central premise rests on data collection, registration, and hierarchical annotations rather than any self-definitional or load-bearing circular steps. This is a standard self-contained dataset paper with no reduction of results to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper contributes a new empirical dataset rather than a derivation from axioms or parameters. Main premises concern data registration accuracy and annotation quality.

axioms (1)

Domain assumption: Standard remote sensing registration methods can achieve spatiotemporal consistency across modalities when images are acquired within one week.
Invoked in the description of how each image set is prepared for alignment.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "We propose a Synchronized Multi-modal Aligned Remote sensing Targets dataset for berthed ships analysis (SMART-Ship), containing spatiotemporal registered images with fine-grained annotation for maritime targets from five modalities..."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "Thorough experiment evaluations validate that the proposed SMART-Ship dataset could support various multi-modal RS interpretation tasks"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

87 extracted references · 87 canonical work pages · 1 internal anchor

[1]

STAR: A first-ever dataset and a large-scale benchmark for scene graph generation in large-size satellite imagery,

Y. Li, L. Wang, T. Wang, X. Yang, J. Luo, Q. Wang, Y. Deng, W. Wang, X. Sun, H. Li, B. Dang, Y. Zhang, Y. Yu, and J. Yan, “STAR: A first-ever dataset and a large-scale benchmark for scene graph generation in large-size satellite imagery,” IEEE T rans. Pat- tern Anal. Mach. Intell. , vol. 47, no. 3, pp. 1832–1849, 2025

work page 2025
[2]

Learning to holistically detect bridges from large-size VHR remote sensing imagery,

Y. Li, J. Luo, Y. Zhang, Y. Tan, J.-G. Yu, and S. Bai, “Learning to holistically detect bridges from large-size VHR remote sensing imagery,” IEEE T rans. Pattern Anal. Mach. Intell., vol. 46, no. 12, pp. 11 507–11 523, 2024

work page 2024
[3]

FarSeg++: Foreground-aware relation network for geospatial object segmen- tation in high spatial resolution remote sensing imagery,

Z. Zheng, Y. Zhong, J. Wang, A. Ma, and L. Zhang, “FarSeg++: Foreground-aware relation network for geospatial object segmen- tation in high spatial resolution remote sensing imagery,” IEEE T rans. Pattern Anal. Mach. Intell. , vol. 45, no. 11, pp. 13 715–13 729, 2023

work page 2023
[4]

Frequency- adaptive learning for SAR ship detection in clutter scenes,

L. Zhang, Y. Liu, W. Zhao, X. Wang, G. Li, and Y. He, “Frequency- adaptive learning for SAR ship detection in clutter scenes,” IEEE T rans. Geosci. Remote Sens., vol. 61, pp. 1–14, 2023

work page 2023
[5]

MetaEarth: A generative foundation model for global-scale remote sensing image gener- ation,

Z. Yu, C. Liu, L. Liu, Z. Shi, and Z. Zou, “MetaEarth: A generative foundation model for global-scale remote sensing image gener- ation,” IEEE T rans. Pattern Anal. Mach. Intell. , vol. 47, no. 3, pp. 1764–1781, 2025

work page 2025
[6]

A semi- supervised deep rule-based approach for complex satellite sensor image analysis,

X. Gu, P . P . Angelov, C. Zhang, and P . M. Atkinson, “A semi- supervised deep rule-based approach for complex satellite sensor image analysis,” IEEE T rans. Pattern Anal. Mach. Intell. , vol. 44, no. 5, pp. 2281–2292, 2020

work page 2020
[7]

LAG- Conv: Local-context adaptive convolution kernels with global harmonic bias for pansharpening,

Z.-R. Jin, T.-J. Zhang, T.-X. Jiang, G. Vivone, and L.-J. Deng, “LAG- Conv: Local-context adaptive convolution kernels with global harmonic bias for pansharpening,” in Proc. AAAI Conf. Artif. Intell., vol. 36, no. 1, 2022, pp. 1113–1121

work page 2022
[8]

Pan- sharpening by convolutional neural networks in the full resolution framework,

M. Ciotola, S. Vitale, A. Mazza, G. Poggi, and G. Scarpa, “Pan- sharpening by convolutional neural networks in the full resolution framework,” IEEE T rans. Geosci. Remote Sens. , vol. 60, pp. 1–17, 2022

work page 2022
[9]

Detail injection- based deep convolutional neural networks for pansharpening,

L.-J. Deng, G. Vivone, C. Jin, and J. Chanussot, “Detail injection- based deep convolutional neural networks for pansharpening,” IEEE T rans. Geosci. Remote Sens., vol. 59, no. 8, pp. 6995–7010, 2020

work page 2020
[10]

Vehicle detection in aerial imagery: A small target detection benchmark,

S. Razakarivony and F. Jurie, “Vehicle detection in aerial imagery: A small target detection benchmark,” J. Vis. Commun. Image Repre- sent., vol. 34, pp. 187–203, 2016

work page 2016
[11]

Drone-based RGB-infrared cross-modality vehicle detection via uncertainty-aware learning,

Y. Sun, B. Cao, P . Zhu, and Q. Hu, “Drone-based RGB-infrared cross-modality vehicle detection via uncertainty-aware learning,” IEEE T rans. Circuits Syst. Video T echnol., vol. 32, no. 10, pp. 6700– 6713, 2022

work page 2022
[12]

Cross-modality attentive feature fusion for object detection in multispectral remote sensing imagery,

Q. Feng and Z. Wang, “Cross-modality attentive feature fusion for object detection in multispectral remote sensing imagery,” Pattern Recognit., vol. 130, p. 108786, 2022

work page 2022
[13]

A new learning paradigm for foundation model-based remote-sensing change detection,

K. Li, X. Cao, and D. Meng, “A new learning paradigm for foundation model-based remote-sensing change detection,” IEEE T rans. Geosci. Remote Sens., vol. 62, pp. 1–12, 2024

work page 2024
[14]

Remote sensing image change detec- tion with transformers,

H. Chen, Z. Qi, and Z. Shi, “Remote sensing image change detec- tion with transformers,” IEEE T rans. Geosci. Remote Sens. , vol. 60, pp. 1–14, 2021

work page 2021
[15]

A transformer-based siamese network for change detection,

W. G. C. Bandara and V . M. Patel, “A transformer-based siamese network for change detection,” in Proc. IEEE Int. Geosci. Remote Sens. Symp., 2022, pp. 207–210

work page 2022
[16]

Asymmetric feature fusion network for hyperspectral and SAR image classification,

W. Li, Y. Gao, M. Zhang, R. Tao, and Q. Du, “Asymmetric feature fusion network for hyperspectral and SAR image classification,” IEEE T rans. Neural Netw. Learn. Syst., vol. 34, no. 10, pp. 8057–8070, 2022

work page 2022
[17]

ICAFusion: Iterative cross-attention guided feature fusion for multispectral object detection,

J. Shen, Y. Chen, Y. Liu, X. Zuo, H. Fan, and W. Yang, “ICAFusion: Iterative cross-attention guided feature fusion for multispectral object detection,” Pattern Recognit., vol. 145, p. 109913, 2024

work page 2024
[18]

Cooperative ship detection in optical and SAR remote sensing images based on neighborhood saliency,

Q. Zhang, Z. Wang, X. Wang, G. Li, L. Huang, H. Song, and Z. Song, “Cooperative ship detection in optical and SAR remote sensing images based on neighborhood saliency,”J. Radars, vol. 13, no. R24037, p. 885, 2024

work page 2024
[19]

Multiscale generative adversarial network based on wavelet feature learning for SAR-to-optical image translation,

H. Li, C. Gu, D. Wu, G. Cheng, L. Guo, and H. Liu, “Multiscale generative adversarial network based on wavelet feature learning for SAR-to-optical image translation,” IEEE T rans. Geosci. Remote Sens., vol. 60, pp. 1–15, 2022

work page 2022
[20]

A semi-supervised image-to-image translation framework for SAR– optical image matching,

W.-L. Du, Y. Zhou, H. Zhu, J. Zhao, Z. Shao, and X. Tian, “A semi-supervised image-to-image translation framework for SAR– optical image matching,” IEEE Geosci. Remote Sens. Lett. , vol. 19, pp. 1–5, 2022

work page 2022
[21]

Proxy-based rotation invariant deep metric learning for remote sensing image retrieval,

W. Cai, H. Zhang, J. Li, and M. Yu, “Proxy-based rotation invariant deep metric learning for remote sensing image retrieval,” IEEE T rans. Geosci. Remote Sens., vol. 62, pp. 1319–1335, 2024

work page 2024
[22]

Robust cross-modal remote sensing image retrieval via maximal correlation augmentation,

Z. Wang, X. Wang, G. Li, and C. Li, “Robust cross-modal remote sensing image retrieval via maximal correlation augmentation,” IEEE T rans. Geosci. Remote Sens., vol. 62, pp. 1–17, 2024. IEEE TRANSACTIONS ON PATTERN ANAL YSIS AND MACHINE INTELLIGENCE 17

work page 2024
[23]

Cross-modal ship re-identification via optical and sar imagery: A novel dataset and method,

H. Wang, S. Li, J. Yang, Y. Liu, Y. Lv, and Z. Zhou, “Cross-modal ship re-identification via optical and sar imagery: A novel dataset and method,” arXiv preprint arXiv:2506.22027 , 2025

work page arXiv 2025
[24]

MMShip: A medium- resolution multispectral satellite imagery dataset for ship detec- tion,

L. Chen, L. Li, S. Wang, S. Gao, X. Ye et al., “MMShip: A medium- resolution multispectral satellite imagery dataset for ship detec- tion,” Opt. Precision Eng. , vol. 31, no. 13, pp. 1962–1972, 2023

work page 1962
[25]

A cross-modal fusion method for multispectral small ship detection,

Y. Liu, Y. Liu, X. Wang, L. Zhang, Z. Jiang, Y. Li, C. Yan, Y. Fu, and T. Zhang, “A cross-modal fusion method for multispectral small ship detection,” in Proc. Int. Conf. Inf. Fusion , 2024, pp. 1–6

work page 2024
[26]

Spacenet 6: Multi-sensor all weather mapping dataset,

J. Shermeyer, D. Hogan, J. Brown, A. Van Etten, N. Weir et al. , “Spacenet 6: Multi-sensor all weather mapping dataset,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops , 2020, pp. 196–197

work page 2020
[27]

The qxs-saropt dataset for deep learning in sar-optical data fusion. arxiv 2021,

M. Huang, Y. Xu, L. Qian, W. Shi, Y. Zhang, W. Bao, N. Wang, X. Liu, and X. Xiang, “The QXS-SAROPT dataset for deep learning in SAR-optical data fusion,” arXiv preprint arXiv:2103.08259 , 2021

work page arXiv 2021
[28]

SAR-to- optical image translation using supervised cycle-consistent adver- sarial networks,

L. Wang, X. Xu, Y. Yu, R. Yang, R. Gui, Z. Xu, and F. Pu, “SAR-to- optical image translation using supervised cycle-consistent adver- sarial networks,” IEEE Access, vol. 7, pp. 129 136–129 149, 2019

work page 2019
[29]

Pan-sharpening of quickbird satellite images using multiresolution techniques: Wavelets, contourlets, and curvelets,

J. F. Reinoso, “Pan-sharpening of quickbird satellite images using multiresolution techniques: Wavelets, contourlets, and curvelets,” Imaging Sci. J. , vol. 58, no. 3, pp. 125–135, 2010

work page 2010
[30]

Worldview-2 pan-sharpening,

C. Padwick, M. Deskevich, F. Pacifici, and S. Smallwood, “Worldview-2 pan-sharpening,” in Proc. ASPRS Annu. Conf. , vol. 2630, 2010, pp. 1–14

work page 2010
[31]

Application of different pan-sharpening methods on worldview- 3 images,

O. R. Belfiore, C. Meneghini, C. Parente, R. Santamaria et al. , “Application of different pan-sharpening methods on worldview- 3 images,” J. Eng. Appl. Sci. , vol. 11, no. 1, pp. 490–496, 2016

work page 2016
[32]

Un- supervised image regression for heterogeneous change detection,

L. T. Luppino, F. M. Bianchi, G. Moser, and S. N. Anfinsen, “Un- supervised image regression for heterogeneous change detection,” arXiv preprint arXiv:1909.05948 , 2019

work page arXiv 1909
[33]

A fractal projection and Markovian segmentation- based approach for multimodal change detection,

M. Mignotte, “A fractal projection and Markovian segmentation- based approach for multimodal change detection,” IEEE T rans. Geosci. Remote Sens. , vol. 58, no. 11, pp. 8046–8058, 2020

work page 2020
[34]

An a-contrario approach for subpixel change detection in satellite imagery,

A. Robin, L. Moisan, and S. Le Hegarat-Mascle, “An a-contrario approach for subpixel change detection in satellite imagery,” IEEE T rans. Pattern Anal. Mach. Intell. , vol. 32, no. 11, pp. 1977–1993, 2010

work page 1977
[35]

The SEN1-2 Dataset for Deep Learning in SAR-Optical Data Fusion

M. Schmitt, L. H. Hughes, and X. X. Zhu, “The SEN1-2 dataset for deep learning in SAR-optical data fusion,” arXiv preprint arXiv:1807.01569, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[36]

A comparative analysis of GAN-based methods for SAR-to-optical image translation,

Y. Zhao, T. Celik, N. Liu, and H.-C. Li, “A comparative analysis of GAN-based methods for SAR-to-optical image translation,” IEEE Geosci. Remote Sens. Lett. , vol. 19, pp. 1–5, 2022

work page 2022
[37]

SOSSF: Landsat-8 image synthesis on the blending of sentinel- 1 and MODIS data,

Y. Xia, W. He, Q. Huang, H. Chen, H. Huang, and H. Zhang, “SOSSF: Landsat-8 image synthesis on the blending of sentinel- 1 and MODIS data,” IEEE T rans. Geosci. Remote Sens. , vol. 62, pp. 1–19, 2024

work page 2024
[38]

GF2 PMS remote sensing imagery,

China Center for Resources Satellite Data and Application, “GF2 PMS remote sensing imagery,” https://www.chinageoss. cn/, 2017

work page 2017
[39]

A large-scale benchmark data set for evaluating pansharpening performance: Overview and implemen- tation,

X. Meng, Y. Xiong, F. Shao, H. Shen, W. Sun, G. Yang, Q. Yuan, R. Fu, and H. Zhang, “A large-scale benchmark data set for evaluating pansharpening performance: Overview and implemen- tation,” IEEE Geosci. Remote Sens. Mag. , vol. 9, no. 1, pp. 18–52, 2020

work page 2020
[40]

Change detection from synthetic aperture radar images based on neighborhood-based ratio and extreme learning machine,

F. Gao, J. Dong, B. Li, Q. Xu, and C. Xie, “Change detection from synthetic aperture radar images based on neighborhood-based ratio and extreme learning machine,” J. Appl. Remote Sens. , vol. 10, no. 4, p. 046019, 2016

work page 2016
[41]

A domain adaptation neural network for change detection with heterogeneous optical and SAR remote sensing images,

C. Zhang, Y. Feng, L. Hu, D. Tapete, L. Pan, Z. Liang, F. Cigna, and P . Yue, “A domain adaptation neural network for change detection with heterogeneous optical and SAR remote sensing images,” Int. J. Appl. Earth Observ. Geoinf. , vol. 109, p. 102769, 2022

work page 2022
[42]

Cross-modality fusion transformer for multispectral object detection,

Q. Feng, D. Han, and Z. Wang, “Cross-modality fusion transformer for multispectral object detection,” arXiv preprint arXiv:2111.00273, 2021

work page arXiv 2021
[43]

Multimodal object detection via probabilistic ensembling,

Y.-T. Chen, J. Shi, Z. Ye, C. Mertz, D. Ramanan, and S. Kong, “Multimodal object detection via probabilistic ensembling,” in Proc. Comput. Vis. ECCV , 2022, pp. 139–158

work page 2022
[44]

Learning source-invariant deep hashing convolutional neural networks for cross-source re- mote sensing image retrieval,

Y. Li, Y. Zhang, X. Huang, and J. Ma, “Learning source-invariant deep hashing convolutional neural networks for cross-source re- mote sensing image retrieval,” IEEE T rans. Geosci. Remote Sens. , vol. 56, no. 11, pp. 6521–6536, 2018

work page 2018
[45]

CMIR-NET: A deep learning based model for cross-modal re- trieval in remote sensing,

U. Chaudhuri, B. Banerjee, A. Bhattacharya, and M. Datcu, “CMIR-NET: A deep learning based model for cross-modal re- trieval in remote sensing,” Pattern Recognit. Lett., vol. 131, pp. 456– 462, 2020

work page 2020
[46]

A discriminative distillation network for cross-source remote sensing image retrieval,

W. Xiong, Z. Xiong, Y. Cui, and Y. Lv, “A discriminative distillation network for cross-source remote sensing image retrieval,” IEEE J. Sel. T opics Appl. Earth Observ. Remote Sens. , vol. 13, pp. 1234–1247, 2020

work page 2020
[47]

Proxy-based rotation invariant deep metric learning for remote sensing image retrieval,

Z. Cai, Y. Pan, and W. Jin, “Proxy-based rotation invariant deep metric learning for remote sensing image retrieval,” IEEE J. Sel. T opics Appl. Earth Observ. Remote Sens., vol. 17, pp. 7759–7772, 2024

work page 2024
[48]

Image-to-image translation with conditional adversarial networks,

P . Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. , 2017, pp. 1125–1134

work page 2017
[49]

Unpaired image-to- image translation using cycle-consistent adversarial networks,

J.-Y. Zhu, T. Park, P . Isola, and A. A. Efros, “Unpaired image-to- image translation using cycle-consistent adversarial networks,” in Proc. IEEE Int. Conf. Comput. Vis. , 2017, pp. 2223–2232

[50]

Contrastive learning for unpaired image-to-image translation,

T. Park, A. A. Efros, R. Zhang, and J.-Y. Zhu, “Contrastive learning for unpaired image-to-image translation,” in Comput. Vis. ECCV, 2020, pp. 319–345

[51]

Multi-channel attention selection GAN with cascaded semantic guidance for cross-view image translation,

H. Tang, D. Xu, N. Sebe, Y. Wang, J. J. Corso, and Y. Yan, “Multi-channel attention selection GAN with cascaded semantic guidance for cross-view image translation,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 2417–2426

[52]

The dawn of KAN in image-to-image (I2I) translation: Integrating Kolmogorov-Arnold networks with GANs for unpaired I2I translation,

A. Mahara, N. D. Rishe, and L. Deng, “The dawn of KAN in image-to-image (I2I) translation: Integrating Kolmogorov-Arnold networks with GANs for unpaired I2I translation,” arXiv preprint arXiv:2408.08216, 2024

[53]

A novel iterative PCA–based pansharpening method,

M. Ghadjati, A. Moussaoui, and A. Boukharouba, “A novel iterative PCA–based pansharpening method,” Remote Sens. Lett., vol. 10, no. 3, pp. 264–273, 2019

[54]

PanNet: A deep network architecture for pan-sharpening,

J. Yang, X. Fu, Y. Hu, Y. Huang, X. Ding, and J. Paisley, “PanNet: A deep network architecture for pan-sharpening,” in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 5449–5457

[55]

SAR-to-optical image translation using supervised cycle-consistent adversarial networks,

L. He, Y. Rao, J. Li, J. Chanussot, A. Plaza, J. Zhu, and B. Li, “SAR-to-optical image translation using supervised cycle-consistent adversarial networks,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 12, no. 4, pp. 1188–1204, 2019

[56]

Pan-sharpening using an efficient bidirectional pyramid network,

Y. Zhang, C. Liu, M. Sun, and Y. Ou, “Pan-sharpening using an efficient bidirectional pyramid network,” IEEE Trans. Geosci. Remote Sens., vol. 57, no. 8, pp. 5549–5563, 2019

[57]

A multiscale and multidepth convolutional neural network for remote sensing imagery pan-sharpening,

Q. Yuan, Y. Wei, X. Meng, H. Shen, and L. Zhang, “A multiscale and multidepth convolutional neural network for remote sensing imagery pan-sharpening,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 11, no. 3, pp. 978–989, 2018

[58]

A deep convolutional coupling network for change detection based on heterogeneous optical and radar images,

J. Liu, M. Gong, K. Qin, and P. Zhang, “A deep convolutional coupling network for change detection based on heterogeneous optical and radar images,” IEEE Trans. Neural Netw. Learn. Syst., vol. 29, no. 3, pp. 545–559, 2016

[59]

Novel enhanced UNet for change detection using multimodal remote sensing image,

Z. Lv, H. Huang, W. Sun, T. Lei, J. A. Benediktsson, and J. Li, “Novel enhanced UNet for change detection using multimodal remote sensing image,” IEEE Geosci. Remote Sens. Lett., vol. 20, pp. 1–5, 2023

[60]

SegFormer: Simple and efficient design for semantic segmentation with transformers,

E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, “SegFormer: Simple and efficient design for semantic segmentation with transformers,” Adv. Neural Inf. Process. Syst., vol. 34, pp. 12 077–12 090, 2021

[61]

YOLOv5,

G. Jocher, A. Stoken, J. Borovec et al., “YOLOv5,” https://github.com/ultralytics/yolov5, 2021

[62]

CSPNet: A new backbone that can enhance learning capability of CNN,

C.-Y. Wang, H.-Y. M. Liao, Y.-H. Wu, P.-Y. Chen, J.-W. Hsieh, and I.-H. Yeh, “CSPNet: A new backbone that can enhance learning capability of CNN,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops, 2020, pp. 390–391

[63]

YOLOrs: Object detection in multimodal remote sensing imagery,

M. Sharma et al., “YOLOrs: Object detection in multimodal remote sensing imagery,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 14, pp. 1497–1508, 2021

[64]

Deep residual learning for image recognition,

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778

[65]

CADDN: A content-aware downsampling-based detection method for small objects in remote sensing images,

L. Zhang, Y. Liu, X. Wang, Y. He, G. Li, Y. Zhang, C. Liu, Z. Jiang, and Y. Liu, “CADDN: A content-aware downsampling-based detection method for small objects in remote sensing images,” IEEE Trans. Geosci. Remote Sens., vol. 63, pp. 1–17, 2025

[66]

Towards large-scale small object detection: Survey and benchmarks,

G. Cheng, X. Yuan, X. Yao, K. Yan, Q. Zeng, X. Xie, and J. Han, “Towards large-scale small object detection: Survey and benchmarks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 11, pp. 13 467–13 488, 2023

[67]

Detection and tracking meet drones challenge,

P. Zhu, L. Wen, D. Du, X. Bian, H. Fan, Q. Hu, and H. Ling, “Detection and tracking meet drones challenge,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 11, pp. 7380–7399, 2022

[68]

Object detection in aerial images: A large-scale benchmark and challenges,

J. Ding, N. Xue, G.-S. Xia, X. Bai, W. Yang, M. Y. Yang, S. Belongie, J. Luo, M. Datcu, M. Pelillo et al., “Object detection in aerial images: A large-scale benchmark and challenges,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 11, pp. 7778–7796, 2021

[69]

Scrdet++: Detecting small, cluttered and rotated objects via instance-level feature denoising and rotation loss smoothing,

X. Yang, J. Yan, W. Liao, X. Yang, J. Tang, and T. He, “Scrdet++: Detecting small, cluttered and rotated objects via instance-level feature denoising and rotation loss smoothing,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 2, pp. 2384–2399, 2022

[70]

Highly efficient and unsupervised framework for moving object detection in satellite videos,

C. Xiao, W. An, Y. Zhang, Z. Su, M. Li, W. Sheng, M. Pietikäinen, and L. Liu, “Highly efficient and unsupervised framework for moving object detection in satellite videos,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 46, no. 12, pp. 11 532–11 539, 2024

[71]

Hybrid Gaussian deformation for efficient remote sensing object detection,

W. Zhao, X. Zhang, H. Wang, and H. Lu, “Hybrid Gaussian deformation for efficient remote sensing object detection,” IEEE Trans. Pattern Anal. Mach. Intell., pp. 1–17, 2025

[72]

Circle loss: A unified perspective of pair similarity optimization,

Y. Sun, C. Cheng, Y. Zhang et al., “Circle loss: A unified perspective of pair similarity optimization,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 6398–6407

[73]

Person re-identification in the wild,

L. Zheng, H. Zhang, S. Sun et al., “Person re-identification in the wild,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 1367–1376

[74]

Sphereface: Deep hypersphere embedding for face recognition,

W. Liu, Y. Wen, Z. Yu et al., “Sphereface: Deep hypersphere embedding for face recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 212–220

[75]

Deep metric learning via lifted structured feature embedding,

H. Oh Song, Y. Xiang, S. Jegelka, and S. Savarese, “Deep metric learning via lifted structured feature embedding,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 4004–4012

[76]

Dual-path convolutional image-text embeddings with instance loss,

Z. Zheng, L. Zheng, M. Garrett et al., “Dual-path convolutional image-text embeddings with instance loss,” ACM Trans. Multimedia Comput. Commun. Appl., vol. 16, no. 2, pp. 1–23, 2020

[77]

Deep graph metric learning for weakly supervised person re-identification,

J. Meng, W.-S. Zheng, J.-H. Lai, and L. Wang, “Deep graph metric learning for weakly supervised person re-identification,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 10, pp. 6074–6093, 2021

[78]

Adaptive sparse pairwise loss for object re-identification,

X. Zhou, Y. Zhong, Z. Cheng, F. Liang, and L. Ma, “Adaptive sparse pairwise loss for object re-identification,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2023, pp. 19 691–19 701

[79]

Deep learning for person re-identification: A survey and outlook,

M. Ye, J. Shen, G. Lin, T. Xiang, L. Shao, and S. C. H. Hoi, “Deep learning for person re-identification: A survey and outlook,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 6, pp. 2872–2893, 2021

[80]

Weakly supervised tracklet association learning with video labels for person re-identification,

M. Liu, Y. Bian, Q. Liu, X. Wang, and Y. Wang, “Weakly supervised tracklet association learning with video labels for person re-identification,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 46, no. 5, pp. 3595–3607, 2024


[1] [1]

STAR: A first-ever dataset and a large-scale benchmark for scene graph generation in large-size satellite imagery,

Y. Li, L. Wang, T. Wang, X. Yang, J. Luo, Q. Wang, Y. Deng, W. Wang, X. Sun, H. Li, B. Dang, Y. Zhang, Y. Yu, and J. Yan, “STAR: A first-ever dataset and a large-scale benchmark for scene graph generation in large-size satellite imagery,” IEEE T rans. Pat- tern Anal. Mach. Intell. , vol. 47, no. 3, pp. 1832–1849, 2025

work page 2025

[2] [2]

Learning to holistically detect bridges from large-size VHR remote sensing imagery,

Y. Li, J. Luo, Y. Zhang, Y. Tan, J.-G. Yu, and S. Bai, “Learning to holistically detect bridges from large-size VHR remote sensing imagery,” IEEE T rans. Pattern Anal. Mach. Intell., vol. 46, no. 12, pp. 11 507–11 523, 2024

work page 2024

[3] [3]

FarSeg++: Foreground-aware relation network for geospatial object segmen- tation in high spatial resolution remote sensing imagery,

Z. Zheng, Y. Zhong, J. Wang, A. Ma, and L. Zhang, “FarSeg++: Foreground-aware relation network for geospatial object segmen- tation in high spatial resolution remote sensing imagery,” IEEE T rans. Pattern Anal. Mach. Intell. , vol. 45, no. 11, pp. 13 715–13 729, 2023

work page 2023

[4] [4]

Frequency- adaptive learning for SAR ship detection in clutter scenes,

L. Zhang, Y. Liu, W. Zhao, X. Wang, G. Li, and Y. He, “Frequency- adaptive learning for SAR ship detection in clutter scenes,” IEEE T rans. Geosci. Remote Sens., vol. 61, pp. 1–14, 2023

work page 2023

[5] [5]

MetaEarth: A generative foundation model for global-scale remote sensing image gener- ation,

Z. Yu, C. Liu, L. Liu, Z. Shi, and Z. Zou, “MetaEarth: A generative foundation model for global-scale remote sensing image gener- ation,” IEEE T rans. Pattern Anal. Mach. Intell. , vol. 47, no. 3, pp. 1764–1781, 2025

work page 2025

[6] [6]

A semi- supervised deep rule-based approach for complex satellite sensor image analysis,

X. Gu, P . P . Angelov, C. Zhang, and P . M. Atkinson, “A semi- supervised deep rule-based approach for complex satellite sensor image analysis,” IEEE T rans. Pattern Anal. Mach. Intell. , vol. 44, no. 5, pp. 2281–2292, 2020

work page 2020

[7] [7]

LAG- Conv: Local-context adaptive convolution kernels with global harmonic bias for pansharpening,

Z.-R. Jin, T.-J. Zhang, T.-X. Jiang, G. Vivone, and L.-J. Deng, “LAG- Conv: Local-context adaptive convolution kernels with global harmonic bias for pansharpening,” in Proc. AAAI Conf. Artif. Intell., vol. 36, no. 1, 2022, pp. 1113–1121

work page 2022

[8] [8]

Pan- sharpening by convolutional neural networks in the full resolution framework,

M. Ciotola, S. Vitale, A. Mazza, G. Poggi, and G. Scarpa, “Pan- sharpening by convolutional neural networks in the full resolution framework,” IEEE T rans. Geosci. Remote Sens. , vol. 60, pp. 1–17, 2022

work page 2022

[9] [9]

Detail injection- based deep convolutional neural networks for pansharpening,

L.-J. Deng, G. Vivone, C. Jin, and J. Chanussot, “Detail injection- based deep convolutional neural networks for pansharpening,” IEEE T rans. Geosci. Remote Sens., vol. 59, no. 8, pp. 6995–7010, 2020

work page 2020

[10] [10]

Vehicle detection in aerial imagery: A small target detection benchmark,

S. Razakarivony and F. Jurie, “Vehicle detection in aerial imagery: A small target detection benchmark,” J. Vis. Commun. Image Repre- sent., vol. 34, pp. 187–203, 2016

work page 2016

[11] [11]

Drone-based RGB-infrared cross-modality vehicle detection via uncertainty-aware learning,

Y. Sun, B. Cao, P . Zhu, and Q. Hu, “Drone-based RGB-infrared cross-modality vehicle detection via uncertainty-aware learning,” IEEE T rans. Circuits Syst. Video T echnol., vol. 32, no. 10, pp. 6700– 6713, 2022

work page 2022

[12] [12]

Cross-modality attentive feature fusion for object detection in multispectral remote sensing imagery,

Q. Feng and Z. Wang, “Cross-modality attentive feature fusion for object detection in multispectral remote sensing imagery,” Pattern Recognit., vol. 130, p. 108786, 2022

work page 2022

[13] [13]

A new learning paradigm for foundation model-based remote-sensing change detection,

K. Li, X. Cao, and D. Meng, “A new learning paradigm for foundation model-based remote-sensing change detection,” IEEE T rans. Geosci. Remote Sens., vol. 62, pp. 1–12, 2024

work page 2024

[14] [14]

Remote sensing image change detec- tion with transformers,

H. Chen, Z. Qi, and Z. Shi, “Remote sensing image change detec- tion with transformers,” IEEE T rans. Geosci. Remote Sens. , vol. 60, pp. 1–14, 2021

work page 2021

[15] [15]

A transformer-based siamese network for change detection,

W. G. C. Bandara and V . M. Patel, “A transformer-based siamese network for change detection,” in Proc. IEEE Int. Geosci. Remote Sens. Symp., 2022, pp. 207–210

work page 2022

[16] [16]

Asymmetric feature fusion network for hyperspectral and SAR image classification,

W. Li, Y. Gao, M. Zhang, R. Tao, and Q. Du, “Asymmetric feature fusion network for hyperspectral and SAR image classification,” IEEE T rans. Neural Netw. Learn. Syst., vol. 34, no. 10, pp. 8057–8070, 2022

work page 2022

[17] [17]

ICAFusion: Iterative cross-attention guided feature fusion for multispectral object detection,

J. Shen, Y. Chen, Y. Liu, X. Zuo, H. Fan, and W. Yang, “ICAFusion: Iterative cross-attention guided feature fusion for multispectral object detection,” Pattern Recognit., vol. 145, p. 109913, 2024

work page 2024

[18] [18]

Cooperative ship detection in optical and SAR remote sensing images based on neighborhood saliency,

Q. Zhang, Z. Wang, X. Wang, G. Li, L. Huang, H. Song, and Z. Song, “Cooperative ship detection in optical and SAR remote sensing images based on neighborhood saliency,”J. Radars, vol. 13, no. R24037, p. 885, 2024

work page 2024

[19] [19]

Multiscale generative adversarial network based on wavelet feature learning for SAR-to-optical image translation,

H. Li, C. Gu, D. Wu, G. Cheng, L. Guo, and H. Liu, “Multiscale generative adversarial network based on wavelet feature learning for SAR-to-optical image translation,” IEEE T rans. Geosci. Remote Sens., vol. 60, pp. 1–15, 2022

work page 2022

[20] [20]

A semi-supervised image-to-image translation framework for SAR– optical image matching,

W.-L. Du, Y. Zhou, H. Zhu, J. Zhao, Z. Shao, and X. Tian, “A semi-supervised image-to-image translation framework for SAR– optical image matching,” IEEE Geosci. Remote Sens. Lett. , vol. 19, pp. 1–5, 2022

work page 2022

[21] [21]

Proxy-based rotation invariant deep metric learning for remote sensing image retrieval,

W. Cai, H. Zhang, J. Li, and M. Yu, “Proxy-based rotation invariant deep metric learning for remote sensing image retrieval,” IEEE T rans. Geosci. Remote Sens., vol. 62, pp. 1319–1335, 2024

work page 2024

[22] [22]

Robust cross-modal remote sensing image retrieval via maximal correlation augmentation,

Z. Wang, X. Wang, G. Li, and C. Li, “Robust cross-modal remote sensing image retrieval via maximal correlation augmentation,” IEEE T rans. Geosci. Remote Sens., vol. 62, pp. 1–17, 2024. IEEE TRANSACTIONS ON PATTERN ANAL YSIS AND MACHINE INTELLIGENCE 17

work page 2024

[23] [23]

Cross-modal ship re-identification via optical and sar imagery: A novel dataset and method,

H. Wang, S. Li, J. Yang, Y. Liu, Y. Lv, and Z. Zhou, “Cross-modal ship re-identification via optical and sar imagery: A novel dataset and method,” arXiv preprint arXiv:2506.22027 , 2025

work page arXiv 2025

[24] [24]

MMShip: A medium- resolution multispectral satellite imagery dataset for ship detec- tion,

L. Chen, L. Li, S. Wang, S. Gao, X. Ye et al., “MMShip: A medium- resolution multispectral satellite imagery dataset for ship detec- tion,” Opt. Precision Eng. , vol. 31, no. 13, pp. 1962–1972, 2023

work page 1962

[25] [25]

A cross-modal fusion method for multispectral small ship detection,

Y. Liu, Y. Liu, X. Wang, L. Zhang, Z. Jiang, Y. Li, C. Yan, Y. Fu, and T. Zhang, “A cross-modal fusion method for multispectral small ship detection,” in Proc. Int. Conf. Inf. Fusion , 2024, pp. 1–6

work page 2024

[26] [26]

Spacenet 6: Multi-sensor all weather mapping dataset,

J. Shermeyer, D. Hogan, J. Brown, A. Van Etten, N. Weir et al. , “Spacenet 6: Multi-sensor all weather mapping dataset,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops , 2020, pp. 196–197

work page 2020

[27] [27]

The qxs-saropt dataset for deep learning in sar-optical data fusion. arxiv 2021,

M. Huang, Y. Xu, L. Qian, W. Shi, Y. Zhang, W. Bao, N. Wang, X. Liu, and X. Xiang, “The QXS-SAROPT dataset for deep learning in SAR-optical data fusion,” arXiv preprint arXiv:2103.08259 , 2021

work page arXiv 2021

[28] [28]

SAR-to- optical image translation using supervised cycle-consistent adver- sarial networks,

L. Wang, X. Xu, Y. Yu, R. Yang, R. Gui, Z. Xu, and F. Pu, “SAR-to- optical image translation using supervised cycle-consistent adver- sarial networks,” IEEE Access, vol. 7, pp. 129 136–129 149, 2019

work page 2019

[29] [29]

Pan-sharpening of quickbird satellite images using multiresolution techniques: Wavelets, contourlets, and curvelets,

J. F. Reinoso, “Pan-sharpening of quickbird satellite images using multiresolution techniques: Wavelets, contourlets, and curvelets,” Imaging Sci. J. , vol. 58, no. 3, pp. 125–135, 2010

work page 2010

[30] [30]

Worldview-2 pan-sharpening,

C. Padwick, M. Deskevich, F. Pacifici, and S. Smallwood, “Worldview-2 pan-sharpening,” in Proc. ASPRS Annu. Conf. , vol. 2630, 2010, pp. 1–14

work page 2010

[31] [31]

Application of different pan-sharpening methods on worldview- 3 images,

O. R. Belfiore, C. Meneghini, C. Parente, R. Santamaria et al. , “Application of different pan-sharpening methods on worldview- 3 images,” J. Eng. Appl. Sci. , vol. 11, no. 1, pp. 490–496, 2016

work page 2016

[32] [32]

Un- supervised image regression for heterogeneous change detection,

L. T. Luppino, F. M. Bianchi, G. Moser, and S. N. Anfinsen, “Un- supervised image regression for heterogeneous change detection,” arXiv preprint arXiv:1909.05948 , 2019

work page arXiv 1909

[33] [33]

A fractal projection and Markovian segmentation- based approach for multimodal change detection,

M. Mignotte, “A fractal projection and Markovian segmentation- based approach for multimodal change detection,” IEEE T rans. Geosci. Remote Sens. , vol. 58, no. 11, pp. 8046–8058, 2020

work page 2020

[34] [34]

An a-contrario approach for subpixel change detection in satellite imagery,

A. Robin, L. Moisan, and S. Le Hegarat-Mascle, “An a-contrario approach for subpixel change detection in satellite imagery,” IEEE T rans. Pattern Anal. Mach. Intell. , vol. 32, no. 11, pp. 1977–1993, 2010

work page 1977

[35] [35]

The SEN1-2 Dataset for Deep Learning in SAR-Optical Data Fusion

M. Schmitt, L. H. Hughes, and X. X. Zhu, “The SEN1-2 dataset for deep learning in SAR-optical data fusion,” arXiv preprint arXiv:1807.01569, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[36] [36]

A comparative analysis of GAN-based methods for SAR-to-optical image translation,

Y. Zhao, T. Celik, N. Liu, and H.-C. Li, “A comparative analysis of GAN-based methods for SAR-to-optical image translation,” IEEE Geosci. Remote Sens. Lett. , vol. 19, pp. 1–5, 2022

work page 2022

[37] [37]

SOSSF: Landsat-8 image synthesis on the blending of sentinel- 1 and MODIS data,

Y. Xia, W. He, Q. Huang, H. Chen, H. Huang, and H. Zhang, “SOSSF: Landsat-8 image synthesis on the blending of sentinel- 1 and MODIS data,” IEEE T rans. Geosci. Remote Sens. , vol. 62, pp. 1–19, 2024

work page 2024

[38] [38]

GF2 PMS remote sensing imagery,

China Center for Resources Satellite Data and Application, “GF2 PMS remote sensing imagery,” https://www.chinageoss. cn/, 2017

work page 2017

[39] [39]

A large-scale benchmark data set for evaluating pansharpening performance: Overview and implemen- tation,

X. Meng, Y. Xiong, F. Shao, H. Shen, W. Sun, G. Yang, Q. Yuan, R. Fu, and H. Zhang, “A large-scale benchmark data set for evaluating pansharpening performance: Overview and implemen- tation,” IEEE Geosci. Remote Sens. Mag. , vol. 9, no. 1, pp. 18–52, 2020

work page 2020

[40] [40]

Change detection from synthetic aperture radar images based on neighborhood-based ratio and extreme learning machine,

F. Gao, J. Dong, B. Li, Q. Xu, and C. Xie, “Change detection from synthetic aperture radar images based on neighborhood-based ratio and extreme learning machine,” J. Appl. Remote Sens. , vol. 10, no. 4, p. 046019, 2016

work page 2016

[41] [41]

A domain adaptation neural network for change detection with heterogeneous optical and SAR remote sensing images,

C. Zhang, Y. Feng, L. Hu, D. Tapete, L. Pan, Z. Liang, F. Cigna, and P . Yue, “A domain adaptation neural network for change detection with heterogeneous optical and SAR remote sensing images,” Int. J. Appl. Earth Observ. Geoinf. , vol. 109, p. 102769, 2022

work page 2022

[42] [42]

Cross-modality fusion transformer for multispectral object detection,

Q. Feng, D. Han, and Z. Wang, “Cross-modality fusion transformer for multispectral object detection,” arXiv preprint arXiv:2111.00273, 2021

work page arXiv 2021

[43] [43]

Multimodal object detection via probabilistic ensembling,

Y.-T. Chen, J. Shi, Z. Ye, C. Mertz, D. Ramanan, and S. Kong, “Multimodal object detection via probabilistic ensembling,” in Proc. Comput. Vis. ECCV , 2022, pp. 139–158

work page 2022

[44] [44]

Learning source-invariant deep hashing convolutional neural networks for cross-source re- mote sensing image retrieval,

Y. Li, Y. Zhang, X. Huang, and J. Ma, “Learning source-invariant deep hashing convolutional neural networks for cross-source re- mote sensing image retrieval,” IEEE T rans. Geosci. Remote Sens. , vol. 56, no. 11, pp. 6521–6536, 2018

work page 2018

[45] [45]

CMIR-NET: A deep learning based model for cross-modal re- trieval in remote sensing,

U. Chaudhuri, B. Banerjee, A. Bhattacharya, and M. Datcu, “CMIR-NET: A deep learning based model for cross-modal re- trieval in remote sensing,” Pattern Recognit. Lett., vol. 131, pp. 456– 462, 2020

work page 2020

[46] [46]

A discriminative distillation network for cross-source remote sensing image retrieval,

W. Xiong, Z. Xiong, Y. Cui, and Y. Lv, “A discriminative distillation network for cross-source remote sensing image retrieval,” IEEE J. Sel. T opics Appl. Earth Observ. Remote Sens. , vol. 13, pp. 1234–1247, 2020

work page 2020

[47] [47]

Proxy-based rotation invariant deep metric learning for remote sensing image retrieval,

Z. Cai, Y. Pan, and W. Jin, “Proxy-based rotation invariant deep metric learning for remote sensing image retrieval,” IEEE J. Sel. T opics Appl. Earth Observ. Remote Sens., vol. 17, pp. 7759–7772, 2024

work page 2024

[48] [48]

Image-to-image translation with conditional adversarial networks,

P . Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. , 2017, pp. 1125–1134

work page 2017

[49] [49]

Unpaired image-to- image translation using cycle-consistent adversarial networks,

J.-Y. Zhu, T. Park, P . Isola, and A. A. Efros, “Unpaired image-to- image translation using cycle-consistent adversarial networks,” in Proc. IEEE Int. Conf. Comput. Vis. , 2017, pp. 2223–2232

work page 2017

[50] [50]

Contrastive learning for unpaired image-to-image translation,

T. Park, A. A. Efros, R. Zhang, and J.-Y. Zhu, “Contrastive learning for unpaired image-to-image translation,” in Comput. Vis. ECCV , 2020, pp. 319–345

work page 2020

[51] [51]

Multi- channel attention selection GAN with cascaded semantic guidance for cross-view image translation,

H. Tang, D. Xu, N. Sebe, Y. Wang, J. J. Corso, and Y. Yan, “Multi- channel attention selection GAN with cascaded semantic guidance for cross-view image translation,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2019, pp. 2417–2426

work page 2019

[52] [52]

The dawn of KAN in image-to-image (I2I) translation: Integrating Kolmogorov-Arnold networks with GANs for unpaired I2I translation,

A. Mahara, N. D. Rishe, and L. Deng, “The dawn of KAN in image-to-image (I2I) translation: Integrating Kolmogorov-Arnold networks with GANs for unpaired I2I translation,” arXiv preprint arXiv:2408.08216, 2024

work page arXiv 2024

[53] [53]

A novel it- erative PCA–based pansharpening method,

M. Ghadjati, A. Moussaoui, and A. Boukharouba, “A novel it- erative PCA–based pansharpening method,” Remote Sens. Lett. , vol. 10, no. 3, pp. 264–273, 2019

work page 2019

[54] [54]

PanNet: A deep network architecture for pan-sharpening,

J. Yang, X. Fu, Y. Hu, Y. Huang, X. Ding, and J. Paisley, “PanNet: A deep network architecture for pan-sharpening,” in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 5449–5457

work page 2017

[55] [55]

SAR- to-optical image translation using supervised cycle-consistent ad- versarial networks,

L. He, Y. Rao, J. Li, J. Chanussot, A. Plaza, J. Zhu, and B. Li, “SAR- to-optical image translation using supervised cycle-consistent ad- versarial networks,” IEEE J. Sel. T opics Appl. Earth Observ. Remote Sens., vol. 12, no. 4, pp. 1188–1204, 2019

work page 2019

[56] [56]

Pan-sharpening using an efficient bidirectional pyramid network,

Y. Zhang, C. Liu, M. Sun, and Y. Ou, “Pan-sharpening using an efficient bidirectional pyramid network,” IEEE T rans. Geosci. Remote Sens., vol. 57, no. 8, pp. 5549–5563, 2019

work page 2019

[57] [57]

A multiscale and multidepth convolutional neural network for remote sensing imagery pan-sharpening,

Q. Yuan, Y. Wei, X. Meng, H. Shen, and L. Zhang, “A multiscale and multidepth convolutional neural network for remote sensing imagery pan-sharpening,” IEEE J. Sel. T opics Appl. Earth Observ. Remote Sens., vol. 11, no. 3, pp. 978–989, 2018

work page 2018

[58] [58]

A deep convolutional coupling network for change detection based on heterogeneous optical and radar images,

J. Liu, M. Gong, K. Qin, and P . Zhang, “A deep convolutional coupling network for change detection based on heterogeneous optical and radar images,” IEEE T rans. Neural Netw. Learn. Syst. , vol. 29, no. 3, pp. 545–559, 2016

work page 2016

[59] [59]

Novel enhanced UNet for change detection using multimodal remote sensing image,

Z. Lv, H. Huang, W. Sun, T. Lei, J. A. Benediktsson, and J. Li, “Novel enhanced UNet for change detection using multimodal remote sensing image,” IEEE Geosci. Remote Sens. Lett. , vol. 20, pp. 1–5, 2023

work page 2023

[60] [60]

SegFormer: Simple and efficient design for semantic segmentation with transformers,

E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P . Luo, “SegFormer: Simple and efficient design for semantic segmentation with transformers,” Adv. Neural Inf. Process. Syst. , vol. 34, pp. 12 077–12 090, 2021

work page 2021

[61] [61]

Jocher, A

G. Jocher, A. Stoken, J. Borovec et al., “YOLOv5,” https://github. com/ultralytics/yolov5, 2021

work page 2021

[62] [62]

CSPNet: A new backbone that can enhance learning capability of CNN,

C.-Y. Wang, H.-Y. M. Liao, Y.-H. Wu, P .-Y. Chen, J.-W. Hsieh, and I.-H. Yeh, “CSPNet: A new backbone that can enhance learning capability of CNN,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. Workshops, 2020, pp. 390–391

work page 2020

[63] [63]

YOLOrs: Object detection in multimodal remote sensing imagery,

M. Sharma et al., “YOLOrs: Object detection in multimodal remote sensing imagery,” IEEE J. Sel. T opics Appl. Earth Observ. Remote Sens., vol. 14, pp. 1497–1508, 2021

work page 2021

[64] [64]

Deep residual learning for image recognition,

K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770–778

work page 2016

[65] [65]

CADDN: A content-aware downsampling-based de- tection method for small objects in remote sensing images,

L. Zhang, Y. Liu, X. Wang, Y. He, G. Li, Y. Zhang, C. Liu, Z. Jiang, and Y. Liu, “CADDN: A content-aware downsampling-based de- tection method for small objects in remote sensing images,” IEEE T rans. Geosci. Remote Sens., vol. 63, pp. 1–17, 2025

work page 2025

[66] [66]

Towards large-scale small object detection: Survey and bench- marks,

G. Cheng, X. Yuan, X. Yao, K. Yan, Q. Zeng, X. Xie, and J. Han, “Towards large-scale small object detection: Survey and bench- marks,” IEEE T rans. Pattern Anal. Mach. Intell. , vol. 45, no. 11, pp. 13 467–13 488, 2023. IEEE TRANSACTIONS ON PATTERN ANAL YSIS AND MACHINE INTELLIGENCE 18

work page 2023

[67] [67]

De- tection and tracking meet drones challenge,

P . Zhu, L. Wen, D. Du, X. Bian, H. Fan, Q. Hu, and H. Ling, “De- tection and tracking meet drones challenge,” IEEE T rans. Pattern Anal. Mach. Intell. , vol. 44, no. 11, pp. 7380–7399, 2022

work page 2022

[68] [68]

Object detection in aerial images: A large-scale benchmark and challenges,

J. Ding, N. Xue, G.-S. Xia, X. Bai, W. Yang, M. Y. Yang, S. Belongie, J. Luo, M. Datcu, M. Pelillo et al. , “Object detection in aerial images: A large-scale benchmark and challenges,” IEEE T rans. Pattern Anal. Mach. Intell. , vol. 44, no. 11, pp. 7778–7796, 2021

work page 2021

[69] [69]

Scrdet++: Detecting small, cluttered and rotated objects via instance-level feature denoising and rotation loss smoothing,

X. Yang, J. Yan, W. Liao, X. Yang, J. Tang, and T. He, “Scrdet++: Detecting small, cluttered and rotated objects via instance-level feature denoising and rotation loss smoothing,” IEEE T rans. Pat- tern Anal. Mach. Intell. , vol. 45, no. 2, pp. 2384–2399, 2022

work page 2022

[70] [70]

Highly efficient and unsupervised framework for moving object detection in satellite videos,

C. Xiao, W. An, Y. Zhang, Z. Su, M. Li, W. Sheng, M. Pietik ¨ainen, and L. Liu, “Highly efficient and unsupervised framework for moving object detection in satellite videos,” IEEE T rans. Pattern Anal. Mach. Intell. , vol. 46, no. 12, pp. 11 532–11 539, 2024

work page 2024

[71] [71]

Hybrid Gaussian deformation for efficient remote sensing object detection,

W. Zhao, X. Zhang, H. Wang, and H. Lu, “Hybrid Gaussian deformation for efficient remote sensing object detection,” IEEE T rans. Pattern Anal. Mach. Intell., pp. 1–17, 2025

work page 2025

[72] [72]

Circle loss: A unified perspective of pair similarity optimization,

Y. Sun, C. Cheng, Y. Zhang et al., “Circle loss: A unified perspective of pair similarity optimization,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2020, pp. 6398–6407

work page 2020

[73] [73]

Person re-identification in the wild,

L. Zheng, H. Zhang, S. Sun et al. , “Person re-identification in the wild,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. , 2017, pp. 1367–1376

[74]

W. Liu, Y. Wen, Z. Yu et al., “Sphereface: Deep hypersphere embedding for face recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 212–220

[75]

H. Oh Song, Y. Xiang, S. Jegelka, and S. Savarese, “Deep metric learning via lifted structured feature embedding,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 4004–4012

[76]

Z. Zheng, L. Zheng, M. Garrett et al., “Dual-path convolutional image-text embeddings with instance loss,” ACM Trans. Multimedia Comput. Commun. Appl., vol. 16, no. 2, pp. 1–23, 2020

[77]

J. Meng, W.-S. Zheng, J.-H. Lai, and L. Wang, “Deep graph metric learning for weakly supervised person re-identification,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 10, pp. 6074–6093, 2021

[78]

X. Zhou, Y. Zhong, Z. Cheng, F. Liang, and L. Ma, “Adaptive sparse pairwise loss for object re-identification,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., 2023, pp. 19691–19701

[79]

M. Ye, J. Shen, G. Lin, T. Xiang, L. Shao, and S. C. H. Hoi, “Deep learning for person re-identification: A survey and outlook,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 6, pp. 2872–2893, 2021

[80]

M. Liu, Y. Bian, Q. Liu, X. Wang, and Y. Wang, “Weakly supervised tracklet association learning with video labels for person re-identification,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 46, no. 5, pp. 3595–3607, 2024
