SMART-Ship: A Comprehensive Synchronized Multi-modal Aligned Remote Sensing Targets Dataset and Benchmark for Berthed Ships Analysis
Pith reviewed 2026-05-19 00:52 UTC · model grok-4.3
The pith
A new dataset synchronizes five remote sensing modalities for berthed ship analysis.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The SMART-Ship dataset consists of 1092 multi-modal image sets acquired within one week, each registered for spatiotemporal consistency and annotated with polygonal ship locations, fine-grained categories, instance-level identifiers, and change region masks, thereby enabling standardized benchmarks for five fundamental multi-modal remote sensing tasks on berthed ships.
What carries the argument
The SMART-Ship dataset itself, built from 1092 aligned multi-modal image sets and hierarchical annotations that organize 38,838 ship instances to serve multiple interpretation tasks.
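To make the hierarchical organization concrete, the sketch below shows one way a single image set and its ship instances could be represented. It is illustrative only: the class names, field names, modality strings, and category labels are all assumptions, since the paper's actual release format is not described on this page.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical modality keys; the five modalities themselves come from the abstract.
MODALITIES = ("visible", "sar", "panchromatic", "multispectral", "nir")


@dataclass
class ShipInstance:
    instance_id: str                    # assumed: one identifier per physical ship
    category_path: Tuple[str, ...]      # assumed: coarse-to-fine labels
    polygon: List[Tuple[float, float]]  # (x, y) vertices in image coordinates


@dataclass
class ImageSet:
    set_id: str
    modalities: Tuple[str, ...] = MODALITIES
    ships: List[ShipInstance] = field(default_factory=list)
    change_mask_path: Optional[str] = None  # assumed: mask stored as a separate file

    def group_by_category(self, depth: int) -> Dict[Tuple[str, ...], List[str]]:
        """Group instance ids by the first `depth` levels of the category path."""
        groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
        for ship in self.ships:
            groups[ship.category_path[:depth]].append(ship.instance_id)
        return dict(groups)


example = ImageSet(
    set_id="set-0001",
    ships=[
        ShipInstance("ship-a", ("merchant", "cargo"), [(0, 0), (4, 0), (4, 1), (0, 1)]),
        ShipInstance("ship-b", ("merchant", "tanker"), [(5, 0), (9, 0), (9, 1), (5, 1)]),
    ],
)
coarse = example.group_by_category(1)
fine = example.group_by_category(2)
```

Truncating the category path at a chosen depth is what lets the same annotations serve both coarse and fine-grained tasks in this sketch.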
If this is right
- Representative methods can be compared on standardized benchmarks across five multi-modal tasks.
- The dataset supports ship detection, classification, instance identification, and change detection in maritime settings.
- Hierarchical annotations allow flexible use for both coarse and fine-grained remote sensing problems.
- Evaluations indicate the data can reveal directions for improving multi-modal fusion techniques.
Where Pith is reading between the lines
- Instance-level identifiers could enable tracking the same ships across different modalities and times.
- The registration approach might transfer to other dynamic targets such as vehicles or aircraft in remote sensing.
- Expanding the one-week acquisition window could test longer-term change analysis.
- Models trained to fuse all five modalities simultaneously could be evaluated directly on the existing splits.
Load-bearing premise
Each multi-modal image set is accurately registered for spatiotemporal consistency and the hierarchical annotations are reliable enough to support the claimed range of tasks.
What would settle it
Discovery of widespread misalignment between modalities or annotation errors that cause all benchmarked methods to fail on the defined tasks would show the dataset does not support the intended multi-modal analysis.
Original abstract
Given the limitations of satellite orbits and imaging conditions, multi-modal remote sensing (RS) data is crucial in enabling long-term earth observation. However, maritime surveillance remains challenging due to the complexity of multi-scale targets and the dynamic environments. To bridge this critical gap, we propose a Synchronized Multi-modal Aligned Remote sensing Targets dataset for berthed ships analysis (SMART-Ship), containing spatiotemporal registered images with fine-grained annotation for maritime targets from five modalities: visible-light, synthetic aperture radar (SAR), panchromatic, multi-spectral, and near-infrared. Specifically, our dataset consists of 1092 multi-modal image sets, covering 38,838 ships. Each image set is acquired within one week and registered to ensure spatiotemporal consistency. Ship instances in each set are annotated with polygonal location information, fine-grained categories, instance-level identifiers, and change region masks, organized hierarchically to support diverse multi-modal RS tasks. Furthermore, we define standardized benchmarks on five fundamental tasks and comprehensively compare representative methods across the dataset. Thorough experiment evaluations validate that the proposed SMART-Ship dataset could support various multi-modal RS interpretation tasks and reveal the promising directions for further exploration.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces the SMART-Ship dataset consisting of 1092 multi-modal image sets (visible-light, SAR, panchromatic, multi-spectral, NIR) of berthed ships, totaling 38,838 instances. Each set is described as acquired within one week and registered for spatiotemporal consistency, with hierarchical annotations providing polygonal locations, fine-grained categories, instance-level identifiers, and change region masks. The authors establish standardized benchmarks for five fundamental tasks, compare representative methods, and conclude that the dataset supports diverse multi-modal remote sensing interpretation tasks while highlighting promising research directions.
Significance. If the claimed alignments and annotations can be shown to be reliable through quantitative validation, the dataset would be a meaningful addition to remote sensing resources for maritime surveillance. Its scale, five-modality coverage, change masks, and hierarchical structure could enable progress on cross-modal fusion, instance tracking, and change detection in port environments where existing datasets are limited. The inclusion of benchmark experiments is a constructive element that could help standardize evaluation in this sub-area.
major comments (2)
- [Abstract and Dataset Construction] Abstract and Dataset Construction section: The central claim that the 1092 image sets provide 'spatiotemporal consistency' via registration rests only on the statement that images were 'acquired within one week and registered.' No quantitative registration accuracy metrics (RMSE, overlap coefficients, or mutual-information scores) are reported, particularly for the geometrically and radiometrically challenging SAR-optical pairs. This directly affects whether the dataset can actually support the claimed multi-modal tasks and is therefore load-bearing.
- [Annotation and Benchmark sections] Annotation description and Benchmark sections: The manuscript describes the annotations as 'fine-grained' and 'hierarchical' but reports no inter-annotator agreement statistics, validation protocol, or reliability measures for the polygonal locations, categories, instance IDs, or change masks. Without these, the soundness of the five benchmark tasks and the experimental comparisons cannot be fully assessed.
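Two of the registration metrics named in the first major comment can be sketched briefly. The code below is a minimal standard-library illustration, not the authors' procedure: the function names, the 8-bit intensity range, and the bin count are assumptions, and a real pipeline would more likely operate on NumPy arrays and manually verified control points.

```python
import math
from collections import Counter


def control_point_rmse(ref_pts, reg_pts):
    """RMSE in pixels between matched control points given as (x, y) pairs."""
    if len(ref_pts) != len(reg_pts) or not ref_pts:
        raise ValueError("need two non-empty point lists of equal length")
    sq = [(xr - xg) ** 2 + (yr - yg) ** 2
          for (xr, yr), (xg, yg) in zip(ref_pts, reg_pts)]
    return math.sqrt(sum(sq) / len(sq))


def mutual_information(pixels_a, pixels_b, bins=16, max_value=255):
    """Mutual information (in nats) of two equally sized, flattened images.

    Intensities are assumed to lie in [0, max_value] and are quantized
    into `bins` levels before the joint histogram is built.
    """
    if len(pixels_a) != len(pixels_b) or not pixels_a:
        raise ValueError("need two non-empty pixel lists of equal length")

    def quantize(v):
        return min(int(v * bins / (max_value + 1)), bins - 1)

    qa = [quantize(v) for v in pixels_a]
    qb = [quantize(v) for v in pixels_b]
    n = len(qa)
    joint = Counter(zip(qa, qb))
    count_a = Counter(qa)
    count_b = Counter(qb)
    mi = 0.0
    for (i, j), c in joint.items():
        # p_ab * log(p_ab / (p_a * p_b)), written in terms of raw counts
        mi += (c / n) * math.log(c * n / (count_a[i] * count_b[j]))
    return mi


rmse = control_point_rmse([(0, 0), (0, 0)], [(3, 4), (3, 4)])
ramp = [i % 256 for i in range(1024)]
mi_self = mutual_information(ramp, ramp)
mi_flat = mutual_information(ramp, [0] * 1024)
```

Control-point RMSE measures geometric error directly, while mutual information is often preferred for SAR-optical pairs because it does not assume that intensities in the two modalities are linearly related.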
minor comments (2)
- [Abstract] The abstract and introduction could more explicitly list the five benchmark tasks and the representative methods compared, to improve immediate readability.
- [Dataset Construction] Consider including sensor specifications, exact acquisition dates, and geographic coverage details in the dataset section to strengthen reproducibility.
Simulated Author's Rebuttal
We thank the referee for the thorough review and insightful comments on our manuscript. We address each of the major comments below and outline the revisions we plan to make to strengthen the paper.
- Referee: [Abstract and Dataset Construction] Abstract and Dataset Construction section: The central claim that the 1092 image sets provide 'spatiotemporal consistency' via registration rests only on the statement that images were 'acquired within one week and registered.' No quantitative registration accuracy metrics (RMSE, overlap coefficients, or mutual-information scores) are reported, particularly for the geometrically and radiometrically challenging SAR-optical pairs. This directly affects whether the dataset can actually support the claimed multi-modal tasks and is therefore load-bearing.
  Authors: We agree with the referee that providing quantitative metrics for the registration accuracy is essential to substantiate the spatiotemporal consistency of the dataset. In the revised manuscript, we will include a detailed account of the registration procedure along with quantitative validation metrics, such as RMSE for geometric alignment and overlap coefficients or mutual information scores for the multi-modal image sets. Special emphasis will be placed on the SAR-optical pairs to address the challenges mentioned.
  Revision: yes
- Referee: [Annotation and Benchmark sections] Annotation description and Benchmark sections: The manuscript describes the annotations as 'fine-grained' and 'hierarchical' but reports no inter-annotator agreement statistics, validation protocol, or reliability measures for the polygonal locations, categories, instance IDs, or change masks. Without these, the soundness of the five benchmark tasks and the experimental comparisons cannot be fully assessed.
  Authors: We recognize the value of reporting inter-annotator agreement and validation details to ensure the quality and reliability of the annotations. Accordingly, we will revise the Annotation section to describe the annotation process in greater detail, including the involvement of multiple annotators, the protocol used for quality control and disagreement resolution, and quantitative reliability measures such as inter-annotator agreement scores for the various annotation types (polygonal locations, categories, instance identifiers, and change region masks).
  Revision: yes
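The reliability measures discussed in the second exchange could be computed along the following lines. This is a sketch under stated assumptions rather than the authors' protocol: it uses Cohen's kappa for category labels from two annotators and intersection-over-union for rasterized masks, and the function names and the example labels ("cargo", "tanker") are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same instances."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Chance agreement from each annotator's marginal label frequencies.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def mask_iou(mask_a, mask_b):
    """IoU of two binary masks given as equally shaped nested lists of 0/1."""
    inter = 0
    union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for pa, pb in zip(row_a, row_b):
            inter += 1 if (pa and pb) else 0
            union += 1 if (pa or pb) else 0
    return inter / union if union else 1.0  # two empty masks count as agreeing


annotator_1 = ["cargo", "cargo", "tanker", "tanker"]
annotator_2 = ["cargo", "cargo", "tanker", "cargo"]
kappa = cohen_kappa(annotator_1, annotator_2)
iou = mask_iou([[1, 1], [0, 0]], [[1, 0], [1, 0]])
```

Kappa fits categorical fields such as ship category, whereas IoU fits spatial annotations; polygons and change masks would first need to be rasterized onto a common grid for the second function to apply.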
Circularity Check
No significant circularity; dataset contribution is independent
The paper introduces a new multi-modal remote sensing dataset (SMART-Ship) consisting of 1092 image sets with annotations and defines five benchmark tasks with empirical comparisons. No mathematical derivations, predictions, or equations are present that reduce claims to fitted parameters or self-citations by construction. The central premise rests on data collection, registration, and hierarchical annotations rather than any self-definitional or load-bearing circular steps. This is a standard self-contained dataset paper with no reduction of results to inputs.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Standard remote sensing registration methods can achieve spatiotemporal consistency across modalities when images are acquired within one week.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: We propose a Synchronized Multi-modal Aligned Remote sensing Targets dataset for berthed ships analysis (SMART-Ship), containing spatiotemporal registered images with fine-grained annotation for maritime targets from five modalities...
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: Thorough experiment evaluations validate that the proposed SMART-Ship dataset could support various multi-modal RS interpretation tasks
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.