GeoMamba: A Geometry-driven MambaVision Framework and Dataset for Fine-grained Optical-SAR Object Retrieval
Pith reviewed 2026-05-20 05:28 UTC · model grok-4.3
pith:O6SVGEZF
The pith
GeoMamba adds geometric feature injection and consistency constraints to enable robust fine-grained retrieval between unaligned optical and SAR remote sensing images.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GeoMamba introduces a Geometric Feature Injection (GFI) module that strengthens cross-modal feature interaction and incorporates structural priors to make SAR representations more robust. A Geometric Consistency Constraint (GCC) module, trained with a deep supervision strategy, applies hierarchical geometric constraints derived from classical operators to preserve informative object structures during representation learning. Together, these components are claimed to support effective fine-grained optical-SAR retrieval on the FGOS-as dataset under unaligned conditions.
What carries the argument
The Geometric Feature Injection (GFI) module, which enhances cross-modal interaction and adds structural priors, combined with the Geometric Consistency Constraint (GCC) module that imposes hierarchical geometric constraints using classical operators.
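The paper passage quoted in the Lean-theorem section names the classical operators behind the GCC constraints: the Sobel operator for optical contours and the Harris detector for SAR scattering centers. The NumPy sketch below computes both response maps for a single-channel image. It illustrates the textbook operators only and is not the authors' implementation; the helper names, the box averaging window (the usual Harris formulation uses a Gaussian window), and the constant `k=0.05` are assumptions.

```python
import numpy as np

def filter2d(img, kernel):
    """'Same'-size 2D cross-correlation with edge padding (odd-sized kernels)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img.astype(float), ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros(img.shape, dtype=float)
    # Accumulate one shifted copy of the image per kernel tap.
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def sobel_magnitude(img):
    """Gradient magnitude: large along contours, zero in flat regions."""
    return np.hypot(filter2d(img, SOBEL_X), filter2d(img, SOBEL_Y))

def harris_response(img, k=0.05, window=5):
    """Harris corner response R = det(M) - k * trace(M)^2.

    R > 0 at corner-like (point) structures, R < 0 along straight edges,
    R ~ 0 in flat regions. M is the structure tensor averaged over a box window.
    """
    gx = filter2d(img, SOBEL_X)
    gy = filter2d(img, SOBEL_Y)
    box = np.ones((window, window)) / window ** 2
    sxx = filter2d(gx * gx, box)
    syy = filter2d(gy * gy, box)
    sxy = filter2d(gx * gy, box)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2
```

On a synthetic bright square, the Sobel magnitude is nonzero only in a thin band around the square's border, while the Harris response is positive near its corners and negative along its straight sides. This contrast is one reason contour-like (optical) and point-like (SAR scattering) structures call for different operators.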
If this is right
- Cross-modal representations become learnable from unaligned optical-SAR pairs in practical remote sensing scenarios.
- Object structures remain more intact through the representation learning process.
- The framework supports all-to-all retrieval settings where queries and gallery items come from mixed modalities.
Where Pith is reading between the lines
- The same geometric injection pattern could extend to other sensor pairs such as optical and infrared for similar unaligned retrieval tasks.
- The FGOS-as dataset provides a benchmark that highlights limitations of paired-data methods when applied to real-world misaligned imagery.
- Additional classical geometric operators might be swapped in to target specific object categories like ships or aircraft more precisely.
Load-bearing premise
That injecting geometric features and enforcing consistency constraints with classical operators will sufficiently reduce modality gaps, speckle noise, and structural differences to support reliable cross-modal learning without aligned samples.
What would settle it
A controlled test on the FGOS-as dataset that disables the Geometric Feature Injection and Geometric Consistency Constraint modules and checks whether retrieval performance drops to or below levels achieved by prior methods without these geometric additions.
Original abstract
Multi-source remote sensing enables complementary observation of ground objects, while cross-modal fine-grained object retrieval remains challenging, especially under unaligned optical and SAR conditions. Unlike conventional retrieval settings that rely on paired or spatially aligned samples, practical optical-SAR retrieval is affected by substantial modality discrepancy, speckle noise, and structural inconsistency, which limit robust cross-modal representation learning. To address this problem, we propose GeoMamba, a geometry-driven framework tailored for optical-SAR fine-grained retrieval. Specifically, GeoMamba introduces a Geometric Feature Injection (GFI) module that enhances cross-modal feature interaction and incorporates structural priors, thereby improving the robustness of SAR representations and promoting geometry-consistent feature learning. In addition, a Geometric Consistency Constraint (GCC) module, together with a Deep Supervision (DS) strategy, imposes hierarchical geometric constraints using classical operators, which helps preserve informative object structures during representation learning. We further construct a new dataset, FGOS-as, containing 11 aerospace and maritime categories for evaluating unaligned cross-modal fine-grained object retrieval in realistic remote sensing scenarios. Extensive experiments on FGOS-as demonstrate that GeoMamba outperforms existing methods, achieving 63.3% mAP and 77.0% Rank-1 accuracy in the all-to-all retrieval setting.
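The headline numbers (63.3% mAP, 77.0% Rank-1) are standard retrieval metrics. The sketch below shows one common way to compute them from a query-by-gallery distance matrix. It assumes that in the all-to-all setting every optical and SAR sample serves as a query against all remaining samples of both modalities, that the query itself is excluded, and that queries without any true match are skipped; the exact FGOS-as protocol may differ, and the function name `rank1_and_map` is hypothetical.

```python
import numpy as np

def rank1_and_map(dist, q_labels, g_labels, exclude_self=False):
    """Rank-1 accuracy and mean average precision (mAP) from a distance matrix.

    dist[i, j] is the distance between query i and gallery item j (smaller = closer).
    With exclude_self=True, queries and gallery are the same set (all-to-all)
    and item i is removed from query i's ranking.
    """
    q_labels = np.asarray(q_labels)
    g_labels = np.asarray(g_labels)
    hits, aps = [], []
    for i in range(dist.shape[0]):
        order = np.argsort(dist[i], kind="stable")  # gallery sorted by distance
        if exclude_self:
            order = order[order != i]
        matches = g_labels[order] == q_labels[i]
        if not matches.any():
            continue  # assumed convention: skip queries with no true match
        hits.append(float(matches[0]))  # Rank-1: is the nearest item correct?
        ranks = np.flatnonzero(matches) + 1  # 1-based ranks of true matches
        precision_at_hits = np.arange(1, len(ranks) + 1) / ranks
        aps.append(float(precision_at_hits.mean()))  # average precision
    if not aps:
        raise ValueError("no query has a true match in the gallery")
    return float(np.mean(hits)), float(np.mean(aps))
```

For an all-to-all evaluation under these assumptions, stack the optical and SAR embeddings into one array, compute pairwise distances (for example `np.linalg.norm(f[:, None] - f[None], axis=-1)`), and pass the same label vector as `q_labels` and `g_labels` with `exclude_self=True`. A strictly cross-modal protocol would instead use optical rows as queries and SAR columns as the gallery.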
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes GeoMamba, a geometry-driven MambaVision framework for fine-grained optical-SAR object retrieval under unaligned conditions. It introduces a Geometric Feature Injection (GFI) module to enhance cross-modal interaction via structural priors and a Geometric Consistency Constraint (GCC) module with Deep Supervision that applies hierarchical classical operators. The authors construct a new FGOS-as dataset covering 11 aerospace and maritime categories and report that GeoMamba achieves 63.3% mAP and 77.0% Rank-1 accuracy in the all-to-all retrieval setting, outperforming existing methods.
Significance. If the experimental claims are substantiated, the work could advance cross-modal retrieval in remote sensing by demonstrating how geometric priors and consistency constraints can mitigate modality gaps, speckle noise, and structural inconsistencies in practical unaligned optical-SAR scenarios. The FGOS-as dataset may provide a valuable benchmark for aerospace and maritime applications.
major comments (3)
- [§4 (Experiments)] The central performance claims (63.3% mAP, 77.0% Rank-1) are presented without ablation tables that compare GeoMamba against a plain MambaVision backbone on identical FGOS-as splits and protocol. This omission is load-bearing because the attribution of robustness to GFI + GCC cannot be verified without isolating their contribution from the backbone or dataset properties.
- [§3 (Method)] The descriptions of the Geometric Feature Injection module and the Geometric Consistency Constraint (including how classical operators are applied hierarchically under Deep Supervision) remain high-level. Without explicit equations, algorithmic pseudocode, or implementation details, it is not possible to assess whether these components sufficiently close the modality gap or are reproducible.
- [§4.1 (Dataset)] No statistics or construction details are provided for FGOS-as (e.g., total images per category, train/test split ratios, selection criteria, or checks against implicit spatial alignment leakage). These are required to confirm that the reported gains are not artifacts of dataset curation under the unaligned all-to-all protocol.
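The §3 comment asks how classical operators are applied hierarchically under Deep Supervision. Because the manuscript gives no equations, the following is only a hypothetical sketch of one plausible form, not GeoMamba's actual loss: each backbone stage predicts a single-channel structure map, the full-resolution operator response (Sobel for optical, Harris for SAR) is min-max normalized and average-pooled to that stage's resolution, and the stage-wise L1 errors are summed with per-stage weights. The function names, the L1 distance, the normalization, and the uniform default weights are all assumptions.

```python
import numpy as np

def avg_pool(x, factor):
    """Non-overlapping average pooling of a 2D map by an integer factor."""
    h, w = x.shape
    h2, w2 = h // factor, w // factor
    x = x[: h2 * factor, : w2 * factor]  # crop so both sides divide evenly
    return x.reshape(h2, factor, w2, factor).mean(axis=(1, 3))

def minmax_normalize(x, eps=1e-8):
    """Rescale an operator response to [0, 1] so stages share a common scale."""
    x = x - x.min()
    return x / (x.max() + eps)

def hierarchical_geo_loss(stage_preds, operator_map, weights=None):
    """Hypothetical deep-supervision loss over backbone stages.

    stage_preds: list of 2D structure maps, one per stage (coarser each stage).
    operator_map: full-resolution classical-operator response (e.g. Sobel or Harris).
    """
    if weights is None:
        weights = [1.0] * len(stage_preds)
    if len(weights) != len(stage_preds):
        raise ValueError("need one weight per stage")
    target = minmax_normalize(operator_map.astype(float))
    total = 0.0
    for pred, w in zip(stage_preds, weights):
        # Assumes square maps whose stride relative to the input is an integer.
        factor = target.shape[0] // pred.shape[0]
        t = avg_pool(target, factor)
        total += w * float(np.abs(pred - t).mean())  # stage-wise L1 error
    return total
```

Pooling the target down to each stage, rather than upsampling predictions, keeps the sketch cheap and ensures coarse stages are only asked to match coarse structure.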
minor comments (1)
- [Abstract] Consider specifying the total number of images or samples in FGOS-as to give readers immediate context for the scale of the new benchmark.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We have carefully considered each major comment and provide point-by-point responses below. Where revisions are warranted, we will incorporate the suggested changes in the next version of the paper to strengthen clarity, reproducibility, and experimental rigor.
- Referee: [§4 (Experiments)] The central performance claims (63.3% mAP, 77.0% Rank-1) are presented without ablation tables that compare GeoMamba against a plain MambaVision backbone on identical FGOS-as splits and protocol. This omission is load-bearing because the attribution of robustness to GFI + GCC cannot be verified without isolating their contribution from the backbone or dataset properties.
  Authors: We agree that isolating the contributions of the GFI and GCC modules from the underlying MambaVision backbone is important for substantiating our claims. The current manuscript reports overall performance but does not include dedicated ablation tables on the exact same FGOS-as splits and retrieval protocol. In the revised version, we will add these ablation studies, including direct comparisons of the plain backbone versus variants with GFI, GCC, and their combination, to clearly attribute the observed gains.
  Revision: yes
- Referee: [§3 (Method)] The descriptions of the Geometric Feature Injection module and the Geometric Consistency Constraint (including how classical operators are applied hierarchically under Deep Supervision) remain high-level. Without explicit equations, algorithmic pseudocode, or implementation details, it is not possible to assess whether these components sufficiently close the modality gap or are reproducible.
  Authors: We acknowledge that the method section presents the GFI and GCC modules at a conceptual level without sufficient mathematical detail. To improve reproducibility and allow assessment of how these components address modality gaps, we will expand §3 in the revision with explicit equations for the Geometric Feature Injection process and the hierarchical application of classical operators within the GCC module under Deep Supervision. We will also include algorithmic pseudocode and key implementation hyperparameters.
  Revision: yes
- Referee: [§4.1 (Dataset)] No statistics or construction details are provided for FGOS-as (e.g., total images per category, train/test split ratios, selection criteria, or checks against implicit spatial alignment leakage). These are required to confirm that the reported gains are not artifacts of dataset curation under the unaligned all-to-all protocol.
  Authors: We recognize that detailed dataset documentation is essential for validating the experimental protocol. The current manuscript introduces FGOS-as but omits granular statistics and construction specifics. In the revised version, we will add a dedicated subsection or table in §4.1 reporting the total images per category, train/test split ratios, selection criteria for the 11 aerospace and maritime classes, and explicit checks or design choices to minimize any potential spatial alignment leakage under the unaligned all-to-all setting.
  Revision: yes
Circularity Check
No significant circularity; empirical framework and results are self-contained
The paper introduces a new architecture (GeoMamba with GFI and GCC modules) and a new dataset (FGOS-as), then reports experimental retrieval metrics on that dataset. No equations, predictions, or first-principles claims are shown to reduce by construction to fitted inputs, self-citations, or renamed known results. The performance numbers (63.3% mAP, 77.0% Rank-1) are presented as outcomes of standard training and evaluation rather than tautological re-statements of the method definition itself.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: Geometric Feature Injection (GFI) module that enhances cross-modal feature interaction and incorporates structural priors... Geometric Consistency Constraint (GCC) module... using classical operators, i.e., the Sobel operator for optical contours and the Harris detector for SAR scattering centers.
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: dual-stream MambaVision backbone... State Space Models (SSMs)
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.