UAV as Urban Construction Change Monitor: A New Benchmark and Change Captioning Model
Pith reviewed 2026-05-08 17:52 UTC · model grok-4.3
The pith
PTNet uses a learnable prototype bank to model structured change semantics for generating natural language descriptions of urban construction changes from UAV image pairs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PTNet explicitly models structured change semantics through a learnable prototype bank that guides cross-temporal interaction, disentangles task-specific representations via multi-head gating, and injects detection-derived spatial priors into caption generation, enabling coherent semantic correspondence while preserving fine-grained spatial sensitivity.
What carries the argument
A learnable prototype bank that captures structured change semantics, guides cross-temporal feature alignment, and supports task-specific disentanglement in a joint change detection and captioning model.
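To make the mechanism concrete, here is a minimal sketch of how a prototype bank could mediate cross-temporal comparison. It is not the authors' implementation. Each location's feature is softly assigned to K prototypes, and change is read as the shift in that assignment between the two dates. The function names, the feature dimension D, the token count N, and the random (untrained) prototypes are all assumptions for illustration. Only K=8 for UCCD comes from the passage quoted further down.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def prototype_weights(token, prototypes):
    """Soft assignment of one feature vector to K prototypes (scaled dot product)."""
    scale = math.sqrt(len(token))
    logits = [sum(t * p for t, p in zip(token, proto)) / scale for proto in prototypes]
    return softmax(logits)

def guided_difference(feat_t1, feat_t2, prototypes):
    """Per-location change, expressed as a shift in prototype assignment."""
    shifts = []
    for a, b in zip(feat_t1, feat_t2):
        w1 = prototype_weights(a, prototypes)
        w2 = prototype_weights(b, prototypes)
        shifts.append([y - x for x, y in zip(w1, w2)])
    return shifts

random.seed(0)
K, D, N = 8, 16, 4  # K=8 as quoted for UCCD; D and N are arbitrary (hypothetical)
# In a trained model the prototypes would be learned parameters; random here.
prototypes = [[random.gauss(0, 1) for _ in range(D)] for _ in range(K)]
feat_t1 = [[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
feat_t2 = [row[:] for row in feat_t1]
feat_t2[2] = [random.gauss(0, 1) for _ in range(D)]  # only location 2 changes

shift = guided_difference(feat_t1, feat_t2, prototypes)
for i, row in enumerate(shift):
    print(i, round(sum(abs(v) for v in row), 4))
```

Unchanged locations produce an all-zero shift, so any non-zero mass is attributable to a change in which prototypes explain the location.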
If this is right
- Joint detection and captioning yields spatially grounded descriptions that align with actual changed regions.
- Explicit prototypes allow the model to handle complex, multi-object urban changes more coherently than implicit differencing.
- The UCCD benchmark provides a standardized testbed for future work on high-resolution construction monitoring.
- Detection priors injected into captioning improve fine-grained spatial sensitivity without sacrificing semantic quality.
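One common way to inject a spatial prior into a decoder is to add the log of a per-location change probability to the attention logits. The paper summary does not say PTNet does it this way, so this is an assumed mechanism and a sketch only. The helper name `attend_with_prior` and all numeric values are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend_with_prior(logits, change_prob, eps=1e-6):
    """Bias attention logits by log change probability, then normalize.

    Equivalent to reweighting the plain softmax weights by change_prob
    and renormalizing (up to the eps smoothing term).
    """
    biased = [l + math.log(p + eps) for l, p in zip(logits, change_prob)]
    return softmax(biased)

# Hypothetical values: four image locations, two flagged as changed.
logits = [0.2, 0.1, 0.3, 0.0]
change_prob = [0.05, 0.90, 0.85, 0.02]

plain = softmax(logits)
guided = attend_with_prior(logits, change_prob)
print([round(w, 3) for w in plain])
print([round(w, 3) for w in guided])
```

The caption decoder's attention mass moves toward locations the detector flags, which is the sense in which a detection prior can sharpen spatial sensitivity.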
Where Pith is reading between the lines
- The prototype approach could transfer to other change-description tasks such as vegetation or infrastructure monitoring if the bank is initialized from domain-specific data.
- If prototypes prove stable across datasets, the method might support lighter supervision for new regions rather than full retraining.
- Real-time UAV streams could feed the same prototype bank to produce ongoing natural-language summaries of construction activity.
Load-bearing premise
A learnable prototype bank can reliably capture and generalize structured change semantics across diverse urban construction scenarios without overfitting to the training distribution.
What would settle it
Evaluate PTNet on a new UAV dataset of urban construction changes drawn from cities or construction types absent from the UCCD training data, then check whether its gains in caption coherence and accuracy over baselines disappear.
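The check can be reduced to one number: the fraction of the in-distribution gain that survives on held-out data. The sketch below assumes a single scalar caption metric per model and split. The function name and the scores passed to it are placeholders for illustration, not results from the paper.

```python
def gain_retention(in_model, in_base, out_model, out_base):
    """Fraction of the in-distribution gain over a baseline that is kept
    on held-out data. Values near 1 suggest the gain generalizes; values
    near 0 (or negative) suggest it was specific to the training data."""
    in_gain = in_model - in_base
    out_gain = out_model - out_base
    if in_gain <= 0:
        raise ValueError("no in-distribution gain to retain")
    return out_gain / in_gain

# Placeholder scores (hypothetical, not from the paper).
retained = gain_retention(in_model=0.60, in_base=0.50,
                          out_model=0.42, out_base=0.40)
print(round(retained, 3))
```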
Original abstract
Remote Sensing Image Change Captioning (RSICC) aims to generate spatially grounded natural language descriptions of scene evolution from bi-temporal imagery, moving beyond binary change masks toward semantic-level understanding. However, existing methods rely on implicit feature differencing without explicitly modeling structured change semantics, and struggle to reconcile the conflicting representation demands of change detection and caption generation. In addition, current benchmarks provide limited coverage of high-resolution urban construction scenarios. To address these challenges, we propose PTNet, a prototype-guided task-adaptive framework for joint change captioning and detection. PTNet explicitly models structured change semantics through a learnable prototype bank that guides cross-temporal interaction, disentangles task-specific representations via multi-head gating, and injects detection-derived spatial priors into caption generation, enabling coherent semantic correspondence while preserving fine-grained spatial sensitivity. Furthermore, we construct UCCD, a large-scale UAV-based benchmark comprising 9,000 high-resolution image pairs and 45,000 annotated sentences for urban construction monitoring. Extensive experiments on UCCD and WHU-CDC demonstrate that PTNet consistently outperforms existing methods. The dataset and source code are publicly available at https://github.com/G124556/ptnet.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces PTNet, a prototype-guided task-adaptive network for remote sensing image change captioning (RSICC) that uses a learnable prototype bank to explicitly model structured change semantics, multi-head gating to disentangle change detection and captioning representations, and injection of detection-derived spatial priors into the caption decoder. It also presents UCCD, a new UAV-based benchmark with 9,000 high-resolution bi-temporal image pairs and 45,000 annotated sentences focused on urban construction changes. Experiments claim consistent outperformance over prior methods on both UCCD and the existing WHU-CDC dataset, with public release of data and code.
Significance. If the central claims hold, the work supplies a much-needed high-resolution urban construction benchmark and an architecture that moves RSICC beyond implicit differencing toward explicit semantic modeling. The public dataset and code are clear strengths that support reproducibility and further research in UAV-based monitoring applications.
major comments (2)
- [§3.2] §3.2 (Prototype Bank): The learnable prototype bank is presented as the key mechanism for capturing and guiding structured change semantics, yet the manuscript provides no details on prototype count selection, initialization, update rule, or regularization against collapse/overfitting. Because UCCD is newly introduced and the bank is fully learnable, this omission leaves open the possibility that reported gains arise from dataset-specific fitting rather than generalizable semantics.
- [§4] §4 (Experiments and Ablations): The ablation studies do not isolate the prototype bank's contribution from the multi-head gating and spatial-prior components. Without a controlled variant that removes or freezes the prototype bank while keeping other modules fixed, it is impossible to attribute the claimed outperformance on UCCD and WHU-CDC specifically to the structured semantic modeling.
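The collapse concern in the first major comment can be monitored with a simple diagnostic: the mean pairwise cosine similarity between prototypes. This is a generic check, not something the manuscript is reported to use. The function names and the toy prototype sets below are assumptions.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def mean_offdiag_cosine(prototypes):
    """Average cosine similarity over all distinct prototype pairs.
    Values approaching 1 indicate the bank is collapsing to one direction."""
    k = len(prototypes)
    sims = [cosine(prototypes[i], prototypes[j])
            for i in range(k) for j in range(i + 1, k)]
    return sum(sims) / len(sims)

# Toy banks (hypothetical): one well spread, one nearly collapsed.
spread = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
collapsed = [[1.0, 0.01, 0.0], [1.0, 0.0, 0.01], [1.0, 0.01, 0.01]]

print(mean_offdiag_cosine(spread))
print(round(mean_offdiag_cosine(collapsed), 4))
```

Logging this value during training, or adding it to the loss as a penalty, is one way an expanded §3.2 could document how collapse is detected or discouraged.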
minor comments (3)
- [§2] The description of the UCCD annotation protocol (number of annotators, quality control, sentence diversity across construction types) is insufficient for a new benchmark paper.
- [Figure 3] Figure 3 (architecture diagram) would benefit from explicit labeling of the prototype-bank interaction arrows and the gating module to match the text in §3.
- [§4.1] The abstract and §4.1 state that PTNet 'consistently outperforms' existing methods, but no statistical significance tests or variance across multiple runs are reported.
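The last minor comment asks for variance and significance. A minimal sketch of both follows: mean and standard deviation across seeds, and a paired bootstrap confidence interval on per-sample score differences. It assumes per-sample metric values are available for both systems on the same test items. Every number below is a placeholder, not a reported result.

```python
import random
import statistics

def paired_bootstrap_ci(scores_a, scores_b, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of paired differences (a - b)."""
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    means = sorted(statistics.fmean(rng.choices(diffs, k=n))
                   for _ in range(n_resamples))
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Placeholder per-sample scores for two systems on the same eight test pairs.
scores_model = [0.62, 0.58, 0.71, 0.66, 0.60, 0.69, 0.64, 0.73]
scores_base = [0.60, 0.55, 0.70, 0.61, 0.59, 0.65, 0.63, 0.70]
lo, hi = paired_bootstrap_ci(scores_model, scores_base)
print("95% CI for mean difference:", round(lo, 4), round(hi, 4))

# Placeholder aggregate scores from three training seeds.
seed_scores = [0.641, 0.655, 0.648]
seed_mean = statistics.mean(seed_scores)
seed_std = statistics.stdev(seed_scores)
print("mean", round(seed_mean, 4), "std", round(seed_std, 4))
```

An interval that excludes zero supports the claim that one system outperforms the other on this test set; it says nothing about other datasets.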
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for improving clarity and experimental rigor, and we will revise the paper to address them fully.
- Referee: [§3.2] §3.2 (Prototype Bank): The learnable prototype bank is presented as the key mechanism for capturing and guiding structured change semantics, yet the manuscript provides no details on prototype count selection, initialization, update rule, or regularization against collapse/overfitting. Because UCCD is newly introduced and the bank is fully learnable, this omission leaves open the possibility that reported gains arise from dataset-specific fitting rather than generalizable semantics.
  Authors: We agree that the current description of the prototype bank lacks sufficient implementation details for full reproducibility and to rule out dataset-specific effects. In the revised manuscript, we will expand §3.2 (and add corresponding material to the supplement) with explicit descriptions of prototype count selection, initialization strategy, the update rule during training, and any regularization applied to prevent collapse or overfitting. These additions will clarify how the bank models generalizable structured change semantics rather than fitting idiosyncrasies of UCCD.
  Revision: yes
- Referee: [§4] §4 (Experiments and Ablations): The ablation studies do not isolate the prototype bank's contribution from the multi-head gating and spatial-prior components. Without a controlled variant that removes or freezes the prototype bank while keeping other modules fixed, it is impossible to attribute the claimed outperformance on UCCD and WHU-CDC specifically to the structured semantic modeling.
  Authors: We acknowledge that the existing ablations do not isolate the prototype bank's specific contribution. In the revised §4, we will introduce a controlled ablation that removes or freezes the prototype bank while holding the multi-head gating and spatial-prior components fixed. Performance differences on both UCCD and WHU-CDC will be reported to directly attribute gains to the structured semantic modeling.
  Revision: yes
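The controlled ablation promised in the second response can be written down as an explicit run grid. The sketch below varies only the prototype bank while holding the other two modules on. The `Variant` class, its field names, and the mode labels are hypothetical and do not come from the released code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    name: str
    prototype_bank: str              # "learned", "frozen", or "removed"
    multi_head_gating: bool = True   # held fixed across the ablation
    spatial_prior: bool = True       # held fixed across the ablation

def controlled_ablation():
    """Variants that differ only in how the prototype bank is treated."""
    return [Variant(f"bank_{mode}", mode)
            for mode in ("learned", "frozen", "removed")]

DATASETS = ("UCCD", "WHU-CDC")
runs = [(variant, dataset)
        for variant in controlled_ablation()
        for dataset in DATASETS]

for variant, dataset in runs:
    print(dataset, variant.name)
```

Because gating and the spatial prior are identical in every row, any score difference between rows on the same dataset can be attributed to the prototype bank.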
Circularity Check
No circularity in PTNet derivation or UCCD benchmark claims
The paper presents PTNet as an architectural proposal (learnable prototype bank guiding cross-temporal interaction, multi-head gating for disentanglement, and detection-derived spatial priors) whose behavior is defined by standard neural network components rather than by construction equaling any fitted output or prior result. Claims rest on empirical outperformance on the newly introduced UCCD dataset (9k pairs) and the external WHU-CDC benchmark, with no equations, self-citations, or uniqueness theorems shown that reduce the reported gains to tautological inputs. The derivation chain is therefore self-contained and externally falsifiable via the released code and data.
Axiom & Free-Parameter Ledger
free parameters (2)
- prototype bank size
- multi-head gating weights
axioms (2)
- domain assumption: Paired bi-temporal images contain sufficient visual information to support both change localization and natural-language description.
- domain assumption: Neural networks trained with standard supervision can learn disentangled and semantically meaningful representations when guided by prototypes.
invented entities (1)
- learnable prototype bank (no independent evidence)
Lean theorems connected to this paper
- Foundation/DimensionForcing (2^D=8 from D=3) [unclear]: Superficially the number 8 appears, but here it is a hand-tuned hyperparameter, not a forced period. Not applicable; no derivational link.
  Paper passage: "For WHU-CDC and UCCD, the number of prototype clusters K is set to 5 and 8, respectively, determined by the semantic diversity of change types in each dataset."
- Foundation/RealityFromDistinction.reality_from_one_distinction [unclear]: Contrast with the RS parameter-free chain.
  Paper passage: "We employ the AdamW optimizer with a global initial learning rate of 1e-4... LoRA with rank r=16 and r=64... Training proceeds for 200 epochs."
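For readers weighing the two LoRA ranks in the quoted passage, the trainable parameter cost follows from the standard LoRA factorization. For a weight matrix of shape d_out × d_in, rank r adds r × (d_in + d_out) trainable parameters. The passage elides which modules use which rank and what their dimensions are, so the 768 × 768 layer below is a hypothetical example only.

```python
def lora_params(d_in, d_out, rank):
    """Trainable parameters added by LoRA to one d_out x d_in weight:
    a (rank x d_in) down-projection plus a (d_out x rank) up-projection."""
    return rank * (d_in + d_out)

D_IN = D_OUT = 768          # hypothetical layer size, not from the paper
full = D_IN * D_OUT         # parameters in the frozen dense weight

for rank in (16, 64):       # the two ranks quoted in the passage
    added = lora_params(D_IN, D_OUT, rank)
    print(rank, added, f"{added / full:.2%} of the dense layer")
```

Under this assumed layer size, moving from r=16 to r=64 quadruples the adapter parameters per adapted layer.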
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.