arxiv: 2606.10775 · v2 · pith:5HQTPAUMnew · submitted 2026-06-09 · 💻 cs.CV

Spatially Selective Self-Training for Unsupervised Building Change Detection

Pith reviewed 2026-06-27 13:26 UTC · model grok-4.3

classification 💻 cs.CV
keywords unsupervised change detection · building change detection · self-training · remote sensing · pseudo labels · bi-temporal images · local consistency

The pith

SST-CD learns building change detectors end-to-end from unlabeled images by training only on spatially reliable pseudo labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a method for unsupervised building change detection that turns temporal differences into noisy pseudo labels and then trains a detector only on pixels that pass a local consistency check. This addresses the problem that raw discrepancies often include noise from non-building changes or registration errors. By learning a task-specific detector instead of relying on raw differences, the approach aims to produce more accurate change maps. Experiments on LEVIR-CD, WHU-CD, and DSIFN-CD report F1 scores of 83.08%, 91.69%, and 86.60%, respectively, above prior unsupervised and label-free baselines.

Core claim

SST-CD reformulates fully label-free building change detection as end-to-end detector learning under noisy pseudo supervision, using temporal discrepancies as candidate labels but supervising the detector only on pixels deemed reliable by a local consistency criterion, augmented by a feature adapter and prototype-based decoder.
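To make the selective-supervision idea concrete, here is a minimal sketch, not the authors' implementation. It assumes a per-pixel binary cross-entropy loss (the paper's actual loss is not stated in this summary) and a 0/1 reliability mask supplied from elsewhere; the function name `selective_bce` and the toy values are hypothetical.

```python
import math

def selective_bce(probs, pseudo, mask, eps=1e-7):
    """Mean binary cross-entropy over pixels whose mask value is 1.

    probs  : predicted change probabilities (nested lists)
    pseudo : 0/1 pseudo labels derived from temporal discrepancies
    mask   : 0/1 reliability mask; pixels with 0 are ignored
    """
    total, count = 0.0, 0
    for p_row, y_row, m_row in zip(probs, pseudo, mask):
        for p, y, m in zip(p_row, y_row, m_row):
            if m:
                p = min(max(p, eps), 1 - eps)  # avoid log(0)
                total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
                count += 1
    return total / count if count else 0.0

# Toy 2x2 example: the right-hand column is treated as unreliable.
probs = [[0.9, 0.2], [0.1, 0.8]]
pseudo = [[1, 1], [0, 0]]
mask = [[1, 0], [1, 0]]

loss_selected = selective_bce(probs, pseudo, mask)
loss_all = selective_bce(probs, pseudo, [[1, 1], [1, 1]])
```

Pixels with mask 0 contribute nothing to the loss, so a pseudo label in an unreliable region cannot pull the detector toward it, whether that label is right or wrong.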

What carries the argument

The local consistency criterion that filters inconsistent regions from supervision, allowing selective use of pseudo labels from temporal discrepancies.
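As a rough illustration of what such a criterion could look like, the sketch below marks a pixel as reliable when most pixels in its local window share its pseudo label. The paper's actual criterion is not specified in this summary; the window radius, the agreement threshold `tau`, and the function name `reliable_mask` are assumptions made purely for illustration.

```python
def reliable_mask(pseudo, radius=1, tau=0.75):
    """Return a 0/1 mask over a 2D grid of 0/1 pseudo labels.

    A pixel is kept (1) when the fraction of in-bounds window pixels,
    centre included, that share the centre's label is at least tau.
    """
    h, w = len(pseudo), len(pseudo[0])
    mask = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            agree = total = 0
            for di in range(-radius, radius + 1):
                for dj in range(-radius, radius + 1):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < h and 0 <= nj < w:
                        total += 1
                        agree += pseudo[ni][nj] == pseudo[i][j]
            mask[i][j] = 1 if agree / total >= tau else 0
    return mask

# Toy pseudo-label map: a 3x3 "changed" block plus one isolated response.
pseudo = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 1],
]
mask = reliable_mask(pseudo)
```

In this toy case the block centre and the homogeneous background are kept, while the isolated response and the block's corners are dropped. The second effect shows the trade-off that any filter of this kind faces: coverage near object boundaries is given up in exchange for cleaner supervision.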

Load-bearing premise

The local consistency criterion can accurately identify pixels where pseudo labels from temporal discrepancies are reliable enough for training.

What would settle it

On a new dataset where many building changes occur in regions that the local consistency criterion flags as inconsistent, the method would underperform direct discrepancy baselines.

Figures

Figures reproduced from arXiv: 2606.10775 by Anas M. I. Mohammed, Ratiba A. H. Abubaker, Wafaa I. M. Hussin, Xiang Zhou, Zhenming Peng, Zhi Lu.

Figure 1: Framework of the proposed Selective Self-Training Change Detection (SST-CD) method. A frozen SAM encoder extracts foundation features from […]
Figure 2: Qualitative comparisons of the LEVIR-CD, WHU-CD, and DSIFN-CD datasets. The columns represent: (a) T1 Image, (b) T2 Image, (c) Ground […]
Figure 3: Qualitative comparison on representative LEVIR-CD samples. From […]
Figure 4: Error-map comparison between random masking and selective […]
Original abstract

Unsupervised building change detection aims to learn building-change masks from unlabeled bi-temporal remote sensing images. Existing label-free methods often follow a discrepancy-to-mask paradigm, directly using temporal differences, frozen foundation-model responses, prompt-based outputs, or post-processing results as final change maps. Although these strategies provide annotation-free cues, they do not learn a task-specific building-change detector and remain vulnerable to the gap between generic temporal discrepancies and building-defined structural changes. In practice, such discrepancies are often noisy and task-irrelevant, as appearance shifts, registration errors, and non-building modifications can produce strong but misleading responses. To address this problem, we propose SST-CD, a spatially selective self-training framework that reformulates fully label-free building change detection as end-to-end detector learning under noisy pseudo supervision. SST-CD uses temporal discrepancies as candidate pseudo labels and trains the detector only on spatially reliable pixels, whose reliability is estimated by a local consistency criterion that filters inconsistent regions from supervision. To further stabilize noisy self-training, a lightweight feature adapter recalibrates bi-temporal features, while a prototype-based decoder produces compact change and no-change representations. Experiments on LEVIR-CD, WHU-CD, and DSIFN-CD show that SST-CD achieves F1 scores of 83.08%, 91.69%, and 86.60%, respectively, outperforming existing unsupervised and label-free baselines.
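The F1 figures quoted above are presumably the standard pixel-level F1 on the change class. The summary does not spell out the evaluation protocol, so treat that as an assumption. A minimal sketch with a hypothetical `f1_score` helper and toy maps:

```python
def f1_score(pred, truth):
    """Pixel-level F1 for the positive (change) class.

    pred, truth : 2D nested lists of 0/1 values with the same shape.
    F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when the denominator is 0.
    """
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Toy example: 2 true positives, 1 false positive, 1 false negative.
pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
score = f1_score(pred, truth)
```

Because F1 depends on the binarisation threshold and any post-processing applied to the change map, scores are only comparable across methods when those choices are fixed.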

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes SST-CD, a spatially selective self-training framework for fully unsupervised building change detection. It reformulates the task as end-to-end detector learning under noisy pseudo-supervision by using temporal discrepancies as candidate labels, retaining only pixels deemed spatially reliable via a local consistency criterion, and stabilizing training with a lightweight feature adapter plus a prototype-based decoder. Experiments on LEVIR-CD, WHU-CD, and DSIFN-CD report F1 scores of 83.08%, 91.69%, and 86.60%, outperforming existing unsupervised and label-free baselines.

Significance. If the local consistency filter demonstrably isolates task-relevant building changes from other temporal discrepancies, the reformulation from discrepancy-to-mask to learned detector could meaningfully advance label-free change detection by reducing reliance on generic cues that are vulnerable to appearance shifts and registration errors.

major comments (2)
  1. [Abstract] The central claim that the local consistency criterion selects pixels whose temporal-discrepancy pseudo-labels are sufficiently free of task-irrelevant noise rests on an unverified assumption; the text supplies no quantitative evidence (e.g., precision of retained pixels versus ground-truth changes, noise-type breakdown, or ablation removing the filter) that the selected supervision is building-specific rather than merely locally consistent noise.
  2. [Abstract] No experimental controls, ablation studies, or analysis of how the consistency filter affects label noise are reported, so the contribution of the claimed reformulation versus the feature adapter or prototype decoder cannot be isolated and the reported F1 gains cannot be attributed to the core mechanism.
minor comments (1)
  1. The abstract lists specific F1 scores but does not name the exact unsupervised baselines or detail the evaluation protocol (e.g., threshold selection, post-processing).
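The first major comment asks for the precision of retained pixels against ground truth. A minimal sketch of that diagnostic follows, assuming binary maps stored as nested lists; the function name `retained_label_quality` and the toy arrays are hypothetical, and ground truth would be used here only for analysis, never for training.

```python
def retained_label_quality(pseudo, mask, truth):
    """Precision of retained 'change' pseudo labels, plus mask coverage.

    pseudo : 0/1 pseudo labels
    mask   : 0/1 reliability mask (1 = pixel kept for supervision)
    truth  : 0/1 ground-truth change map
    """
    tp = fp = kept = total = 0
    for y_row, m_row, t_row in zip(pseudo, mask, truth):
        for y, m, t in zip(y_row, m_row, t_row):
            total += 1
            if m:
                kept += 1
                if y == 1:
                    if t == 1:
                        tp += 1
                    else:
                        fp += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    coverage = kept / total if total else 0.0
    return precision, coverage

# Toy example: 4 of 6 pixels retained; of the 2 retained positives, 1 is correct.
pseudo = [[1, 1, 0], [0, 1, 0]]
mask = [[1, 0, 1], [1, 1, 0]]
truth = [[1, 0, 0], [0, 0, 0]]
precision, coverage = retained_label_quality(pseudo, mask, truth)
```

Reporting coverage alongside precision matters: a filter that keeps almost nothing can look precise while providing too little supervision to train on.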

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback highlighting the need for stronger empirical validation of the local consistency filter and component contributions. We address each major comment below and will revise the manuscript to incorporate the requested evidence and analyses.

Point-by-point responses
  1. Referee: [Abstract] The central claim that the local consistency criterion selects pixels whose temporal-discrepancy pseudo-labels are sufficiently free of task-irrelevant noise rests on an unverified assumption; the text supplies no quantitative evidence (e.g., precision of retained pixels versus ground-truth changes, noise-type breakdown, or ablation removing the filter) that the selected supervision is building-specific rather than merely locally consistent noise.

    Authors: We agree that the current manuscript text does not provide direct quantitative validation (such as precision of retained pixels against ground-truth changes or an ablation removing the filter) to confirm that the local consistency criterion isolates building-specific changes rather than locally consistent noise. This is a valid observation. We will add a dedicated analysis subsection with precision/recall metrics on retained pseudo-labels versus ground truth, a noise-type breakdown where feasible, and an ablation study removing the consistency filter to demonstrate its impact on supervision quality.
    Revision: yes

  2. Referee: [Abstract] No experimental controls, ablation studies, or analysis of how the consistency filter affects label noise are reported, so the contribution of the claimed reformulation versus the feature adapter or prototype decoder cannot be isolated and the reported F1 gains cannot be attributed to the core mechanism.

    Authors: We concur that without explicit ablation studies and controls, it is not possible to isolate the contribution of the spatially selective self-training reformulation from the feature adapter and prototype decoder, nor to attribute F1 gains specifically to the core mechanism. The manuscript currently focuses on overall performance comparisons. We will add comprehensive ablation experiments (including variants with/without the consistency filter, adapter, and prototype decoder) and analysis of label noise reduction in the revised version to better substantiate the claims.
    Revision: yes
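The with/without variants promised in the second response amount to a small factorial design. Below is a sketch of how such a grid could be enumerated, with hypothetical flag names taken from the three components mentioned in the response; this is not the authors' experimental code.

```python
from itertools import product

# Hypothetical switches, one per component named in the rebuttal.
COMPONENTS = ("consistency_filter", "feature_adapter", "prototype_decoder")

def ablation_grid(components=COMPONENTS):
    """Enumerate every on/off combination of the given components."""
    return [
        dict(zip(components, flags))
        for flags in product((True, False), repeat=len(components))
    ]

grid = ablation_grid()
for config in grid:
    enabled = [name for name, on in config.items() if on]
    print(", ".join(enabled) if enabled else "baseline (all components off)")
```

With three components this gives eight runs per dataset. Comparing the full model against the variant that disables only `consistency_filter` is the comparison that speaks most directly to the referee's attribution concern.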

Circularity Check

0 steps flagged

No circularity: method is empirical self-training on public datasets without self-referential derivations

full rationale

The paper presents SST-CD as an empirical framework that selects pseudo-label pixels via a local consistency criterion and trains a detector on them, with performance reported on standard public benchmarks (LEVIR-CD, WHU-CD, DSIFN-CD). No equations, parameter fits, or derivations are shown that reduce claimed outputs to quantities defined by the inputs themselves. The central claim rests on the effectiveness of the filtering step and architectural additions, which are evaluated externally rather than forced by construction or self-citation chains. This is the common case of a self-contained empirical method.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are described beyond the high-level method components.

pith-pipeline@v0.9.1-grok · 5802 in / 1102 out tokens · 23895 ms · 2026-06-27T13:26:26.162984+00:00 · methodology
