VGGT-CD: Training-Free Robust Registration for 3D Change Detection


arxiv: 2605.16859 · v1 · pith:CCEIAIEUnew · submitted 2026-05-16 · 💻 cs.CV · cs.AI


Wei Zhang (1), Songhua Li (1), Yihang Wu (1), Qiang Li (1), Qi Wang (1) ((1) Northwestern Polytechnical University, Xi'an, China)

Pith reviewed 2026-05-19 20:36 UTC · model grok-4.3


keywords 3D change detection · point cloud registration · training-free pipeline · multi-temporal reconstruction · static background isolation · Sim(3) alignment · visual geometry foundation model


The pith

VGGT-CD registers multi-temporal point clouds by first aligning sparse keyframes into one metric space, then purifying the dense reconstructions so that only static background drives the final alignment.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a training-free pipeline that turns independent per-epoch reconstructions from a visual-geometry model into aligned 3D change maps. It first performs joint sparse-keyframe inference to remove scale ambiguity and produce an initial Sim(3) prior. The second stage then removes points belonging to physical changes so that a closed-form centroid alignment on the remaining static correspondences can refine translation while locking scale and rotation. A residual self-check is used to ensure the refinement step never worsens the initial prior. On an 11-scene benchmark the method cuts absolute trajectory error by 44 percent outdoors and 59 percent indoors while running more than six times faster than prior approaches.
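As a rough illustration of how an initial Sim(3) prior could be obtained in closed form, the sketch below applies Umeyama's least-squares solution to corresponding 3D points. It assumes, hypothetically, that the same keyframe cameras have centres available both in an epoch's own reconstruction frame (`src`) and in the jointly inferred frame (`dst`). The summary above does not say how the prior is actually extracted, and `umeyama_sim3` is an illustrative name rather than anything from the paper.

```python
import numpy as np

def umeyama_sim3(src, dst):
    """Least-squares (s, R, t) such that dst ~= s * R @ src + t.

    src, dst: (N, 3) arrays of corresponding points. Here they stand in
    for keyframe camera centres, which is an assumption of this sketch.
    """
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    xs, xd = src - mu_s, dst - mu_d

    # Cross-covariance between target and source, then its SVD.
    cov = xd.T @ xs / len(src)
    U, D, Vt = np.linalg.svd(cov)

    # Flip the last axis if needed so that R is a proper rotation.
    S = np.eye(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        S[2, 2] = -1.0

    R = U @ S @ Vt
    var_s = (xs ** 2).sum() / len(src)      # variance of the source points
    s = np.trace(np.diag(D) @ S) / var_s    # optimal isotropic scale
    t = mu_d - s * R @ mu_s                 # translation from the centroids
    return s, R, t
```

With noise-free correspondences the function recovers the generating scale, rotation, and translation exactly, which makes it easy to sanity-check before feeding it real camera centres.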

Core claim

By decoupling registration from dynamic interference through a coarse sparse-keyframe stage that establishes a unified metric space followed by a fine stage that isolates static-background correspondences and performs closed-form centroid alignment with a residual self-check, the pipeline produces non-degrading refinements and high-purity 3D change maps without any task-specific training.

What carries the argument

Two-stage registration: coarse sparse keyframe joint inference for an initial Sim(3) prior, followed by dense-reconstruction purification that isolates static-background correspondences for closed-form centroid alignment with residual self-check.
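The fine-stage refinement described here can be sketched in a few lines, under stated assumptions. The function below keeps the scale `s` and rotation `R` of the prior fixed, computes the centroid shift on a set of static correspondences, and accepts it only if a residual check passes. The check uses the median residual, which is a choice made for this sketch: the mean squared residual on the same points can never increase under a centroid shift, so it would be uninformative as a guard. The name `refine_translation` and its inputs are hypothetical.

```python
import numpy as np

def refine_translation(src, dst, s, R, t0):
    """Refine only the translation of a Sim(3) prior (s, R, t0).

    src, dst: (N, 3) arrays of corresponding points believed to be static.
    Returns (t, accepted), where t falls back to t0 if the check fails.
    """
    moved = s * src @ R.T                            # scale and rotation stay locked
    t_new = dst.mean(axis=0) - moved.mean(axis=0)    # closed-form centroid shift

    def median_residual(t):
        return np.median(np.linalg.norm(moved + t - dst, axis=1))

    # Residual self-check: never return a translation that scores worse
    # than the coarse prior under the chosen residual statistic.
    if median_residual(t_new) <= median_residual(t0):
        return t_new, True
    return t0, False
```

If the supposedly static set is contaminated by changed points, the centroid is pulled toward them and the median residual of the inliers rises, so the function returns the coarse translation unchanged.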

If this is right

Multi-view images captured at different times can be turned directly into metric 3D change maps without retraining any model.
Registration speed increases by a factor of six or more because only static correspondences are used in the final alignment.
The residual self-check provides a mathematical guarantee that the fine stage never degrades the coarse-stage prior.
High-purity change maps become available for urban monitoring and autonomous driving without requiring paired training data for each new scene.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same coarse-to-fine separation could be applied to other dense reconstruction models that output per-epoch point clouds.
If the static-background isolation step were made probabilistic, the method might extend to scenes with moving objects that occupy the same location across epochs.
The closed-form centroid step suggests that once scale and rotation are fixed, translation refinement reduces to a simple average of residuals on trusted points.
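The last point can be made explicit with a short derivation. The notation is introduced here for illustration and is not taken from the paper: p_i and q_i are corresponding static points in the two epochs, S is the trusted static set, (s, R) is the locked scale and rotation, and t_0 is the coarse translation.

```latex
\begin{aligned}
E(t) &= \sum_{i \in \mathcal{S}} \left\| q_i - (s R p_i + t) \right\|^2, \\
\frac{\partial E}{\partial t} &= -2 \sum_{i \in \mathcal{S}} \left( q_i - s R p_i - t \right) = 0
\;\Longrightarrow\;
t^\star = \frac{1}{|\mathcal{S}|} \sum_{i \in \mathcal{S}} \left( q_i - s R p_i \right) = \bar{q} - s R \bar{p}, \\
t^\star &= t_0 + \frac{1}{|\mathcal{S}|} \sum_{i \in \mathcal{S}} r_i,
\qquad r_i = q_i - (s R p_i + t_0).
\end{aligned}
```

Under these assumptions the refined translation equals the coarse translation plus the mean residual over the trusted points, which is why any changed points left in S bias the result directly.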

Load-bearing premise

The fine stage can always separate static background points from points belonging to actual scene changes so that alignment on the remaining points improves rather than harms the initial estimate.
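Because the paper's selection rule is not specified in the material reviewed here, the following is only one plausible way to isolate static correspondences: apply the Sim(3) prior, measure per-point residuals, and keep points below a robust threshold. The median plus `k` scaled MADs rule, the default `k=3.0`, and the name `select_static` are all assumptions of this sketch.

```python
import numpy as np

def select_static(src, dst, s, R, t, k=3.0):
    """Flag correspondences whose residual under the prior is small.

    src, dst: (N, 3) corresponding points from the two epochs.
    Returns (mask, threshold); mask[i] is True for points kept as static.
    """
    residual = np.linalg.norm(s * src @ R.T + t - dst, axis=1)

    # Robust location and spread: median and scaled median absolute deviation.
    med = np.median(residual)
    mad = np.median(np.abs(residual - med))
    threshold = med + k * 1.4826 * mad

    return residual <= threshold, threshold
```

A rule of this kind relies on static points being the majority: once changed points make up more than about half of the correspondences, the median itself drifts and the mask stops being trustworthy, which is exactly the failure mode the premise above has to rule out.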

What would settle it

A test sequence in which the fine-stage isolation step leaves more than a small fraction of changed points in the static set, causing the refined translation to increase rather than decrease absolute trajectory error compared with the coarse prior.

Figures

Figures reproduced from arXiv: 2605.16859 by Wei Zhang, Songhua Li, Yihang Wu, Qiang Li, Qi Wang (Northwestern Polytechnical University, Xi'an, China).

**Figure 1.** Given bi-temporal multi-view images (doors closed vs. trunk open), independent reconstruction produces two point clouds in separate coordinate frames. Naive overlay without alignment exposes severe scale ambiguity and edge-flying noise, rendering the two epochs entirely incomparable (left). RANSAC + Scale-ICP fails to resolve the scale discrepancy, resulting in registration failure and false-positive-domin…

**Figure 2.** Overall architecture of the proposed VGGT-CD pipeline. Given unposed image sets from two temporal states (T1 and T2), our training-free system operates in a decoupled, coarse-to-fine manner. (Top) Coarse Stage: A sparse subset of keyframes undergoes joint inference to establish a unified metric space. By aligning the implicitly reconstructed camera frustums, we extract a reliable global prior, rigidly loc…

**Figure 3.** Visualization of predicted camera trajectories versus ground truth on representative scenes. Each subplot …

**Figure 4.** Qualitative comparison on representative scenes.

**Figure 5.** Effect of keyframe budget K on the coarse stage. (a) ATE saturates around K=5; (b) GPU memory grows super-linearly with K; (c) computation time scales linearly. Orange markers highlight our default K=5. Effect of keyframe budget (K). We study the trade-off between the number of keyframes K used in the coarse stage and the resulting accuracy, memory cost, and computation time …

Original abstract

3D change detection from multi-view images is essential for urban monitoring, disaster assessment, and autonomous driving. However, existing methods predominantly operate in the 2D domain, where viewpoint variations are mistaken for physical changes and depth is unavailable. While visual geometry foundation models like VGGT rapidly produce dense point clouds from unposed images, independent per-epoch reconstruction encounters fundamental obstacles: unpredictable inter-epoch scale ambiguity, registration-change paradox where scene changes corrupt alignment, and pervasive edge-flying noise. To address these challenges, we present VGGT-CD, a training-free pipeline decoupling cross-temporal registration from dynamic-change interference. In the Coarse Stage, sparse keyframe joint inference establishes a unified metric space and yields an initial Sim(3) prior. In the Fine Stage, dense reconstructions are purified by isolating static-background correspondences. A closed-form centroid alignment refines the translation while locking scale and rotation, using a residual self-check to mathematically guarantee non-degradation. Evaluated on an 11-scene benchmark from the World Across Time dataset, VGGT-CD reduces Absolute Trajectory Error by 44% outdoors and 59% indoors. It completes registration over 6 times faster, producing high-purity 3D change maps without task-specific training.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

VGGT-CD gives a training-free two-stage registration pipeline on VGGT outputs that claims solid error cuts on a small benchmark, but the fine-stage static-point isolation is underspecified.


The paper's core move is to split registration for 3D change detection into a coarse joint keyframe step that sets up a shared metric space and a fine step that tries to strip out changing regions before a simple centroid shift. This directly targets the registration-change paradox without needing any task-specific training or fine-tuning on VGGT itself. The closed-form refinement with locked scale and rotation plus a residual check is a clean way to avoid making things worse once you have candidate static points. On their 11-scene World Across Time set it reports 44% and 59% drops in absolute trajectory error plus a 6x speed-up, which would matter for urban monitoring or disaster work where retraining is impractical. The approach is new in how it chains the sparse prior to the purification step rather than just applying existing robust estimators end-to-end.

The main soft spot is the fine-stage isolation itself. The abstract and stress-test note both leave the actual selection rule, threshold, or invariance property vague, so it is not clear how well the method holds when dynamic elements dominate or edge noise is high. If the selected static subset is small or biased, the later self-check cannot rescue the alignment. The quantitative numbers are given without error bars or explicit data-exclusion details, which makes them hard to reproduce from the text alone.

This work is aimed at computer-vision groups already using VGGT or similar foundation models for temporal 3D tasks. A reader looking for practical, training-free baselines in change detection would find the pipeline worth testing. It deserves a serious referee because the decoupling idea is concrete and the speed/accuracy claims are testable, even though the isolation mechanism needs clearer description before the numbers can be taken as settled.

Referee Report

2 major / 1 minor

Summary. The paper introduces VGGT-CD, a training-free pipeline for robust 3D registration in change detection tasks. It leverages the VGGT visual geometry foundation model to generate dense point clouds from unposed multi-view images. The method consists of a Coarse Stage that uses sparse keyframe joint inference to establish a unified metric space and an initial Sim(3) prior, and a Fine Stage that purifies dense reconstructions by isolating static-background correspondences, followed by a closed-form centroid alignment to refine translation while locking scale and rotation, with a residual self-check to ensure non-degradation. On an 11-scene benchmark from the World Across Time dataset, it claims to reduce Absolute Trajectory Error by 44% outdoors and 59% indoors, complete registration over 6 times faster, and produce high-purity 3D change maps without task-specific training.

Significance. If the results hold, this work is significant for providing an efficient, training-free solution to 3D change detection that avoids the pitfalls of independent per-epoch reconstructions. The use of closed-form centroid alignment and a residual self-check for mathematical non-degradation is a notable strength, enhancing reproducibility and computational efficiency. This could have practical impact in applications like urban monitoring and autonomous driving by enabling high-purity change maps from multi-view images.

major comments (2)

Fine Stage: The isolation of static-background correspondences from dynamic-change interference is invoked to resolve the registration-change paradox (abstract), but no explicit algorithm, threshold, invariance property, or robust selection mechanism is specified. This is load-bearing for the claim that the subsequent closed-form centroid alignment (with scale/rotation locked and residual self-check) yields a non-degrading refinement, as the self-check operates after selection and could fail if the subset is small or biased under high change ratios.
Evaluation section: The reported ATE reductions (44% outdoors, 59% indoors) on the 11-scene benchmark lack error bars, exact data-exclusion rules, and full derivation steps for the metrics. This prevents verification of the quantitative claims and the cross-scene consistency asserted in the abstract.

minor comments (1)

Abstract: The claim of 'high-purity 3D change maps' is not accompanied by a definition or quantification of the purity metric.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback and positive assessment of the work's significance. We address each major comment below, indicating planned revisions where appropriate.


Referee: Fine Stage: The isolation of static-background correspondences from dynamic-change interference is invoked to resolve the registration-change paradox (abstract), but no explicit algorithm, threshold, invariance property, or robust selection mechanism is specified. This is load-bearing for the claim that the subsequent closed-form centroid alignment (with scale/rotation locked and residual self-check) yields a non-degrading refinement, as the self-check operates after selection and could fail if the subset is small or biased under high change ratios.

Authors: We agree that the Fine Stage description would benefit from greater explicitness. Section 3.2 describes purification via residual errors after the initial Sim(3) prior to isolate static correspondences before the closed-form centroid alignment and residual self-check. To address the concern directly, we will add an algorithm box with the precise selection procedure, the threshold criterion, and a short invariance argument (static points remain consistent under the locked scale/rotation). We will also include a brief analysis of performance under high change ratios to show the self-check remains effective even when the static subset is reduced. revision: yes
Referee: Evaluation section: The reported ATE reductions (44% outdoors, 59% indoors) on the 11-scene benchmark lack error bars, exact data-exclusion rules, and full derivation steps for the metrics. This prevents verification of the quantitative claims and the cross-scene consistency asserted in the abstract.

Authors: We accept this point. The current evaluation reports aggregate ATE reductions on the 11 scenes but does not include per-scene variance or explicit exclusion criteria. In the revision we will add error bars (standard deviation across scenes), state the exact data-exclusion rules applied, and append the metric derivation steps (including how ATE is computed from the aligned trajectories) to the evaluation section and supplementary material. These additions will make the 44% / 59% figures and cross-scene consistency fully verifiable. revision: yes
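For readers who want to see what such a derivation might involve, here is a minimal sketch of the usual ATE computation and a per-scene summary with error bars. It assumes trajectories are already aligned and given as camera centres, and it averages per-scene relative reductions. Whether the paper's 44% and 59% figures are computed this way or as a reduction of mean ATE is not stated in the material reviewed, and both function names are illustrative.

```python
import numpy as np

def ate_rmse(est, gt):
    """Root-mean-square position error between aligned trajectories.

    est, gt: (N, 3) arrays of estimated and ground-truth camera centres.
    """
    return float(np.sqrt(np.mean(np.sum((est - gt) ** 2, axis=1))))

def relative_reduction(baseline_ate, method_ate):
    """Mean and sample standard deviation of per-scene ATE reduction.

    Both inputs hold one ATE value per scene, in the same scene order.
    """
    baseline_ate = np.asarray(baseline_ate, dtype=float)
    method_ate = np.asarray(method_ate, dtype=float)
    reduction = (baseline_ate - method_ate) / baseline_ate
    return float(reduction.mean()), float(reduction.std(ddof=1))
```

Reporting the standard deviation alongside the mean would show whether the improvement is consistent across scenes or driven by a few of them.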

Circularity Check

0 steps flagged

No significant circularity; derivation relies on independent geometric operations


The VGGT-CD pipeline is self-contained: the coarse stage uses sparse keyframe joint inference to produce a unified metric space and initial Sim(3) prior, while the fine stage applies closed-form centroid alignment on isolated static correspondences with a residual self-check. These operations are defined via standard rigid-body geometry and do not reduce the reported ATE improvements or non-degradation guarantee to any fitted parameter or self-citation within the same derivation. The isolation step is an assumption but does not create definitional equivalence between inputs and outputs. External VGGT foundation model and benchmark evaluation provide independent grounding.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that VGGT produces usable dense point clouds and that static-background points can be separated from dynamic ones without introducing new fitted parameters beyond those already present in the foundation model.

axioms (2)

domain assumption: VGGT rapidly produces dense point clouds from unposed images.
Invoked in the opening paragraph as the starting point for both coarse and fine stages.
domain assumption: Static-background correspondences can be isolated from dynamic-change interference.
Stated when the fine stage is described as purifying dense reconstructions.




[1] [1]

Change detection of urban objects using 3D point clouds: A review

Uwe Stilla, Y usheng Xu. Change detection of urban objects using 3D point clouds: A review. ISPRS Journal of Photogrammetry and Remote Sensing , 197, pp. 228–255. 2023

work page 2023

[2] [2]

Change detection in urban point clouds: An experimental comparison with simulated 3d datasets

Iris de Gélis, Sébastien Lefèvre, Thomas Corpetti. Change detection in urban point clouds: An experimental comparison with simulated 3d datasets. Remote Sensing, 13, (13), pp. 2629. 2021

work page 2021

[3] [3]

Point cloud registration and change detection in urban environ- ment using an onboard Lidar sensor and MLS reference data

Örkény Zováthi, Balázs Nagy, Csaba Benedek. Point cloud registration and change detection in urban environ- ment using an onboard Lidar sensor and MLS reference data. International Journal of Applied Earth Observation and Geoinformation, 110, pp. 102767. 2022

work page 2022

[4] [4]

Building damage assessment for rapid disaster response with a deep object-based semantic change detection framework: From natural disasters to man- made disasters

Zhuo Zheng, Y anfei Zhong, Junjue Wang, Ailong Ma, Liangpei Zhang. Building damage assessment for rapid disaster response with a deep object-based semantic change detection framework: From natural disasters to man- made disasters. Remote Sensing of Environment, 265, pp. 112636. 2021

work page 2021

[5] [5]

Integrating machine learning and remote sensing in disaster management: A decadal review of post-disaster building damage assessment

Sultan Al Shaﬁan, Da Hu. Integrating machine learning and remote sensing in disaster management: A decadal review of post-disaster building damage assessment. Buildings, 14, (8), pp. 2344. 2024

work page 2024

[6] [6]

Rapid automatic detection of collapsed buildings with single period LiDAR data after an earthquake

Ömer Canözü, Hayrettin Acar. Rapid automatic detection of collapsed buildings with single period LiDAR data after an earthquake. Earth Science Informatics, 18, (1), pp. 151. 2025

work page 2025

[7] [7]

SceneEdited: A City-Scale Benchmark for 3D HD Map Updating via Image-Guided Change Detection

Chun-Jung Lin, Tat-Jun Chin, Sourav Garg, Feras Dayoub. SceneEdited: A City-Scale Benchmark for 3D HD Map Updating via Image-Guided Change Detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 6330–6339. 2026

work page 2026

[8] [8]

Urban 3D Change Detection Using LiDAR Sensor for HD Map Maintenance and Smart Mobility

Hezam Albagami, Haitian Wang, Xinyu Wang, Muhammad Ibrahim, Zainy M Malakan, Abdullah M Alqamdi, et al.. Urban 3D Change Detection Using LiDAR Sensor for HD Map Maintenance and Smart Mobility. arXiv preprint arXiv:2510.21112. 2025

work page arXiv 2025

[9] [9]

A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images

Chenxiao Zhang, Peng Y ue, Deodato Tapete, Liangcun Jiang, Boyi Shangguan, Li Huang, et al.. A deeply supervised image fusion network for change detection in high resolution bi-temporal remote sensing images. ISPRS Journal of Photogrammetry and Remote Sensing , 166, pp. 183–200. 2020

[10]

A spatial-temporal attention-based method and a new dataset for remote sensing image change detection

Hao Chen, Zhenwei Shi. A spatial-temporal attention-based method and a new dataset for remote sensing image change detection. Remote Sensing, 12, (10), pp. 1662. 2020

[11]

Remote sensing image change detection with transformers

Hao Chen, Zipeng Qi, Zhenwei Shi. Remote sensing image change detection with transformers. IEEE Transactions on Geoscience and Remote Sensing, 60, pp. 1–14. 2021

[12]

VGGT: Visual geometry grounded transformer

Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, David Novotny. VGGT: Visual geometry grounded transformer. In Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 5294–5306. 2025

[13]

Dust3r: Geometric 3d vision made easy

Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, Jerome Revaud. Dust3r: Geometric 3d vision made easy. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 20697–20709. 2024

[14]

Grounding image matching in 3d with mast3r

Vincent Leroy, Yohann Cabon, Jérôme Revaud. Grounding image matching in 3d with mast3r. In European conference on computer vision, pp. 71–91. 2024

[15]

π3: Scalable Permutation-Equivariant Visual Geometry Learning

Yifan Wang, Jianjun Zhou, Haoyi Zhu, Wenzheng Chang, Yang Zhou, Zizun Li, et al. π3: Scalable Permutation-Equivariant Visual Geometry Learning. arXiv e-prints, pp. arXiv–2507. 2025

[16]

Streaming 4D Visual Geometry Transformer

Dong Zhuo, Wenzhao Zheng, Jiahe Guo, Yuqi Wu, Jie Zhou, Jiwen Lu. Streaming 4d visual geometry transformer. arXiv preprint arXiv:2507.11539. 2025

[17]

InfiniteVGGT: Visual geometry grounded transformer for endless streams

Shuai Yuan, Yantai Yang, Xiaotian Yang, Xupeng Zhang, Zhonghao Zhao, Lingming Zhang, et al. InfiniteVGGT: Visual Geometry Grounded Transformer for Endless Streams. arXiv preprint arXiv:2601.02281. 2026

[18]

Method for registration of 3-D shapes

Paul J Besl, Neil D McKay. Method for registration of 3-D shapes. In Sensor fusion IV: control paradigms and data structures, 1611, pp. 586–606. 1992

[19]

Least-squares estimation of transformation parameters between two point patterns

Shinji Umeyama. Least-squares estimation of transformation parameters between two point patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13, (4), pp. 376–380. 1991

[20]

A critical synthesis of remotely sensed optical image change detection techniques

Andrew P Tewkesbury, Alexis J Comber, Nicholas J Tate, Alistair Lamb, Peter F Fisher. A critical synthesis of remotely sensed optical image change detection techniques. Remote Sensing of Environment, 160, pp. 1–14. 2015

[21]

Review article digital change detection techniques using remotely-sensed data

Ashbindu Singh. Review article digital change detection techniques using remotely-sensed data. International Journal of Remote Sensing, 10, (6), pp. 989–1003. 1989

[22]

Airborne laser scanning - an introduction and overview

Aloysius Wehr, Uwe Lohr. Airborne laser scanning - an introduction and overview. ISPRS Journal of Photogrammetry and Remote Sensing, 54, (2-3), pp. 68–82. 1999

[23]

Photo tourism: exploring photo collections in 3D

Noah Snavely, Steven M Seitz, Richard Szeliski. Photo tourism: exploring photo collections in 3D. In ACM siggraph 2006 papers, pp. 835–846. 2006

[24]

PGN3DCD: Prior-Knowledge-Guided Network for Urban 3-D Point Cloud Change Detection

Wenxiao Zhan, Ruozhen Cheng, Jing Chen. PGN3DCD: Prior-Knowledge-Guided Network for Urban 3-D Point Cloud Change Detection. IEEE Transactions on Geoscience and Remote Sensing, 62, pp. 1–15. 2024

[25]

Living scenes: Multi-object relocalization and reconstruction in changing 3d environments

Liyuan Zhu, Shengyu Huang, Konrad Schindler, Iro Armeni. Living scenes: Multi-object relocalization and reconstruction in changing 3d environments. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 28014–28024. 2024

[26]

Mvsnet: Depth inference for unstructured multi-view stereo

Yao Yao, Zixin Luo, Shiwei Li, Tian Fang, Long Quan. Mvsnet: Depth inference for unstructured multi-view stereo. In Proceedings of the European conference on computer vision (ECCV), pp. 767–783. 2018

[27]

Cascade cost volume for high-resolution multi-view stereo and stereo matching

Xiaodong Gu, Zhiwen Fan, Siyu Zhu, Zuozhuo Dai, Feitong Tan, Ping Tan. Cascade cost volume for high-resolution multi-view stereo and stereo matching. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 2495–2504. 2020

[28]

Visual Consistency Enhancement for Multi-view Stereo Reconstruction in Remote Sensing

Wei Zhang, Qiang Li, Yuan Yuan, Qi Wang. Visual Consistency Enhancement for Multi-view Stereo Reconstruction in Remote Sensing. IEEE Transactions on Geoscience and Remote Sensing. 2024

[29]

Semantic-Guided Multiview Stereo Reconstruction for Aerial Image

Wei Zhang, Zhigang Yang, Qiang Li, Qi Wang. Semantic-Guided Multiview Stereo Reconstruction for Aerial Image. IEEE Transactions on Geoscience and Remote Sensing, 63, pp. 1–11. 2025

[30]

Refined Cascade Cost Volume for Multiview Remote Sensing Image Reconstruction

Wei Zhang, Qiang Li, Qi Wang. Refined Cascade Cost Volume for Multiview Remote Sensing Image Reconstruction. IEEE Transactions on Geoscience and Remote Sensing, 63, pp. 1–11. 2025

[31]

SDL-MVS: View Space and Depth Deformable Learning Paradigm for Multi-View Stereo Reconstruction in Remote Sensing

Yong-Qiang Mao, Hanbo Bi, Liangyu Xu, Kaiqiang Chen, Zhirui Wang, Xian Sun, et al. SDL-MVS: View Space and Depth Deformable Learning Paradigm for Multi-View Stereo Reconstruction in Remote Sensing. IEEE Transactions on Geoscience and Remote Sensing. 2024

[32]

Edge aware depth inference for large-scale aerial building multi-view stereo

Song Zhang, ZhiWei Wei, WenJia Xu, LiLi Zhang, Yang Wang, JinMing Zhang, et al. Edge aware depth inference for large-scale aerial building multi-view stereo. ISPRS Journal of Photogrammetry and Remote Sensing, 207, pp. 27–42. 2024

[33]

A hierarchical deformable deep neural network and an aerial image benchmark dataset for surface multiview stereo reconstruction

Jiayi Li, Xin Huang, Yujin Feng, Zhen Ji, Shulei Zhang, Dawei Wen. A hierarchical deformable deep neural network and an aerial image benchmark dataset for surface multiview stereo reconstruction. IEEE Transactions on Geoscience and Remote Sensing, 61, pp. 1–12. 2023

[34]

A novel recurrent encoder-decoder structure for large-scale multi-view stereo reconstruction from an open aerial dataset

Jin Liu, Shunping Ji. A novel recurrent encoder-decoder structure for large-scale multi-view stereo reconstruction from an open aerial dataset. In Proc. IEEE Conf. Comput. Vis. Pattern Recognit., pp. 6050–6059. 2020

[35]

Rethinking depth estimation for multi-view stereo: A unified representation

Rui Peng, Rongjie Wang, Zhenyu Wang, Yawen Lai, Ronggang Wang. Rethinking depth estimation for multi-view stereo: A unified representation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 8645–8654. 2022

[36]

Fast global registration

Qian-Yi Zhou, Jaesik Park, Vladlen Koltun. Fast global registration. In European conference on computer vision, pp. 766–782. 2016

[37]

Geotransformer: Fast and robust point cloud registration with geometric transformer

Zheng Qin, Hao Yu, Changjian Wang, Yulan Guo, Yuxing Peng, Slobodan Ilic, et al. Geotransformer: Fast and robust point cloud registration with geometric transformer. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45, (8), pp. 9806–9821. 2023

[38]

Dynamic cues-assisted transformer for robust point cloud registration

Hong Chen, Pei Yan, Sihe Xiang, Yihua Tan. Dynamic cues-assisted transformer for robust point cloud registration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 21698–21707. 2024

[39]

Robust multiview point cloud registration with reliable pose graph initialization and history reweighting

Haiping Wang, Yuan Liu, Zhen Dong, Yulan Guo, Yu-Shen Liu, Wenping Wang, et al. Robust multiview point cloud registration with reliable pose graph initialization and history reweighting. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 9506–9515. 2023

[40]

Clnerf: Continual learning meets nerf

Zhipeng Cai, Matthias Müller. Clnerf: Continual learning meets nerf. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 23185–23194. 2023
