pith.

arxiv: 2604.22202 · v1 · submitted 2026-04-24 · 💻 cs.CV

ArchSym: Detecting 3D-Grounded Architectural Symmetries in the Wild

Pith reviewed 2026-05-08 12:30 UTC · model grok-4.3

classification 💻 cs.CV
keywords 3D symmetry detection, architectural landmarks, reflectional symmetry, single-view detection, SfM annotation pipeline, signed distance maps, in-the-wild images, scene geometry

The pith

A single photo of a building can now yield a full 3D reflection symmetry plane through a detector trained on automatically labeled real-world data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to create the first workable system for finding reflection symmetries that are grounded in actual 3D space when given only one ordinary photograph of an architectural scene. Earlier learning methods could not do this because they were trained almost entirely on object-centric or synthetic data and could not resolve the position of a symmetry plane amid monocular scale ambiguity. The authors solve the data problem with an automatic pipeline that harvests 3D symmetry labels from structure-from-motion reconstructions by matching images across views, producing the ArchSym collection. They then train a detector that outputs signed distance maps anchored to an estimated scene geometry, so the symmetry plane is recovered in 3D, relative to that predicted geometry, rather than as an orientation alone. A reader should care because this removes a long-standing barrier to using symmetry as a reliable geometric prior on everyday photographs.

Core claim

The central discovery is that a scalable annotation pipeline based on cross-view matching in SfM reconstructions can produce reliable 3D symmetry labels at scale, and that a detector trained on those labels can localize reflectional symmetry planes in full 3D by regressing signed distance maps relative to a predicted scene geometry, thereby overcoming the orientation-only limitation of prior single-image methods and generalizing to in-the-wild architectural images.

What carries the argument

The single-view symmetry detector that parameterizes each symmetry as a signed distance map defined relative to a predicted scene geometry.
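To make the parameterization concrete, here is a sketch that computes a per-pixel signed distance map from a predicted point map and a plane. The details are assumptions, not the paper's specification. The scene geometry is taken to be an H×W×3 point map; VGGT [42], the backbone named in Figure 5, can predict one. The plane is written n·x + d = 0 with unit normal n. Each pixel stores the signed distance n·x + d of its 3D point, with no truncation or normalization.

```python
import numpy as np


def signed_distance_map(points: np.ndarray, normal: np.ndarray, offset: float) -> np.ndarray:
    """Per-pixel signed distance from a point map to the plane normal.x + offset = 0.

    points: (H, W, 3) predicted 3D point for every pixel (assumed representation).
    normal, offset: plane parameters; both are rescaled so the normal has unit
    length, which makes the returned values Euclidean distances.
    """
    scale = np.linalg.norm(normal)
    n = np.asarray(normal, dtype=float) / scale
    d = offset / scale
    # (H, W, 3) @ (3,) -> (H, W); positive on the side the normal points to
    return points @ n + d


# Toy 2x2 point map straddling the plane x = 0.5 (normal [1, 0, 0], offset -0.5).
pts = np.array([[[0.0, 0.0, 2.0], [1.0, 0.0, 2.0]],
                [[0.0, 1.0, 2.0], [1.0, 1.0, 2.0]]])
sdm = signed_distance_map(pts, np.array([1.0, 0.0, 0.0]), -0.5)
# sdm == [[-0.5, 0.5], [-0.5, 0.5]]
```

This suggests why the representation helps with scale. If the points and the plane share one frame, scaling the whole scene by k (points and offset together) scales every map value by k. The map therefore stays consistent with whatever scale the geometry predictor chose.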

If this is right

  • Symmetry can now serve as a 3D prior for single-image reconstruction and editing tasks on architectural scenes.
  • The detector supplies both the orientation and the position of the symmetry plane, sidestepping the monocular scale ambiguity that restricted earlier methods to orientation alone.
  • A new benchmark of real-world architectural images becomes available for evaluating 3D symmetry detectors.
  • The same automatic labeling approach can in principle be reused to create training data for other 3D geometric properties.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The signed-distance representation could be directly fed into downstream geometric algorithms such as plane fitting or symmetry-aware meshing without additional conversion steps.
  • The dataset curation technique might be extended to label other repeating structures such as facades or window grids once the core matching pipeline is in place.
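A small least-squares sketch tests the first bullet's suggestion that the signed-distance output could feed plane fitting directly. It assumes each map value is s = n·x + d for the pixel's predicted 3D point x; this parameterization is our assumption. Under it, fitting s linearly in x recovers n and d. The function name, the optional validity mask, and the unweighted fit are illustrative choices. A robust estimator such as RANSAC may be preferable on real predictions.

```python
import numpy as np


def plane_from_sdm(points, sdm, mask=None):
    """Recover (n, d) from a signed distance map by linear least squares.

    Assumes sdm[i, j] ~= n . points[i, j] + d (hypothetical parameterization).
    points: (H, W, 3); sdm: (H, W); mask: optional (H, W) bool of pixels to use.
    Returns a unit normal n and the matching offset d.
    """
    X = points.reshape(-1, 3)
    s = sdm.reshape(-1)
    if mask is not None:
        keep = mask.reshape(-1)
        X, s = X[keep], s[keep]
    # Unknowns [n_x, n_y, n_z, d]: solve [X | 1] @ unknowns ~= s
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    sol, *_ = np.linalg.lstsq(A, s, rcond=None)
    scale = np.linalg.norm(sol[:3])  # renormalize so that |n| = 1
    return sol[:3] / scale, sol[3] / scale


# Synthetic check: random points, a known plane, and slightly noisy distances.
rng = np.random.default_rng(0)
pts = rng.uniform(-1.0, 1.0, size=(8, 8, 3))
n_true, d_true = np.array([0.6, 0.0, 0.8]), -0.3
noisy_sdm = pts @ n_true + d_true + rng.normal(0.0, 0.01, size=(8, 8))
n_fit, d_fit = plane_from_sdm(pts, noisy_sdm)
```

The final renormalization absorbs a global scale error in the predicted distance values. It does not correct a scale error in the predicted points themselves; that error carries through to d.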

Load-bearing premise

Cross-view image matching on SfM reconstructions can produce accurate 3D symmetry annotations without significant geometric errors or labeling mistakes in real scenes.

What would settle it

Manual inspection or additional LiDAR capture revealing that a large fraction of the automatically generated 3D symmetry planes deviate by more than a few degrees or meters from ground truth would undermine the data pipeline and the detector trained on it.
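The check proposed above, comparing automatic planes against manual or LiDAR ground truth, needs a plane-to-plane error. The sketch uses two common, generic measures: the angle between normals, and the difference in offsets after sign alignment. These measures are our choice and not necessarily the metrics the paper reports.

```python
import numpy as np


def plane_errors(n_a, d_a, n_b, d_b):
    """Angular error (degrees) and offset error between planes n.x + d = 0.

    (n, d) and (-n, -d) describe the same plane, so plane b is flipped to face
    the same way as plane a before the offsets are compared.
    """
    n_a, n_b = np.asarray(n_a, dtype=float), np.asarray(n_b, dtype=float)
    scale_a, scale_b = np.linalg.norm(n_a), np.linalg.norm(n_b)
    n_a, d_a = n_a / scale_a, d_a / scale_a
    n_b, d_b = n_b / scale_b, d_b / scale_b
    if n_a @ n_b < 0:
        n_b, d_b = -n_b, -d_b
    cos_angle = np.clip(n_a @ n_b, -1.0, 1.0)  # guard against round-off
    return float(np.degrees(np.arccos(cos_angle))), float(abs(d_a - d_b))


# Same orientation once b is flipped; offsets -2.0 vs -2.5.
angle, offset = plane_errors([1, 0, 0], -2.0, [-1, 0, 0], 2.5)
# angle == 0.0, offset == 0.5
```

The offset error depends on where the coordinate origin sits and is expressed in reconstruction units. SfM models such as those from COLMAP are defined only up to a similarity transform, so a threshold in meters presupposes a metric alignment, for example to the LiDAR scan.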

Figures

Figures reproduced from arXiv: 2604.22202 by Hanyu Chen, Noah Snavely, Ruojin Cai, Steve Marschner.

Figure 1: Our method robustly detects 3D-grounded symmetries in challenging, in-the-wild images. From a single RGB image (left in each pair), our model recovers dominant 3D symmetry planes (right) even when they are partially occluded or not directly visible. To train our model, we introduce a novel pipeline to automatically curate ArchSym, a large-scale dataset of landmark symmetries. The results above are on image…
Figure 2: Overview of our automated pipeline for extracting symmetry annotations. We visualize within-view (left) and cross-view (right) matching on image pairs sampled from an SfM reconstruction. For each pair, we horizontally flip one image, find dense matches with the other image via MASt3R [7], unproject matched pixels to 3D points using depth maps, and fit a plane to the resulting point pairs. The final symmetr…
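Figure 2 describes fitting a plane to 3D point pairs. The pairs come from matching an image against a horizontally flipped copy and unprojecting both pixels with depth. The sketch shows one simple way to do that final fit. It assumes the pairs are already in 3D and that each q_i is the mirror image of p_i. The normal is taken as the dominant direction of the displacements p_i - q_i, and the plane passes through the pair midpoints. Several steps are not shown: MASt3R matching, unprojection, outlier rejection, and whatever the truncated caption ("The final symmetr…") describes. The function name is hypothetical.

```python
import numpy as np


def fit_reflection_plane(p, q):
    """Fit a mirror plane n.x + d = 0 to 3D point pairs (p_i, q_i).

    p, q: (N, 3) arrays where q_i is assumed to be the mirror image of p_i.
    The normal is the dominant direction of the displacements p_i - q_i (top
    right-singular vector, so each pair's ordering does not matter); the plane
    then passes through the mean of the pair midpoints.
    """
    _, _, vt = np.linalg.svd(p - q, full_matrices=False)
    n = vt[0]
    midpoints = 0.5 * (p + q)
    d = -float(np.mean(midpoints @ n))
    return n, d


# Synthetic pairs mirrored across the plane x = 1, plus small "matching" noise.
rng = np.random.default_rng(1)
n_true, d_true = np.array([1.0, 0.0, 0.0]), -1.0
p = rng.uniform(-3.0, 3.0, size=(50, 3))
q = p - 2.0 * (p @ n_true + d_true)[:, None] * n_true  # exact reflections
q = q + rng.normal(0.0, 0.01, size=q.shape)
n_est, d_est = fit_reflection_plane(p, q)  # n_est may come out as -n_true
```

The SVD direction has an arbitrary sign, so (n_est, d_est) may be the negated but equivalent form of the true plane. Compare planes with a sign-invariant error.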
Figure 3: Visualization of automated symmetry plane annotations. We run LANGEVIN [13] and our symmetry extraction pipeline on six scenes from MegaScenes. Extracted planes are visualized with a dense point cloud from COLMAP [33, 34]. For LANGEVIN, the dense point cloud from COLMAP is used as input. For OURS-ANNOTATION, sampled pairs of input images and depth maps are used as input. We observe that LANGEVIN, as a pure…
Figure 4: Statistics of the ArchSym dataset, showing the distribution of the number of images available (left) and the number of global symmetries annotated (right) in each scene. and fails to identify symmetries where points to one side of the symmetry plane are largely missing (e.g., Isa Khan’s Tomb, Frauenkirche). It often prioritizes the coarse shape of the point cloud over the underlying architectural semantics…
Figure 5: Overview of our single-view symmetry detector architecture. A frozen VGGT [42] backbone first extracts features from a single input image. The features are processed by a transformer decoder with learnable instance queries to identify symmetries. A lightweight MLP generates conditioning parameters from the resulting instance features. Then, a DPT-style [29] prediction head fuses multi-layer features to gen…
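Figure 5's caption says a lightweight MLP turns each instance feature into conditioning parameters, and the paper cites FiLM [28]. We assume that conditioning is FiLM-style: a per-channel scale γ and shift β applied to a shared feature map. This would let one dense head produce a different map per detected symmetry. A minimal NumPy sketch follows. Layer sizes, the ReLU, and where the modulation sits inside the DPT-style head are our guesses, not details from the caption.

```python
import numpy as np


def film_params(instance_feat, w1, b1, w2, b2):
    """Tiny two-layer MLP: one instance embedding (D,) -> (gamma, beta), each (C,)."""
    hidden = np.maximum(instance_feat @ w1 + b1, 0.0)  # ReLU
    out = hidden @ w2 + b2                              # (2 * C,)
    gamma, beta = np.split(out, 2)
    return gamma, beta


def film_modulate(feature_map, gamma, beta):
    """Feature-wise linear modulation: per-channel scale and shift of (C, H, W)."""
    return gamma[:, None, None] * feature_map + beta[:, None, None]


# Random weights stand in for trained ones; sizes are arbitrary.
rng = np.random.default_rng(0)
D, C, H, W = 16, 8, 4, 4
w1, b1 = 0.1 * rng.normal(size=(D, 32)), np.zeros(32)
w2, b2 = 0.1 * rng.normal(size=(32, 2 * C)), np.zeros(2 * C)
shared_features = rng.normal(size=(C, H, W))
# Two instance queries -> two differently modulated copies of the same features.
per_instance = [film_modulate(shared_features, *film_params(rng.normal(size=D), w1, b1, w2, b2))
                for _ in range(2)]
```

The design appeal of this kind of conditioning is that the expensive image features are computed once and only two small vectors change per instance.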
Figure 6: Qualitative comparison of single-view symmetry detection results. Input images are sampled from eight different test scenes. Since REFLECT3D [19] does not predict plane offsets, for visualization purposes, we use the point closest to the center of the landmark on the corresponding ground truth plane as an anchor point for REFLECT3D’s predicted normals. Point clouds shown are predicted by VGGT [42]. We obse…
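Figure 6's visualization rule for REFLECT3D, which predicts only a normal, amounts to two small geometric steps. First, project the landmark center onto the ground-truth plane. Then pass a plane with the predicted normal through that point. The sketch below takes the landmark center as a given input, since the caption does not say how it is computed. The function name is hypothetical.

```python
import numpy as np


def anchor_normal_on_plane(n_pred, n_gt, d_gt, center):
    """Build a displayable plane from an orientation-only prediction.

    Projects `center` onto the ground-truth plane n_gt.x + d_gt = 0 to get an
    anchor point, then returns the plane with normal n_pred through that point.
    """
    n_gt = np.asarray(n_gt, dtype=float)
    scale = np.linalg.norm(n_gt)
    n_gt, d_gt = n_gt / scale, d_gt / scale
    center = np.asarray(center, dtype=float)
    anchor = center - (center @ n_gt + d_gt) * n_gt  # closest point on the GT plane
    n_pred = np.asarray(n_pred, dtype=float)
    n_pred = n_pred / np.linalg.norm(n_pred)
    d_pred = -float(n_pred @ anchor)
    return n_pred, d_pred, anchor


# GT plane x = 2, landmark center (5, 1, 3), predicted normal tilted 45 degrees.
n_p, d_p, anchor = anchor_normal_on_plane([1.0, 1.0, 0.0], [1.0, 0.0, 0.0], -2.0, [5.0, 1.0, 3.0])
# anchor == [2, 1, 3]; d_p == -3 / sqrt(2)
```

The anchor comes from the ground-truth plane, so planes built this way get their position from the ground truth by construction. They suit visualization, which is how the caption uses them, but not offset evaluation.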
Figure 7: Single-view completion using detected symmetries. We…
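The Figure 7 caption is truncated, so the completion procedure is not given. We assume a natural reading: mirror the visible points across the detected plane n·x + d = 0 and merge them with the original cloud. For unit n, the mirror of x is x' = x - 2(n·x + d)n. The helper names are hypothetical. A real pipeline would likely also filter mirrored points that conflict with observed geometry.

```python
import numpy as np


def reflect_points(points, normal, offset):
    """Mirror (N, 3) points across the plane normal.x + offset = 0."""
    normal = np.asarray(normal, dtype=float)
    scale = np.linalg.norm(normal)
    n, d = normal / scale, offset / scale
    signed_dist = points @ n + d          # signed distance of every point
    return points - 2.0 * signed_dist[:, None] * n


def complete_with_symmetry(points, normal, offset):
    """Naive completion: the visible points plus their mirror images."""
    return np.vstack([points, reflect_points(points, normal, offset)])


visible = np.array([[3.0, 0.0, 0.0], [1.0, 2.0, 0.0]])
completed = complete_with_symmetry(visible, [1.0, 0.0, 0.0], -2.0)  # plane x = 2
# mirrored rows: [1, 0, 0] and [3, 2, 0]
```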
Original abstract

Symmetry detection is a fundamental problem in computer vision, and symmetries serve as powerful priors for downstream tasks. However, existing learning-based methods for detecting 3D symmetries from single images have been almost exclusively trained and evaluated on object-centric or synthetic datasets, and thus fail to generalize to real-world scenes. Furthermore, due to the inherent scale ambiguity of monocular inputs, which makes localizing the 3D plane an ill-posed problem, many existing works only predict the plane's orientation. In this paper, we address these limitations by presenting the first framework for detecting 3D-grounded reflectional symmetries from single, in-the-wild RGB images, focusing on architectural landmarks. We introduce two key innovations: (1) a scalable data annotation pipeline to automatically curate a large-scale dataset of architectural symmetries, ArchSym, from SfM reconstructions by leveraging cross-view image matching; and building on the dataset, (2) a single-view symmetry detector that accurately localizes symmetries in 3D by parameterizing them as signed distance maps defined relative to predicted scene geometry. We validate our symmetry annotation pipeline against geometry-based alternatives and demonstrate that our symmetry detector significantly outperforms state-of-the-art baselines on our new benchmark.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces ArchSym, the first framework for 3D-grounded reflectional symmetry detection from single in-the-wild RGB images of architectural scenes. It contributes (1) a scalable automatic annotation pipeline that leverages SfM reconstructions and cross-view image matching to curate a large dataset of 3D symmetries without manual labeling, and (2) a monocular detector that outputs signed-distance symmetry maps defined relative to predicted scene geometry, thereby resolving scale ambiguity. The annotation pipeline is validated against geometry-based alternatives, and the detector is shown to outperform state-of-the-art baselines on the new ArchSym benchmark.

Significance. If the automatic 3D labels prove reliable, the work meaningfully extends symmetry detection beyond object-centric and synthetic regimes to real architectural scenes, where symmetries are both prevalent and useful for downstream tasks such as 3D reconstruction and facade parsing. The signed-distance-map parameterization is a technically sound way to make the 3D plane localization well-posed from monocular input. The SfM-based annotation strategy is a practical contribution for scalable dataset creation. These contributions would be more convincing with explicit quantification of label accuracy in the presence of repeated architectural structures.

major comments (2)
  1. [§3] §3 (Annotation Pipeline): The central claim that the detector achieves accurate 3D-grounded detection rests on the quality of the automatically generated 3D symmetry labels. Architectural scenes frequently contain repeated facades and symmetric elements that induce SfM correspondence errors, scale drift, and erroneous plane hypotheses. While the manuscript states that the pipeline is validated against geometry-based alternatives, no quantitative metrics (e.g., plane-parameter error distributions, agreement with manual annotations on a held-out subset, or failure-case analysis) are reported in the validation section. Without these, it is impossible to determine whether label noise undermines the downstream detector training and the reported benchmark gains.
  2. [§5] §5 (Experiments): The claim of significant outperformance over baselines is load-bearing for the paper's contribution. The manuscript should provide ablations isolating the effect of the signed-distance-map formulation versus simpler orientation-only predictions, as well as an analysis of how annotation noise from the SfM pipeline propagates to detector performance. Current results appear to lack these controls, making it difficult to attribute improvements specifically to the proposed 3D-grounded representation.
minor comments (2)
  1. [§4] Figure captions and the method section would benefit from an explicit diagram showing how the predicted signed-distance map is converted back to a 3D plane equation and how this resolves scale ambiguity.
  2. [Abstract] The abstract states that the detector 'significantly outperforms' baselines; adding the key quantitative deltas (e.g., mIoU or plane-error reductions) would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the two major comments below and will revise the paper accordingly to strengthen the validation of the annotation pipeline and the experimental analysis.

  1. Referee: [§3] §3 (Annotation Pipeline): The central claim that the detector achieves accurate 3D-grounded detection rests on the quality of the automatically generated 3D symmetry labels. Architectural scenes frequently contain repeated facades and symmetric elements that induce SfM correspondence errors, scale drift, and erroneous plane hypotheses. While the manuscript states that the pipeline is validated against geometry-based alternatives, no quantitative metrics (e.g., plane-parameter error distributions, agreement with manual annotations on a held-out subset, or failure-case analysis) are reported in the validation section. Without these, it is impossible to determine whether label noise undermines the downstream detector training and the reported benchmark gains.

    Authors: We agree that explicit quantification of label accuracy is important, particularly given the challenges of repeated structures in architectural scenes. Our current validation compares the SfM-based pipeline to geometry-based alternatives, but we acknowledge this is insufficiently quantitative. In the revision we will add: (1) plane-parameter error distributions on a held-out subset of 200 images for which we obtain manual plane annotations, (2) agreement metrics (e.g., angular and distance errors) between automatic and manual labels, and (3) a targeted failure-case analysis on scenes with repeated facades. These additions will allow readers to assess label reliability directly. revision: yes

  2. Referee: [§5] §5 (Experiments): The claim of significant outperformance over baselines is load-bearing for the paper's contribution. The manuscript should provide ablations isolating the effect of the signed-distance-map formulation versus simpler orientation-only predictions, as well as an analysis of how annotation noise from the SfM pipeline propagates to detector performance. Current results appear to lack these controls, making it difficult to attribute improvements specifically to the proposed 3D-grounded representation.

    Authors: We concur that isolating the contribution of the signed-distance-map representation and quantifying noise sensitivity would strengthen the experimental claims. In the revised manuscript we will add: (1) an ablation comparing the full signed-distance-map model against an orientation-only baseline (normal vector prediction without distance), and (2) a controlled noise-injection study that perturbs the training labels with increasing levels of plane-parameter noise and reports the resulting drop in detector metrics. These controls will clarify the benefit of the 3D-grounded formulation and the robustness to annotation noise. revision: yes
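A label perturbation like the following could drive the noise-injection study promised in the second response. The noise model is our assumption, since the rebuttal only speaks of increasing levels of plane-parameter noise. First, the normal is tilted by an exact angle about a random axis through the plane point closest to the origin. Then the plane is shifted along its new normal by Gaussian noise. Names and noise levels are illustrative.

```python
import numpy as np


def perturb_plane(n, d, angle_deg, offset_std, rng):
    """Noisy copy of the plane n.x + d = 0 (n assumed to be unit length).

    The normal is tilted by exactly `angle_deg` about a random axis through the
    plane point closest to the origin; the plane is then shifted along its new
    normal by Gaussian noise with standard deviation `offset_std`.
    """
    n = np.asarray(n, dtype=float)
    pivot = -d * n                    # closest point of the plane to the origin
    u = rng.normal(size=3)
    u -= (u @ n) * n                  # random direction perpendicular to n
    u /= np.linalg.norm(u)
    theta = np.radians(angle_deg)
    n_new = np.cos(theta) * n + np.sin(theta) * u
    d_new = -float(n_new @ pivot) + rng.normal(0.0, offset_std)
    return n_new, d_new


# Illustrative sweep over noise levels for a toy label set.
rng = np.random.default_rng(0)
labels = [(np.array([1.0, 0.0, 0.0]), -2.0), (np.array([0.0, 0.6, 0.8]), 0.5)]
noisy_sets = {}
for angle in (0.0, 2.0, 5.0, 10.0):
    noisy_sets[angle] = [perturb_plane(n, d, angle, 0.05 * angle, rng) for n, d in labels]
    # a real study would retrain on noisy_sets[angle] and evaluate on clean labels
```

Tilting about the pivot keeps the plane near its original location, so the angular and offset noise levels can be varied independently.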

Circularity Check

0 steps flagged

No significant circularity; dataset creation and model training are independent of fitted predictions


The paper's derivation chain begins with an external SfM-based annotation pipeline that generates the ArchSym dataset via cross-view matching on reconstructions; this process is not derived from or equivalent to the single-view detector's outputs. The detector is then trained to predict signed-distance symmetry maps relative to scene geometry estimated from the input image. No equations or steps reduce a claimed prediction to a fitted parameter by construction, and no self-citations are invoked as load-bearing uniqueness theorems or ansatzes. Validation against geometry-based alternatives is presented as an independent check. The approach is therefore self-contained, and the central claim does not reduce to its own inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review limits visibility into specific parameters or assumptions; the method appears to rest on standard SfM accuracy and the validity of signed distance map parameterization for symmetry localization.

pith-pipeline@v0.9.0 · 5523 in / 1026 out tokens · 45449 ms · 2026-05-08T12:30:52.828513+00:00 · methodology


Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages

  1. [1]

    Mikhail J. Atallah. On symmetry detection. IEEE Transactions on Computers, 100(7):663–666, 1985.

  2. [2]

    Ruojin Cai, Joseph Tung, Qianqian Wang, Hadar Averbuch-Elor, Bharath Hariharan, and Noah Snavely. Doppelgangers: Learning to disambiguate images of similar structures. In ICCV, 2023.

  3. [3]

    Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. In ECCV, 2020.

  4. [4]

    Angel X Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, et al. ShapeNet: An information-rich 3D model repository. arXiv preprint, 2015.

  5. [5]

    Marcelo Cicconet, David GC Hildebrand, and Hunter Elliott. Finding mirror symmetry via registration and optimal symmetric pairwise assignment of curves: Algorithm and results. In ICCV Workshops, pages 1759–1763, 2017.

  6. [6]

    Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Objaverse: A universe of annotated 3D objects. In CVPR, 2023.

  7. [7]

    Bardienus Duisterhof, Lojze Zust, Philippe Weinzaepfel, Vincent Leroy, Yohann Cabon, and Jerome Revaud. MASt3R-SfM: A fully-integrated solution for unconstrained structure-from-motion. arXiv preprint, 2024.

  8. [8]

    Mohamed Elawady, Christophe Ducottet, Olivier Alata, Cécile Barat, and Philippe Colantoni. Wavelet-based reflection symmetry detection via textural and color histograms. In ICCV Workshops, pages 1725–1733, 2017.

  9. [9]

    Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. KDD, pages 226–231. AAAI Press, 1996.

  10. [10]

    Christopher Funk and Yanxi Liu. Beyond planar symmetry: Modeling human perception of reflection and rotation symmetries in the wild. In ICCV, 2017.

  11. [11]

    Christopher Funk, Seungkyu Lee, Martin R Oswald, Stavros Tsogkas, Wei Shen, Andrea Cohen, Sven Dickinson, and Yanxi Liu. 2017 ICCV challenge: Detecting symmetry in the wild. In ICCV Workshops, 2017.

  12. [12]

    Lin Gao, Ling-Xiao Zhang, Hsien-Yu Meng, Yi-Hui Ren, Yu-Kun Lai, and Leif Kobbelt. PRS-Net: Planar reflective symmetry detection net for 3D models. IEEE TVCG, 27(6):3007–3018, 2020.

  13. [13]

    Jihyeon Je, Jiayi Liu, Guandao Yang, Boyang Deng, Shengqu Cai, Gordon Wetzstein, Or Litany, and Leonidas Guibas. Robust symmetry detection via Riemannian Langevin dynamics. In SIGGRAPH Asia 2024 Conference Papers, pages 1–11.

  14. [14]

    Nahum Kiryati and Yossi Gofman. Detecting symmetry in grey level images: The global optimization approach. International Journal of Computer Vision, 29(1):29–45, 1998.

  15. [15]

    Kevin Köser, Christopher Zach, and Marc Pollefeys. Dense 3D reconstruction of symmetric scenes from a single image. In Joint Pattern Recognition Symposium, pages 266–275. Springer, 2011.

  16. [16]

    Harold W Kuhn. The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1-2):83–97.

  17. [17]

    Vincent Leroy, Yohann Cabon, and Jérôme Revaud. Grounding image matching in 3D with MASt3R. In ECCV, 2024.

  18. [18]

    Ren-Wu Li, Ling-Xiao Zhang, Chunpeng Li, Yu-Kun Lai, and Lin Gao. E3Sym: Leveraging E(3) invariance for unsupervised 3D planar reflective symmetry detection. In ICCV, 2023.

  19. [19]

    Xiang Li, Zixuan Huang, Anh Thai, and James M Rehg. Symmetry strikes back: From single-image symmetry detection to 3D generation. In CVPR, 2025.

  20. [20]

    Guosheng Lin, Anton Milan, Chunhua Shen, and Ian Reid. RefineNet: Multi-path refinement networks for high-resolution semantic segmentation. In CVPR, 2017.

  21. [21]

    Yancong Lin, Silvia-Laura Pintea, and Jan van Gemert. NeRD++: Improved 3D-mirror symmetry learning from a single image. arXiv preprint, 2021.

  22. [22]

    Gareth Loy and Jan-Olof Eklundh. Detecting symmetry and symmetric constellations of features. In ECCV, 2006.

  23. [23]

    Nathaniel Merrill, Yuliang Guo, Xingxing Zuo, Xinyu Huang, Stefan Leutenegger, Xi Peng, Liu Ren, and Guoquan Huang. Symmetry and uncertainty-aware object SLAM for 6DoF object pose estimation. In CVPR, 2022.

  24. [24]

    Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. NeRF: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020.

  25. [25]

    Niloy J Mitra, Mark Pauly, Michael Wand, and Duygu Ceylan. Symmetry in 3D geometry: Extraction and applications. In Comput. Graph. Forum, pages 1–23. Wiley Online Library.

  26. [26]

    Rajendra Nagar and Shanmuganathan Raman. SymmMap: Estimation of the 2-D reflection symmetry map and its applications. In ICCV Workshops, pages 1715–1724, 2017.

  27. [27]

    William Peebles and Saining Xie. Scalable diffusion models with transformers. In ICCV, pages 4195–4205, 2023.

  28. [28]

    Ethan Perez, Florian Strub, Harm De Vries, Vincent Dumoulin, and Aaron Courville. FiLM: Visual reasoning with a general conditioning layer. In AAAI, 2018.

  29. [29]

    René Ranftl, Alexey Bochkovskiy, and Vladlen Koltun. Vision transformers for dense prediction. In ICCV, pages 12179–12188, 2021.

  30. [30]

    Jeremy Reizenstein, Roman Shapovalov, Philipp Henzler, Luca Sbordone, Patrick Labatut, and David Novotny. Common objects in 3D: Large-scale learning and evaluation of real-life 3D category reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10901–10911, 2021.

  31. [31]

    Jeremy Reizenstein, Roman Shapovalov, Philipp Henzler, Luca Sbordone, Patrick Labatut, and David Novotny. Common objects in 3D: Large-scale learning and evaluation of real-life 3D category reconstruction. In International Conference on Computer Vision, 2021.

  32. [32]

    Tadamasa Sawada, Yunfeng Li, and Zygmunt Pizlo. Detecting 3-D mirror symmetry in a 2-D camera image for 3-D shape recovery. Proceedings of the IEEE, 102(10):1588–1606, 2014.

  33. [33]

    Johannes L Schonberger and Jan-Michael Frahm. Structure-from-motion revisited. In CVPR, 2016.

  34. [34]

    Johannes Lutz Schönberger, Enliang Zheng, Marc Pollefeys, and Jan-Michael Frahm. Pixelwise view selection for unstructured multi-view stereo. In European Conference on Computer Vision (ECCV), 2016.

  35. [35]

    Yifei Shi, Junwen Huang, Hongjia Zhang, Xin Xu, Szymon Rusinkiewicz, and Kai Xu. SymmetryNet: Learning to predict reflectional and rotational symmetries of 3D shapes from single-view RGB-D images. ACM TOG, 39(6):1–14, 2020.

  36. [36]

    Yifei Shi, Zixin Tang, Xiangting Cai, Hongjia Zhang, Dewen Hu, and Xin Xu. SymmetryGrasp: Symmetry-aware antipodal grasp detection from single-view RGB-D images. RA-L, 7(4):12235–12242, 2022.

  37. [37]

    Yifei Shi, Xin Xu, Junhua Xi, Xiaochang Hu, Dewen Hu, and Kai Xu. Learning to detect 3D symmetry from single-view RGB-D images with weak supervision. IEEE TPAMI, 45(4):4882–4896, 2022.

  38. [38]

    Giorgos Tolias, Yannis Avrithis, and Hervé Jégou. To aggregate or not to aggregate: Selective match kernels for image search. In ICCV, pages 1401–1408, 2013.

  39. [39]

    Stavros Tsogkas and Iasonas Kokkinos. Learning-based symmetry detection in natural images. In ECCV, 2012.

  40. [40]

    Joseph Tung, Gene Chou, Ruojin Cai, Guandao Yang, Kai Zhang, Gordon Wetzstein, Bharath Hariharan, and Noah Snavely. MegaScenes: Scene-level view synthesis at scale. In ECCV, 2024.

  41. [41]

    Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. NeurIPS, 30, 2017.

  42. [42]

    Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. VGGT: Visual geometry grounded transformer. In CVPR, 2025.

  43. [43]

    Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and Jerome Revaud. DUSt3R: Geometric 3D vision made easy. In CVPR, pages 20697–20709, 2024.

  44. [44]

    Jan D Wolter, Tony C Woo, and Richard A Volz. Optimal algorithms for symmetry detection in two and three dimensions. The Visual Computer, 1(1):37–48, 1985.

  45. [45]

    Shangzhe Wu, Christian Rupprecht, and Andrea Vedaldi. Unsupervised learning of probably symmetric deformable 3D objects from images in the wild. In CVPR, 2020.

  46. [46]

    Shangzhe Wu, Ameesh Makadia, Jiajun Wu, Noah Snavely, Richard Tucker, and Angjoo Kanazawa. De-rendering the world’s revolutionary artefacts. In CVPR, 2021.

  47. [47]

    Yuanbo Xiangli, Ruojin Cai, Hanyu Chen, Jeffrey Byrne, and Noah Snavely. Doppelgangers++: Improved visual disambiguation with geometric 3D features. In CVPR, 2025.

  48. [48]

    Yuan Yao, Nico Schertler, Enrique Rosales, Helge Rhodin, Leonid Sigal, and Alla Sheffer. Front2Back: Single view 3D shape reconstruction via front to back prediction. In CVPR.

  49. [49]

    Zhaoxuan Zhang, Bo Dong, Tong Li, Felix Heide, Pieter Peers, Baocai Yin, and Xin Yang. Single depth-image 3D reflection symmetry and shape prediction. In ICCV, 2023.

  50. [50]

    Heng Zhao, Shenxing Wei, Dahu Shi, Wenming Tan, Zheyang Li, Ye Ren, Xing Wei, Yi Yang, and Shiliang Pu. Learning symmetry-aware geometry correspondences for 6D object pose estimation. In ICCV, 2023.

  51. [51]

    Yichao Zhou, Shichen Liu, and Yi Ma. NeRD: Neural 3D reflection symmetry detector. In CVPR, 2021.

ArchSym: Detecting 3D-Grounded Architectural Symmetries in the Wild. Supplementary Material

A. Implementation details

Our implementation builds upon the official MASt3R [7] and VGGT [42] codebases.

A.1. Training details

We use a base learning rate of...