ArchSym: Detecting 3D-Grounded Architectural Symmetries in the Wild
Pith reviewed 2026-05-08 12:30 UTC · model grok-4.3
The pith
A single photo of a building can now yield a full 3D reflection symmetry plane through a detector trained on automatically labeled real-world data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim has two parts. First, a scalable annotation pipeline based on cross-view matching in SfM reconstructions can produce reliable 3D symmetry labels at scale. Second, a detector trained on those labels can localize reflectional symmetry planes in full 3D by regressing signed distance maps relative to a predicted scene geometry. Together these overcome the orientation-only limitation of prior single-image methods and generalize to in-the-wild architectural images.
What carries the argument
The single-view symmetry detector that parameterizes each symmetry as a signed distance map defined relative to a predicted scene geometry.
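To make this representation concrete, here is a minimal sketch, not the authors' code. It assumes the predicted scene geometry is a per-pixel 3D point map and that the symmetry plane is written as n·x + d = 0; the function name `signed_distance_map` and the toy values are hypothetical.

```python
import math

def signed_distance_map(point_map, normal, offset):
    """Per-pixel signed distance from 3D points to the plane n.x + d = 0.

    point_map: H x W nested list of (x, y, z) tuples (assumed predicted geometry).
    normal, offset: plane parameters; they are rescaled so the normal has unit length.
    """
    norm = math.sqrt(sum(c * c for c in normal))
    if norm == 0.0:
        raise ValueError("plane normal must be non-zero")
    n = [c / norm for c in normal]
    d = offset / norm
    return [[n[0] * x + n[1] * y + n[2] * z + d for (x, y, z) in row]
            for row in point_map]

# Toy 2 x 2 point map and the plane x = 1 (normal along +x, offset -1).
points = [[(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
          [(2.0, 1.0, 3.0), (3.0, 1.0, 3.0)]]
sdm = signed_distance_map(points, (1.0, 0.0, 0.0), -1.0)
```

Each value is a distance in the same units as the point map, so rescaling the geometry rescales the map with it. That is one plausible reading of "defined relative to a predicted scene geometry".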
If this is right
- Symmetry can now serve as a 3D prior for single-image reconstruction and editing tasks on architectural scenes.
- The detector supplies both orientation and position of the symmetry plane, resolving the scale ambiguity that limited earlier methods.
- A new benchmark of real-world architectural images becomes available for evaluating 3D symmetry detectors.
- The same automatic labeling approach can in principle be reused to create training data for other 3D geometric properties.
Where Pith is reading between the lines
- The signed-distance representation could be directly fed into downstream geometric algorithms such as plane fitting or symmetry-aware meshing without additional conversion steps.
- The dataset curation technique might be extended to label other repeating structures such as facades or window grids once the core matching pipeline is in place.
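As an illustration of the plane-fitting remark above, the sketch below recovers a plane from a signed distance map by ordinary least squares. It assumes each pixel supplies a 3D point and a predicted signed distance s, and solves n·x + d ≈ s for (n, d). The helper names `solve_linear` and `fit_plane` are hypothetical, and the paper may convert its maps to planes differently.

```python
import math

def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("points are degenerate (coplanar or too few)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x

def fit_plane(points, distances):
    """Least-squares (unit normal, offset) such that n.x + d matches each distance."""
    AtA = [[0.0] * 4 for _ in range(4)]
    Atb = [0.0] * 4
    for (x, y, z), s in zip(points, distances):
        row = (x, y, z, 1.0)
        for i in range(4):
            Atb[i] += row[i] * s
            for j in range(4):
                AtA[i][j] += row[i] * row[j]
    a, b, c, d = solve_linear(AtA, Atb)
    norm = math.sqrt(a * a + b * b + c * c)
    if norm == 0.0:
        raise ValueError("distances do not vary, so no plane direction is defined")
    # Rescale so the normal has unit length; the zero level set is unchanged.
    return (a / norm, b / norm, c / norm), d / norm

# Toy check: five points with distances generated from the plane z = 2.
pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
       (0.0, 0.0, 1.0), (1.0, 1.0, 1.0)]
dists = [z - 2.0 for (_, _, z) in pts]
normal, offset = fit_plane(pts, dists)
```

With noisy predictions, a robust variant (for example, refitting after discarding large residuals) would likely be preferable to this plain fit.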
Load-bearing premise
Cross-view image matching on SfM reconstructions can produce accurate 3D symmetry annotations without significant geometric errors or labeling mistakes in real scenes.
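The geometry behind this premise can be sketched in a few lines. If matching yields pairs of 3D points that are mirror images of each other, the vector between the two points of a pair is parallel to the plane normal and their midpoint lies on the plane. The code below is an illustrative assumption about how such pairs could be turned into a plane, not a description of the ArchSym pipeline; `plane_from_mirror_pairs` is a hypothetical name.

```python
import math

def plane_from_mirror_pairs(pairs):
    """Estimate (unit normal, offset) of the plane n.x + d = 0 from mirror pairs.

    pairs: list of (p, q), each a 3-tuple, where q is assumed to be the
    reflection of p across the unknown plane.
    """
    reference = None
    normal_sum = [0.0, 0.0, 0.0]
    midpoint_sum = [0.0, 0.0, 0.0]
    for p, q in pairs:
        for i in range(3):
            midpoint_sum[i] += 0.5 * (p[i] + q[i])
        diff = [q[i] - p[i] for i in range(3)]
        length = math.sqrt(sum(c * c for c in diff))
        if length == 0.0:
            continue  # a point on the plane carries no direction information
        unit = [c / length for c in diff]
        if reference is None:
            reference = unit
        # q - p and p - q describe the same plane, so align signs before averaging.
        dot = sum(u * r for u, r in zip(unit, reference))
        sign = 1.0 if dot >= 0.0 else -1.0
        for i in range(3):
            normal_sum[i] += sign * unit[i]
    norm = math.sqrt(sum(c * c for c in normal_sum))
    if norm == 0.0:
        raise ValueError("need at least one pair of distinct points")
    n = tuple(c / norm for c in normal_sum)
    centroid = [m / len(pairs) for m in midpoint_sum]
    d = -sum(n[i] * centroid[i] for i in range(3))
    return n, d

# Toy pairs mirrored across the plane x = 1.
pairs = [((0.0, 0.0, 0.0), (2.0, 0.0, 0.0)),
         ((3.0, 1.0, 2.0), (-1.0, 1.0, 2.0)),
         ((0.5, -1.0, 4.0), (1.5, -1.0, 4.0))]
n, d = plane_from_mirror_pairs(pairs)
```

Simple averaging like this is sensitive to wrong matches, which is the failure mode the premise depends on avoiding.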
What would settle it
Manual inspection or additional LiDAR capture revealing that a large fraction of the automatically generated 3D symmetry planes deviate by more than a few degrees or meters from ground truth would undermine the data pipeline and the detector trained on it.
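A check of this kind needs a way to compare two planes. The following is a minimal sketch of one possible metric pair, an angular error in degrees and an offset error in the units of the geometry. The function names are hypothetical and the paper's benchmark metrics may be defined differently. Note that SfM reconstructions are generally defined only up to scale, so an error in meters presupposes metric ground truth such as LiDAR.

```python
import math

def _unit_plane(normal, offset):
    """Rescale (normal, offset) so that the normal has unit length."""
    norm = math.sqrt(sum(c * c for c in normal))
    if norm == 0.0:
        raise ValueError("plane normal must be non-zero")
    return [c / norm for c in normal], offset / norm

def plane_errors(pred, gt):
    """Return (angle in degrees, absolute offset difference) between two planes.

    Each plane is (normal, offset) for n.x + d = 0. The offset difference is
    only meaningful when the two normals are close to parallel.
    """
    n_p, d_p = _unit_plane(*pred)
    n_g, d_g = _unit_plane(*gt)
    cos = sum(a * b for a, b in zip(n_p, n_g))
    if cos < 0.0:
        # (n, d) and (-n, -d) describe the same plane, so flip to the closer sign.
        cos, d_p = -cos, -d_p
    angle_deg = math.degrees(math.acos(min(1.0, cos)))
    return angle_deg, abs(d_p - d_g)

# Same orientation written with opposite signs, offsets half a unit apart.
angle, offset_err = plane_errors(((0.0, 0.0, 2.0), -4.0), ((0.0, 0.0, -1.0), 2.5))
```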
Original abstract
Symmetry detection is a fundamental problem in computer vision, and symmetries serve as powerful priors for downstream tasks. However, existing learning-based methods for detecting 3D symmetries from single images have been almost exclusively trained and evaluated on object-centric or synthetic datasets, and thus fail to generalize to real-world scenes. Furthermore, due to the inherent scale ambiguity of monocular inputs, which makes localizing the 3D plane an ill-posed problem, many existing works only predict the plane's orientation. In this paper, we address these limitations by presenting the first framework for detecting 3D-grounded reflectional symmetries from single, in-the-wild RGB images, focusing on architectural landmarks. We introduce two key innovations: (1) a scalable data annotation pipeline to automatically curate a large-scale dataset of architectural symmetries, ArchSym, from SfM reconstructions by leveraging cross-view image matching; and building on the dataset, (2) a single-view symmetry detector that accurately localizes symmetries in 3D by parameterizing them as signed distance maps defined relative to predicted scene geometry. We validate our symmetry annotation pipeline against geometry-based alternatives and demonstrate that our symmetry detector significantly outperforms state-of-the-art baselines on our new benchmark.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ArchSym, the first framework for 3D-grounded reflectional symmetry detection from single in-the-wild RGB images of architectural scenes. It contributes (1) a scalable automatic annotation pipeline that leverages SfM reconstructions and cross-view image matching to curate a large dataset of 3D symmetries without manual labeling, and (2) a monocular detector that outputs signed-distance symmetry maps defined relative to predicted scene geometry, thereby resolving scale ambiguity. The annotation pipeline is validated against geometry-based alternatives, and the detector is shown to outperform state-of-the-art baselines on the new ArchSym benchmark.
Significance. If the automatic 3D labels prove reliable, the work meaningfully extends symmetry detection beyond object-centric and synthetic regimes to real architectural scenes, where symmetries are both prevalent and useful for downstream tasks such as 3D reconstruction and facade parsing. The signed-distance-map parameterization is a technically sound way to make the 3D plane localization well-posed from monocular input. The SfM-based annotation strategy is a practical contribution for scalable dataset creation. The case would be stronger with explicit quantification of label accuracy in the presence of repeated architectural structures.
major comments (2)
- [§3] §3 (Annotation Pipeline): The central claim that the detector achieves accurate 3D-grounded detection rests on the quality of the automatically generated 3D symmetry labels. Architectural scenes frequently contain repeated facades and symmetric elements that induce SfM correspondence errors, scale drift, and erroneous plane hypotheses. While the manuscript states that the pipeline is validated against geometry-based alternatives, no quantitative metrics (e.g., plane-parameter error distributions, agreement with manual annotations on a held-out subset, or failure-case analysis) are reported in the validation section. Without these, it is impossible to determine whether label noise undermines the downstream detector training and the reported benchmark gains.
- [§5] §5 (Experiments): The claim of significant outperformance over baselines is load-bearing for the paper's contribution. The manuscript should provide ablations isolating the effect of the signed-distance-map formulation versus simpler orientation-only predictions, as well as an analysis of how annotation noise from the SfM pipeline propagates to detector performance. Current results appear to lack these controls, making it difficult to attribute improvements specifically to the proposed 3D-grounded representation.
minor comments (2)
- [§4] Figure captions and the method section would benefit from an explicit diagram showing how the predicted signed-distance map is converted back to a 3D plane equation and how this resolves scale ambiguity.
- [Abstract] The abstract states that the detector 'significantly outperforms' baselines; adding the key quantitative deltas (e.g., mIoU or plane-error reductions) would improve readability.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address the two major comments below and will revise the paper accordingly to strengthen the validation of the annotation pipeline and the experimental analysis.
Point-by-point responses
- Referee: [§3] §3 (Annotation Pipeline): The central claim that the detector achieves accurate 3D-grounded detection rests on the quality of the automatically generated 3D symmetry labels. Architectural scenes frequently contain repeated facades and symmetric elements that induce SfM correspondence errors, scale drift, and erroneous plane hypotheses. While the manuscript states that the pipeline is validated against geometry-based alternatives, no quantitative metrics (e.g., plane-parameter error distributions, agreement with manual annotations on a held-out subset, or failure-case analysis) are reported in the validation section. Without these, it is impossible to determine whether label noise undermines the downstream detector training and the reported benchmark gains.
  Authors: We agree that explicit quantification of label accuracy is important, particularly given the challenges of repeated structures in architectural scenes. Our current validation compares the SfM-based pipeline to geometry-based alternatives, but we acknowledge this is insufficiently quantitative. In the revision we will add: (1) plane-parameter error distributions on a held-out subset of 200 images for which we obtain manual plane annotations, (2) agreement metrics (e.g., angular and distance errors) between automatic and manual labels, and (3) a targeted failure-case analysis on scenes with repeated facades. These additions will allow readers to assess label reliability directly.
  Revision: yes
- Referee: [§5] §5 (Experiments): The claim of significant outperformance over baselines is load-bearing for the paper's contribution. The manuscript should provide ablations isolating the effect of the signed-distance-map formulation versus simpler orientation-only predictions, as well as an analysis of how annotation noise from the SfM pipeline propagates to detector performance. Current results appear to lack these controls, making it difficult to attribute improvements specifically to the proposed 3D-grounded representation.
  Authors: We concur that isolating the contribution of the signed-distance-map representation and quantifying noise sensitivity would strengthen the experimental claims. In the revised manuscript we will add: (1) an ablation comparing the full signed-distance-map model against an orientation-only baseline (normal vector prediction without distance), and (2) a controlled noise-injection study that perturbs the training labels with increasing levels of plane-parameter noise and reports the resulting drop in detector metrics. These controls will clarify the benefit of the 3D-grounded formulation and the robustness to annotation noise.
  Revision: yes
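The noise-injection study promised in this response could perturb labels along the following lines. This is a hedged sketch under stated assumptions: a plane label is (unit normal, offset), noise is Gaussian in tilt angle and in offset, and `perturb_plane` with its parameters is a hypothetical name. The rebuttal does not specify the noise model.

```python
import math
import random

def perturb_plane(normal, offset, angle_sigma_deg, offset_sigma, rng):
    """Return a noisy copy of the plane n.x + d = 0 (normal must be unit length)."""
    # Draw a random unit direction perpendicular to the normal.
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        dot = sum(a * b for a, b in zip(v, normal))
        tangent = [a - dot * b for a, b in zip(v, normal)]
        length = math.sqrt(sum(c * c for c in tangent))
        if length > 1e-8:
            break
    tangent = [c / length for c in tangent]
    # Tilt the normal toward that direction by a Gaussian-distributed angle.
    theta = math.radians(rng.gauss(0.0, angle_sigma_deg))
    noisy_normal = tuple(math.cos(theta) * n + math.sin(theta) * t
                         for n, t in zip(normal, tangent))
    noisy_offset = offset + rng.gauss(0.0, offset_sigma)
    return noisy_normal, noisy_offset

# Sweep increasing noise levels over one clean label (values are illustrative).
rng = random.Random(0)
clean_normal, clean_offset = (0.0, 0.0, 1.0), -2.0
noisy_labels = [perturb_plane(clean_normal, clean_offset, s, 0.05 * s, rng)
                for s in (0.0, 1.0, 5.0)]
```

Retraining the detector on labels perturbed at each level and plotting the benchmark metric against the noise level would then give the sensitivity curve the referee asks for.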
Circularity Check
No significant circularity; dataset creation and model training are independent of fitted predictions
Full rationale
The paper's derivation chain begins with an external SfM-based annotation pipeline that generates the ArchSym dataset via cross-view matching on reconstructions; this process is not derived from or equivalent to the single-view detector's outputs. The detector is then trained to predict signed-distance symmetry maps relative to scene geometry estimated from the input image. No equations or steps reduce a claimed prediction to a fitted parameter by construction, and no self-citations are invoked as load-bearing uniqueness theorems or ansatzes. Validation against geometry-based alternatives is presented as an independent check. The approach remains self-contained against external benchmarks with no reduction of the central claim to its own inputs.
Reference graph
Works this paper leans on
- [1] Mikhail J. Atallah. On symmetry detection. IEEE Transactions on Computers, 100(7):663–666, 1985.
- [2] Ruojin Cai, Joseph Tung, Qianqian Wang, Hadar Averbuch-Elor, Bharath Hariharan, and Noah Snavely. Doppelgangers: Learning to disambiguate images of similar structures. In ICCV, 2023.
- [3] Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko. End-to-end object detection with transformers. In ECCV, 2020.
- [4] Angel X Chang, Thomas Funkhouser, Leonidas Guibas, Pat Hanrahan, Qixing Huang, Zimo Li, Silvio Savarese, Manolis Savva, Shuran Song, Hao Su, et al. Shapenet: An information-rich 3d model repository. arXiv preprint, 2015.
- [5] Marcelo Cicconet, David GC Hildebrand, and Hunter Elliott. Finding mirror symmetry via registration and optimal symmetric pairwise assignment of curves: Algorithm and results. In ICCV Workshops, pages 1759–1763, 2017.
- [6] Matt Deitke, Dustin Schwenk, Jordi Salvador, Luca Weihs, Oscar Michel, Eli VanderBilt, Ludwig Schmidt, Kiana Ehsani, Aniruddha Kembhavi, and Ali Farhadi. Objaverse: A universe of annotated 3d objects. In CVPR, 2023.
- [7] Bardienus Duisterhof, Lojze Zust, Philippe Weinzaepfel, Vincent Leroy, Yohann Cabon, and Jerome Revaud. Mast3r-sfm: a fully-integrated solution for unconstrained structure-from-motion. arXiv preprint, 2024.
- [8] Mohamed Elawady, Christophe Ducottet, Olivier Alata, Cécile Barat, and Philippe Colantoni. Wavelet-based reflection symmetry detection via textural and color histograms. In ICCV Workshops, pages 1725–1733, 2017.
- [9] Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Proc. KDD, pages 226–231. AAAI Press, 1996.
- [10] Christopher Funk and Yanxi Liu. Beyond planar symmetry: Modeling human perception of reflection and rotation symmetries in the wild. In ICCV, 2017.
- [11] Christopher Funk, Seungkyu Lee, Martin R Oswald, Stavros Tsogkas, Wei Shen, Andrea Cohen, Sven Dickinson, and Yanxi Liu. 2017 iccv challenge: Detecting symmetry in the wild. In ICCV Workshops, 2017.
- [12] Lin Gao, Ling-Xiao Zhang, Hsien-Yu Meng, Yi-Hui Ren, Yu-Kun Lai, and Leif Kobbelt. Prs-net: Planar reflective symmetry detection net for 3d models. IEEE TVCG, 27(6):3007–3018, 2020.
- [13] Jihyeon Je, Jiayi Liu, Guandao Yang, Boyang Deng, Shengqu Cai, Gordon Wetzstein, Or Litany, and Leonidas Guibas. Robust symmetry detection via riemannian langevin dynamics. In SIGGRAPH Asia 2024 Conference Papers, pages 1–11,
- [14] Nahum Kiryati and Yossi Gofman. Detecting symmetry in grey level images: The global optimization approach. International Journal of Computer Vision, 29(1):29–45, 1998.
- [15] Kevin Köser, Christopher Zach, and Marc Pollefeys. Dense 3d reconstruction of symmetric scenes from a single image. In Joint Pattern Recognition Symposium, pages 266–275. Springer, 2011.
- [16] Harold W Kuhn. The hungarian method for the assignment problem. Naval research logistics quarterly, 2(1-2):83–97,
- [17] Vincent Leroy, Yohann Cabon, and Jérôme Revaud. Grounding image matching in 3d with mast3r. In ECCV, 2024.
- [18] Ren-Wu Li, Ling-Xiao Zhang, Chunpeng Li, Yu-Kun Lai, and Lin Gao. E3sym: Leveraging e (3) invariance for unsupervised 3d planar reflective symmetry detection. In ICCV, 2023.
- [19] Xiang Li, Zixuan Huang, Anh Thai, and James M Rehg. Symmetry strikes back: From single-image symmetry detection to 3d generation. In CVPR, 2025.
- [20] Guosheng Lin, Anton Milan, Chunhua Shen, and Ian Reid. Refinenet: Multi-path refinement networks for high-resolution semantic segmentation. In CVPR, 2017.
- [21] Yancong Lin, Silvia-Laura Pintea, and Jan van Gemert. Nerd++: Improved 3d-mirror symmetry learning from a single image. arXiv preprint, 2021.
- [22] Gareth Loy and Jan-Olof Eklundh. Detecting symmetry and symmetric constellations of features. In ECCV, 2006.
- [23] Nathaniel Merrill, Yuliang Guo, Xingxing Zuo, Xinyu Huang, Stefan Leutenegger, Xi Peng, Liu Ren, and Guoquan Huang. Symmetry and uncertainty-aware object slam for 6dof object pose estimation. In CVPR, 2022.
- [24] Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, and Ren Ng. Nerf: Representing scenes as neural radiance fields for view synthesis. In ECCV, 2020.
- [25] Niloy J Mitra, Mark Pauly, Michael Wand, and Duygu Ceylan. Symmetry in 3d geometry: Extraction and applications. In Comput. Graph. Forum, pages 1–23. Wiley Online Library,
- [26] Rajendra Nagar and Shanmuganathan Raman. Symmmap: Estimation of the 2-d reflection symmetry map and its applications. In ICCV Workshops, pages 1715–1724, 2017.
- [27] William Peebles and Saining Xie. Scalable diffusion models with transformers. In ICCV, pages 4195–4205, 2023.
- [28] Ethan Perez, Florian Strub, Harm De Vries, Vincent Dumoulin, and Aaron Courville. Film: Visual reasoning with a general conditioning layer. In AAAI, 2018.
- [29] René Ranftl, Alexey Bochkovskiy, and Vladlen Koltun. Vision transformers for dense prediction. In ICCV, pages 12179–12188, 2021.
- [30] Jeremy Reizenstein, Roman Shapovalov, Philipp Henzler, Luca Sbordone, Patrick Labatut, and David Novotny. Common objects in 3d: Large-scale learning and evaluation of real-life 3d category reconstruction. In Proceedings of the IEEE/CVF international conference on computer vision, pages 10901–10911, 2021.
- [31] Jeremy Reizenstein, Roman Shapovalov, Philipp Henzler, Luca Sbordone, Patrick Labatut, and David Novotny. Common objects in 3d: Large-scale learning and evaluation of real-life 3d category reconstruction. In International Conference on Computer Vision, 2021.
- [32] Tadamasa Sawada, Yunfeng Li, and Zygmunt Pizlo. Detecting 3-d mirror symmetry in a 2-d camera image for 3-d shape recovery. Proceedings of the IEEE, 102(10):1588–1606, 2014.
- [33] Johannes L Schonberger and Jan-Michael Frahm. Structure-from-motion revisited. In CVPR, 2016.
- [34] Johannes Lutz Schönberger, Enliang Zheng, Marc Pollefeys, and Jan-Michael Frahm. Pixelwise view selection for unstructured multi-view stereo. In European Conference on Computer Vision (ECCV), 2016.
- [35] Yifei Shi, Junwen Huang, Hongjia Zhang, Xin Xu, Szymon Rusinkiewicz, and Kai Xu. Symmetrynet: Learning to predict reflectional and rotational symmetries of 3d shapes from single-view rgb-d images. ACM TOG, 39(6):1–14, 2020.
- [36] Yifei Shi, Zixin Tang, Xiangting Cai, Hongjia Zhang, Dewen Hu, and Xin Xu. Symmetrygrasp: Symmetry-aware antipodal grasp detection from single-view rgb-d images. RA-L, 7(4):12235–12242, 2022.
- [37] Yifei Shi, Xin Xu, Junhua Xi, Xiaochang Hu, Dewen Hu, and Kai Xu. Learning to detect 3d symmetry from single-view rgb-d images with weak supervision. IEEE TPAMI, 45(4):4882–4896, 2022.
- [38] Giorgos Tolias, Yannis Avrithis, and Hervé Jégou. To aggregate or not to aggregate: Selective match kernels for image search. In ICCV, pages 1401–1408, 2013.
- [39] Stavros Tsogkas and Iasonas Kokkinos. Learning-based symmetry detection in natural images. In ECCV, 2012.
- [40] Joseph Tung, Gene Chou, Ruojin Cai, Guandao Yang, Kai Zhang, Gordon Wetzstein, Bharath Hariharan, and Noah Snavely. Megascenes: Scene-level view synthesis at scale. In ECCV, 2024.
- [41] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. NeurIPS, 30, 2017.
- [42] Jianyuan Wang, Minghao Chen, Nikita Karaev, Andrea Vedaldi, Christian Rupprecht, and David Novotny. Vggt: Visual geometry grounded transformer. In CVPR, 2025.
- [43] Shuzhe Wang, Vincent Leroy, Yohann Cabon, Boris Chidlovskii, and Jerome Revaud. Dust3r: Geometric 3d vision made easy. In CVPR, pages 20697–20709, 2024.
- [44] Jan D Wolter, Tony C Woo, and Richard A Volz. Optimal algorithms for symmetry detection in two and three dimensions. The Visual Computer, 1(1):37–48, 1985.
- [45] Shangzhe Wu, Christian Rupprecht, and Andrea Vedaldi. Unsupervised learning of probably symmetric deformable 3d objects from images in the wild. In CVPR, 2020.
- [46] Shangzhe Wu, Ameesh Makadia, Jiajun Wu, Noah Snavely, Richard Tucker, and Angjoo Kanazawa. De-rendering the world’s revolutionary artefacts. In CVPR, 2021.
- [47] Yuanbo Xiangli, Ruojin Cai, Hanyu Chen, Jeffrey Byrne, and Noah Snavely. Doppelgangers++: Improved visual disambiguation with geometric 3d features. In CVPR, 2025.
- [48] Yuan Yao, Nico Schertler, Enrique Rosales, Helge Rhodin, Leonid Sigal, and Alla Sheffer. Front2back: Single view 3d shape reconstruction via front to back prediction. In CVPR,
- [49] Zhaoxuan Zhang, Bo Dong, Tong Li, Felix Heide, Pieter Peers, Baocai Yin, and Xin Yang. Single depth-image 3d reflection symmetry and shape prediction. In ICCV, 2023.
- [50] Heng Zhao, Shenxing Wei, Dahu Shi, Wenming Tan, Zheyang Li, Ye Ren, Xing Wei, Yi Yang, and Shiliang Pu. Learning symmetry-aware geometry correspondences for 6d object pose estimation. In ICCV, 2023.
- [51] Yichao Zhou, Shichen Liu, and Yi Ma. Nerd: Neural 3d reflection symmetry detector. In CVPR, 2021.
Supplementary Material
A. Implementation details
Our implementation builds upon the official MASt3R [7] and VGGT [42] codebases.
A.1. Training details
We use a base learning rate of...