SAGE3D: Soft-guided attention and graph excitation for 3D point cloud corner detection
Pith reviewed 2026-06-30 21:07 UTC · model grok-4.3
The pith
SAGE3D injects ground-truth corner labels as log-priors into attention during training and applies positive-only graph boosting at multiple scales to raise both precision and recall in 3D LiDAR corner detection.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that a hybrid transformer model for per-point corner detection in airborne LiDAR point clouds reaches higher precision when ground-truth corner labels are injected as a log-prior into attention logits at training time, and reaches higher recall when an excitatory graph network performs positive-only message passing at selected resolutions so that confident corners boost neighboring predictions.
What carries the argument
Soft-Guided Attention (log-prior from labels added to attention logits) paired with an Excitatory Graph Neural Network (positive-only message passing at strategic hierarchy levels) inside a Set Abstraction / Feature Propagation encoder-decoder.
If this is right
- Corner signals remain amplified rather than diluted when features are extracted at multiple resolutions.
- Precision gains come specifically from the label-derived prior in attention.
- Recall gains come specifically from learned boosting among confident points.
- The architecture keeps the two improvements separate so each can be tuned for its target metric.
Where Pith is reading between the lines
- The same training-time prior technique could be tested on other sparse detection tasks where labels exist only during training.
- Positive-only message passing might be compared against standard graph attention to isolate the effect of forbidding negative messages.
- The method could be evaluated on terrestrial or indoor LiDAR to check whether the airborne-specific hierarchy transfers.
- If the boosting step proves robust, it could be inserted into existing point-cloud detectors without retraining the full backbone.
Load-bearing premise
That the training-time use of ground-truth corner labels to guide attention will produce better results at test time when those labels are unavailable, and that positive-only reinforcement will not amplify false positives.
What would settle it
Training the same hierarchical backbone once with the soft-guided attention disabled and once with the excitatory graph disabled, then measuring whether precision or recall drops on a held-out airborne LiDAR test set.
read the original abstract
We present SAGE3D, a hybrid Transformer-based model for corner detection in airborne LiDAR point clouds. We propose a multi-stage solution built on a hierarchical encoder-decoder architecture that progressively downsamples point clouds through Set Abstraction layers and recovers per-point predictions via Feature Propagation. We introduce two innovations: Soft-Guided Attention, which injects ground-truth corner labels as a log-prior into attention logits during training to improve precision; then an Excitatory Graph Neural Network positioned at strategic resolutions in the hierarchy, employing positive-only message passing where high-confidence corners reinforce predictions through learned boosting, optimizing for recall. The hierarchical design enables multi-scale feature extraction while our guided attention and excitatory modules ensure corner signals are amplified rather than diluted across scales.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents SAGE3D, a hybrid Transformer-based model for corner detection in airborne LiDAR point clouds. It builds on a hierarchical encoder-decoder using Set Abstraction layers for progressive downsampling and Feature Propagation for per-point recovery, and introduces two modules: Soft-Guided Attention (injecting ground-truth corner labels as a log-prior into attention logits during training) and an Excitatory Graph Neural Network (positive-only message passing at strategic resolutions to reinforce high-confidence corners).
Significance. If the claimed precision-recall improvements from the training-time log-prior and excitatory GNN are empirically verified, the work would offer a concrete mechanism for amplifying sparse corner signals in multi-scale point cloud hierarchies. The absence of any results, ablations, or test-time formulations currently leaves the practical impact undetermined.
major comments (3)
- [Abstract] Abstract: the central claim that Soft-Guided Attention and the Excitatory GNN 'ensure corner signals are amplified rather than diluted across scales' and optimize the precision-recall trade-off is asserted without any experimental results, ablation studies, or quantitative evidence. This is load-bearing for the contribution.
- [Abstract] Abstract: the Soft-Guided Attention module is defined only for training via ground-truth log-prior injection; the manuscript supplies neither the inference-time formulation of the attention logits nor any demonstration that the learned attention patterns transfer to label-free test data.
- [Abstract] Abstract: the Excitatory GNN is described as using positive-only message passing to optimize recall, yet no analysis or evidence is given on whether this mechanism amplifies false positives generated by the Set Abstraction / Feature Propagation stages.
minor comments (1)
- The phrase 'strategic resolutions' for GNN placement is used without specifying the exact layers or feature-map scales at which the modules are inserted.
Simulated Author's Rebuttal
We thank the referee for their thorough review and constructive comments. We address each major comment below, agreeing where the manuscript requires clarification or additional content.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central claim that Soft-Guided Attention and the Excitatory GNN 'ensure corner signals are amplified rather than diluted across scales' and optimize the precision-recall trade-off is asserted without any experimental results, ablation studies, or quantitative evidence. This is load-bearing for the contribution.
Authors: We agree that the abstract asserts these benefits without supporting quantitative evidence or ablations in the submitted manuscript. The current version prioritizes the architectural description. In revision we will add experimental results, ablation studies, and quantitative evidence, and we will revise the abstract language to align with the new empirical content. revision: yes
-
Referee: [Abstract] Abstract: the Soft-Guided Attention module is defined only for training via ground-truth log-prior injection; the manuscript supplies neither the inference-time formulation of the attention logits nor any demonstration that the learned attention patterns transfer to label-free test data.
Authors: This observation is correct. The Soft-Guided Attention is specified only for training. At inference the ground-truth log-prior term is omitted and standard attention logits are used. We will add an explicit inference-time formulation and any available analysis of learned pattern transfer in the revised manuscript. revision: yes
-
Referee: [Abstract] Abstract: the Excitatory GNN is described as using positive-only message passing to optimize recall, yet no analysis or evidence is given on whether this mechanism amplifies false positives generated by the Set Abstraction / Feature Propagation stages.
Authors: We acknowledge the absence of analysis on false-positive amplification. The excitatory mechanism is designed to reinforce high-confidence corners, but its effect on false positives requires explicit study. We will add an analysis of false-positive behavior and its impact on precision-recall in the revised manuscript. revision: yes
Circularity Check
No circularity detected; architectural proposal is self-contained
full rationale
The paper describes a hierarchical encoder-decoder with Set Abstraction and Feature Propagation, plus two training-time modules: Soft-Guided Attention (injecting GT corner labels as log-prior into attention logits) and an Excitatory GNN with positive-only message passing. These are presented as design innovations without any equations, derivations, or claims that reduce outputs to inputs by construction. No self-citations appear as load-bearing premises, no fitted parameters are renamed as predictions, and no uniqueness theorems or ansatzes are smuggled in. The central claims concern empirical improvements from the architecture and training procedure, which remain externally falsifiable and do not collapse to tautology.
Axiom & Free-Parameter Ledger
free parameters (2)
- log-prior strength for soft-guided attention
- boosting parameters in excitatory GNN
axioms (2)
- domain assumption Corner labels are available during training and can be used as priors without causing label leakage at inference.
- domain assumption High-confidence corner predictions can be used to reinforce other predictions via positive message passing without introducing systematic bias.
Reference graph
Works this paper leans on
-
[1]
SAGE3D: Soft-guided attention and graph excitation for 3D point cloud corner detection
INTRODUCTION The automatic reconstruction of 3D building models from airborne LiDAR point clouds is a fundamental task in ur- ban scene understanding, with applications spanning smart cities [1], digital twin generation, and autonomous driv- ing [2]. Accurate corner detection is the critical prerequisite for wireframe reconstruction, yet it faces unique c...
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[2]
METHODOLOGY Inspired by Point2Roof [9], our encoder-decoder architecture processes raw point clouds to detect wireframe corners. The encoder hierarchically abstracts features through four Set Ab- straction (SA) layers, while the decoder restores point-wise predictions via Feature Propagation (FP) layers with skip con- nections. Input point clouds contain ...
-
[3]
LOSS DESIGN Our training objective combines distance-weighted classifica- tion and regression losses:L=L dist +L offset. Soft Label Generation.We generate soft corner labels based on proximity to ground-truth vertices using exponential decay:y cls i = exp(−di/dthresh), whered i = minj ∥pi −v j∥ andd thresh = 0.05. Distance-Weighted Focal Loss.We use focal...
-
[4]
We apply DBSCAN [14] withϵ= 0.05and minPts= 1to group nearby predictions, then compute cluster centroids as final corners
POST-PROCESSING AND INFERENCE At inference, points exceeding confidence thresholdτ= 0.3 are selected as corner candidates with offset-refined positions. We apply DBSCAN [14] withϵ= 0.05and minPts= 1to group nearby predictions, then compute cluster centroids as final corners. Fig. 2. SAGE3D architecture. The encoder uses four Set Abstraction layers with Po...
-
[5]
Implementation Details The model is implemented in PyTorch and trained on a desk- top PC with an RTX 4070 GPU (12GB VRAM), batch size
EXPERIMENTAL RESULTS 5.1. Implementation Details The model is implemented in PyTorch and trained on a desk- top PC with an RTX 4070 GPU (12GB VRAM), batch size
-
[6]
Training takes roughly 1 day on the Building3D Tallinn dataset
We use AdamW with weight decay 0.01 and OneCycleLR schedule over 60 epochs (max LR 0.01, 10% warmup, cosine annealing). Training takes roughly 1 day on the Building3D Tallinn dataset. We evaluate using Average Corner Offset (ACO), Corner Precision (CP), Corner Recall (CR), and Corner-F1 (CF1) with Hungarian algorithm matching [15]. Table 1. Ablation study...
-
[7]
CONCLUSION This paper introduces SAGE3D, a hybrid Transformer and GNN architecture for efficient 3D corner detection. By proposing Soft-Guided Attention to inject geometric priors and an Excitatory GNN to boost recall, we achieve state-of- the-art localization accuracy (0.134 ACO) on the Building3D benchmark using a single consumer-grade GPU. This estab- ...
-
[8]
Poly- room: Room-aware transformer for floorplan recon- struction,
Yuzhou Liu, Lingjie Zhu, Xiaodong Ma, Hanqiao Ye, Xiang Gao, Xianwei Zheng, and Shuhan Shen, “Poly- room: Room-aware transformer for floorplan recon- struction,” inEuropean Conference on Computer Vision. Springer, 2024, pp. 322–339
2024
-
[9]
Maptrv2: An end-to-end framework for on- line vectorized hd map construction,
Bencheng Liao, Shaoyu Chen, Xinggang Wang, Tian- heng Cheng, Qian Zhang, Wenyu Liu, and Chang Huang, “Maptrv2: An end-to-end framework for on- line vectorized hd map construction,” inInternational Journal of Computer Vision, 2024
2024
-
[10]
Pointnet++: Deep hierarchical feature learning on point sets in a metric space,
Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas, “Pointnet++: Deep hierarchical feature learning on point sets in a metric space,”Advances in Neural Information Processing Systems, vol. 30, 2017
2017
-
[11]
Point transformer,
Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip HS Torr, and Vladlen Koltun, “Point transformer,” inIEEE/CVF International Conference on Computer Vision, 2021, pp. 16259–16268
2021
-
[12]
Dy- namic graph cnn for learning on point clouds,
Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E Sarma, Michael M Bronstein, and Justin M Solomon, “Dy- namic graph cnn for learning on point clouds,”ACM Transactions on Graphics, vol. 38, no. 5, pp. 1–12, 2019
2019
-
[13]
Pro- fessor forcing: A new algorithm for training recurrent networks,
Alex M Lamb, Anirudh Goyal, Ying Zhang, Saizheng Zhang, Aaron C Courville, and Yoshua Bengio, “Pro- fessor forcing: A new algorithm for training recurrent networks,”Advances in Neural Information Processing Systems, vol. 29, 2016
2016
-
[14]
Learning using privileged information: Similarity control and knowl- edge transfer,
Vladimir Vapnik and Akshay Vashist, “Learning using privileged information: Similarity control and knowl- edge transfer,”Journal of Machine Learning Research, vol. 12, pp. 2023–2049, 2011
2023
-
[15]
Graph attention networks,
Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio, “Graph attention networks,” inInternational Confer- ence on Learning Representations, 2018
2018
-
[16]
Point2roof: End-to-end 3d building roof model- ing from airborne lidar point clouds,
Li Li, Minhyuk Sung, Anton Duber, and Niloy J Mi- tra, “Point2roof: End-to-end 3d building roof model- ing from airborne lidar point clouds,”ISPRS Journal of Photogrammetry and Remote Sensing, vol. 193, pp. 17–28, 2022
2022
-
[17]
Point transformer v2: Grouped vector attention and partition-based pooling,
Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, and Hengshuang Zhao, “Point transformer v2: Grouped vector attention and partition-based pooling,” inAd- vances in Neural Information Processing Systems, 2022, vol. 35, pp. 33330–33342
2022
-
[18]
U-net: Convolutional networks for biomedical im- age segmentation,
Olaf Ronneberger, Philipp Fischer, and Thomas Brox, “U-net: Convolutional networks for biomedical im- age segmentation,” inMedical Image Computing and Computer-Assisted Intervention, 2015, pp. 234–241
2015
-
[19]
Focal loss for dense object detection,
Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollar, “Focal loss for dense object detection,” IEEE Transactions on Pattern Analysis and Machine In- telligence, vol. 42, no. 2, pp. 318–327, 2020
2020
-
[20]
Fast R-CNN,
Ross Girshick, “Fast R-CNN,” inIEEE International Conference on Computer Vision, 2015, pp. 1440–1448
2015
-
[21]
A density-based algorithm for discovering clusters in large spatial databases with noise,
Martin Ester, Hans-Peter Kriegel, Jorg Sander, and Xi- aowei Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” inInter- national Conference on Knowledge Discovery and Data Mining, 1996, pp. 226–231
1996
-
[22]
The hungarian method for the assign- ment problem,
Harold W Kuhn, “The hungarian method for the assign- ment problem,”Naval Research Logistics Quarterly, vol. 2, no. 1-2, pp. 83–97, 1955
1955
-
[23]
Building3d: An urban-scale dataset and bench- marks for learning roof structures from point clouds,
Ruisheng Wang, Jiju Peethambaran, and Dong Chen, “Building3d: An urban-scale dataset and bench- marks for learning roof structures from point clouds,” IEEE/CVF International Conference on Computer Vi- sion, pp. 20076–20085, 2023
2023
-
[24]
Pbwr: Parametric building wireframe re- construction from aerial lidar point clouds,
Zhaiyu Huang, Fan Zhang, Zeren Hu, Yao Jin, and Siyan Chen, “Pbwr: Parametric building wireframe re- construction from aerial lidar point clouds,”ISPRS Jour- nal of Photogrammetry and Remote Sensing, vol. 203, pp. 1–14, 2023
2023
-
[25]
Bwformer: Building wireframe reconstruction from airborne lidar point cloud with transformer,
Yuzhou Liu, Lingjie Zhu, Hanqiao Ye, Shangfeng Huang, Xiang Gao, Xianwei Zheng, and Shuhan Shen, “Bwformer: Building wireframe reconstruction from airborne lidar point cloud with transformer,” inPro- ceedings of the Computer Vision and Pattern Recogni- tion Conference, 2025, pp. 22215–22224
2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.