OPTNet: Ordering Point Transformer Network for Post-disaster 3D Semantic Segmentation

Ehsan Karimi; Maryam Rahnemoonfar; Nhut Le

arxiv: 2605.17197 · v1 · pith:YWRLEXELnew · submitted 2026-05-16 · 💻 cs.LG · cs.CV

Pith reviewed 2026-05-20 14:02 UTC · model grok-4.3

keywords point cloud semantic segmentation · point transformer · learnable ordering · self-supervised loss · disaster damage assessment · attention locality · 3D scene understanding

The pith

A learnable point sorter predicts optimal orderings to improve attention locality in 3D transformer networks for disaster scenes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Fixed ordering methods such as Hilbert curves or Z-order fail to capture the irregular geometry of post-disaster point clouds effectively. OPTNet adds a Point Sorter module trained by a self-supervised ordering loss that learns a permutation maximizing locality for window-based attention. The resulting network is tested on the 3DAeroRelief dataset and reports higher accuracy than prior point transformer variants. This addresses the need for rapid identification of damaged buildings, roads, and other infrastructure after events such as hurricanes or earthquakes.

Core claim

OPTNet introduces a learnable Point Sorter module that uses a self-supervised ordering loss to dynamically predict an optimal permutation of points. The permutation maximizes locality for the attention mechanism in a point transformer architecture, replacing static serialization methods. When evaluated on the 3DAeroRelief dataset the approach yields higher semantic segmentation performance than current state-of-the-art baselines.

What carries the argument

The Point Sorter module, a learnable component that outputs a permutation of input points to increase locality within attention windows.
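The score-then-sort mechanism described in the Figure 2 caption can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: a single linear layer with a sigmoid stands in for the sorter MLP, and the names `score_points`, `weights`, `bias`, and `window_size` are hypothetical.

```python
import math

def score_points(points, weights, bias):
    """Toy stand-in for the sorter MLP: one linear layer plus a sigmoid.

    `weights` and `bias` are hypothetical parameters, not values from the paper.
    """
    scores = []
    for p in points:
        z = sum(w * x for w, x in zip(weights, p)) + bias
        scores.append(1.0 / (1.0 + math.exp(-z)))  # squash into (0, 1)
    return scores

def permutation_from_scores(scores):
    """Return the point indices sorted by ascending score (the permutation)."""
    return sorted(range(len(scores)), key=lambda i: scores[i])

def partition_windows(order, window_size):
    """Split the serialized index sequence into consecutive attention windows."""
    return [order[i:i + window_size] for i in range(0, len(order), window_size)]

points = [(0.9, 0.1, 0.0), (0.1, 0.0, 0.2), (0.5, 0.5, 0.1), (0.2, 0.2, 0.0)]
scores = score_points(points, weights=(1.0, 1.0, 1.0), bias=-1.0)
order = permutation_from_scores(scores)
windows = partition_windows(order, window_size=2)
print(order)    # [1, 3, 0, 2]
print(windows)  # [[1, 3], [0, 2]]
```

Sorting is not differentiable on its own, so in this sketch any learning signal would have to reach the scorer through a loss defined on the scores themselves, which is the role the page attributes to the self-supervised ordering loss.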

If this is right

Window-based attention can operate on larger point clouds without expensive neighbor search or farthest-point sampling.
Segmentation accuracy increases for classes representing damaged infrastructure in irregular post-disaster scenes.
The network adapts its internal ordering to the specific geometry of each input rather than using one fixed rule for all data.
Overall inference speed improves while maintaining or raising accuracy on large-scale 3D scenes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same sorter idea could be inserted into other transformer models that process unordered data such as meshes or graphs.
Evaluating the learned orderings on non-disaster point-cloud datasets would show whether the gain is tied to highly irregular geometries.
Combining the ordering loss with supervised segmentation loss from the start might further stabilize training.

Load-bearing premise

A permutation learned through self-supervision will reliably improve attention locality for complex disaster geometries without creating training instability or overfitting.

What would settle it

Replace the learned ordering with a fixed Hilbert-curve ordering inside the same network and check whether segmentation accuracy on the 3DAeroRelief dataset drops to the level of prior baselines.
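A fixed space-filling-curve baseline of the kind this test calls for can be sketched as follows. The example uses Z-order (Morton) codes rather than a Hilbert curve, purely because the bit interleaving is shorter to write down; the functions `morton_code` and `z_order` and the parameters `grid_size` and `bits` are hypothetical names, and the sketch assumes non-negative coordinates.

```python
def morton_code(ix, iy, iz, bits=10):
    """Interleave the low `bits` bits of three voxel indices into one integer."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (3 * b)
        code |= ((iy >> b) & 1) << (3 * b + 1)
        code |= ((iz >> b) & 1) << (3 * b + 2)
    return code

def z_order(points, grid_size, bits=10):
    """Return point indices sorted by the Morton code of their voxel.

    Assumes every coordinate is non-negative, so int() truncation acts as floor.
    """
    def key(i):
        ix, iy, iz = (int(c / grid_size) for c in points[i])
        return morton_code(ix, iy, iz, bits)
    return sorted(range(len(points)), key=key)

points = [
    (3.0, 3.0, 0.0),
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (1.0, 1.0, 0.0),
]
order = z_order(points, grid_size=1.0)
print(order)  # [1, 2, 3, 4, 0]
```

Because this ordering depends only on coordinates, it has no trainable parameters, which is what makes it a useful control for separating the effect of learned ordering from the effect of added capacity.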

Figures

Figures reproduced from arXiv: 2605.17197 by Ehsan Karimi, Maryam Rahnemoonfar, Nhut Le.

**Figure 1.** Overview of the OPTNet framework. The network utilizes a Learnable Point Sorter to dynamically serialize the input point cloud, optimizing the point order for the subsequent Point Transformer Backbone. This learnable serialization preserves geometric locality more effectively than static heuristics, enhancing the efficiency of windowed attention. However, the efficacy of this serialization strategy depends…

**Figure 2.** The core mechanism of OPTNet. Point Sorter: An MLP consumes point coordinates and features to predict a scalar score s_i ∈ [0, 1] for every point. These scores are sorted to produce a permutation π that serializes the point cloud. Self-Supervised Ordering Loss: We train the sorter using a Locality Loss (L_local), which minimizes the score variance among spatial k-nearest neighbors to preserve geometric struc…
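The locality term named in the Figure 2 caption, score variance among spatial k-nearest neighbors, might look roughly like the sketch below. The caption is truncated, so the exact form used in the paper is not visible here; this version is an assumption that averages, over all points, the variance of scores within each point's neighborhood (the point plus its k nearest neighbors, found by brute force). The function names are hypothetical.

```python
import math

def knn_indices(points, i, k):
    """Brute-force k nearest neighbours of point i (excluding i itself)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]

def locality_loss(points, scores, k):
    """Mean, over all points, of the score variance within each kNN neighbourhood."""
    total = 0.0
    for i in range(len(points)):
        group = [scores[i]] + [scores[j] for j in knn_indices(points, i, k)]
        mean = sum(group) / len(group)
        total += sum((s - mean) ** 2 for s in group) / len(group)
    return total / len(points)

# Two well-separated clusters of two points each.
points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 0.0), (5.1, 5.0, 0.0)]
coherent = [0.10, 0.12, 0.90, 0.92]   # spatial neighbours receive similar scores
scrambled = [0.10, 0.90, 0.12, 0.92]  # spatial neighbours receive dissimilar scores
loss_coherent = locality_loss(points, coherent, k=1)
loss_scrambled = locality_loss(points, scrambled, k=1)
print(loss_coherent < loss_scrambled)  # True
```

Taken alone, a variance term like this is minimized by giving every point the same score, so a usable objective would need some additional constraint or term; the truncated caption does not show what the paper uses for that.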

Abstract

Post-disaster damage assessment requires rapid and accurate semantic segmentation of 3D point clouds to identify critical infrastructure such as damaged buildings and roads. Early Point Transformers (e.g., PTv1, PTv2) relied on computationally expensive neighbor searching (k-NN) and Farthest Point Sampling (FPS). To improve efficiency, recent architectures like Point Transformer V3 (PTv3) adopted static serialization methods, such as Hilbert curves or Z-order, to organize unstructured points for window-based attention. However, these fixed orderings are not optimal for capturing the complex geometry of disaster scenes. In this paper, we propose OPTNet (Ordering Point Transformer Network), which introduces a learnable Point Sorter module. OPTNet utilizes a self-supervised ordering loss to dynamically predict an optimal permutation that maximizes the locality of the attention mechanism. We evaluate our method on the 3DAeroRelief dataset, significantly outperforming state-of-the-art baselines.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

OPTNet adds a learnable Point Sorter with self-supervised loss to improve ordering for attention in post-disaster point clouds, but offers no direct metrics showing the ordering actually boosts locality over fixed curves.

The main point to know is that this paper adds a learnable Point Sorter module to the Point Transformer architecture, trained with a self-supervised ordering loss, to create better permutations of points for window-based attention in post-disaster 3D semantic segmentation. It shows improved performance over baselines on the 3DAeroRelief dataset. What stands out is the focus on a real application where point clouds from disasters have complex and irregular structures that static serialization methods like Hilbert curves or Z-order may not handle optimally. The self-supervised loss provides an independent signal to optimize the ordering without relying on the final task loss alone. This seems like a straightforward and logical step from the static approaches in PTv3.

On the downside, the central assumption that the learned ordering produces superior locality remains unverified by direct measures. There are no reported statistics comparing the learned permutations to fixed ones on metrics such as mean consecutive point distance or intra-window point density. As a result, the accuracy improvements could arise simply from the additional capacity introduced by the sorter module or from changes in the training procedure. The description also omits details on ablation studies, error bars, and dataset characteristics, which would help assess the reliability of the gains.

This paper would appeal to researchers in applied computer vision working on 3D data for emergency management or infrastructure assessment. A practitioner looking for efficiency tweaks in point cloud transformers for specific domains might find the module worth experimenting with. Overall, the work shows clear thinking on adapting recent transformer techniques to a practical problem. It deserves serious peer review to allow referees to request the missing locality comparisons and experimental controls.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes OPTNet, a Point Transformer variant for post-disaster 3D semantic segmentation. It adds a learnable Point Sorter module trained with a self-supervised ordering loss that predicts a permutation intended to maximize locality for subsequent window-based attention, claiming this yields significant gains over prior static serialization methods (Hilbert curves, Z-order) and state-of-the-art baselines on the 3DAeroRelief dataset.

Significance. If the central mechanism is verified, the work could improve efficiency and accuracy of point-cloud transformers in geometrically complex, time-sensitive settings such as disaster response. The self-supervised ordering approach directly targets a recognized limitation of fixed serialization in PTv3-style architectures.

major comments (2)

[§4 and §3.2] §4 (Experiments) and §3.2 (Point Sorter): the manuscript reports mIoU and accuracy improvements on 3DAeroRelief but provides no quantitative locality metric (e.g., mean Euclidean distance between consecutive points after permutation, or average intra-window point coherence) comparing the learned ordering against Hilbert/Z-order baselines. Because the central claim attributes gains specifically to superior locality rather than added capacity or training dynamics, this omission is load-bearing for the result interpretation.
[§3.3] §3.3 (Ordering Loss): the self-supervised loss is described as encouraging locality, yet no ablation isolates its contribution from the Point Sorter module's extra parameters or from changes in attention-window statistics. Without this isolation, it remains unclear whether the reported outperformance stems from the claimed mechanism.
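The first locality metric the referee names, mean Euclidean distance between consecutive points after permutation, is straightforward to compute for any ordering. The sketch below is illustrative only; `mean_consecutive_distance` is a hypothetical helper name and the toy points are not drawn from 3DAeroRelief.

```python
import math

def mean_consecutive_distance(points, order):
    """Average Euclidean distance between points adjacent in the serialized order."""
    if len(order) < 2:
        return 0.0
    steps = [math.dist(points[a], points[b]) for a, b in zip(order, order[1:])]
    return sum(steps) / len(steps)

# Four points on a line, one unit apart.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
sweep = [0, 1, 2, 3]   # visits spatial neighbours in turn
zigzag = [0, 3, 1, 2]  # jumps across the scene
d_sweep = mean_consecutive_distance(points, sweep)
d_zigzag = mean_consecutive_distance(points, zigzag)
print(d_sweep, d_zigzag)  # 1.0 2.0
```

A lower value means consecutive positions in the sequence are closer in space, so the same function applied to a learned ordering and to a Hilbert or Z-order ordering of the same scene would give the direct comparison the referee asks for.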

minor comments (2)

[Abstract and §4] The abstract and §4 omit basic dataset statistics (point count, class distribution, train/val/test split sizes) for 3DAeroRelief; these should be added for reproducibility.
[Figure 3] Figure 3 (permutation visualization) would benefit from side-by-side comparison with Hilbert and Z-order curves on the same scene to illustrate the locality difference.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and describe the revisions we will make.

Referee: [§4 and §3.2] §4 (Experiments) and §3.2 (Point Sorter): the manuscript reports mIoU and accuracy improvements on 3DAeroRelief but provides no quantitative locality metric (e.g., mean Euclidean distance between consecutive points after permutation, or average intra-window point coherence) comparing the learned ordering against Hilbert/Z-order baselines. Because the central claim attributes gains specifically to superior locality rather than added capacity or training dynamics, this omission is load-bearing for the result interpretation.

Authors: We agree that a direct quantitative locality metric would strengthen the interpretation of the results. In the revised manuscript we will add comparisons using mean Euclidean distance between consecutive points after permutation and average intra-window point coherence, computed for the learned ordering versus the Hilbert and Z-order baselines. These metrics will be reported in Section 4 alongside the existing mIoU and accuracy numbers.
revision: yes
Referee: [§3.3] §3.3 (Ordering Loss): the self-supervised loss is described as encouraging locality, yet no ablation isolates its contribution from the Point Sorter module's extra parameters or from changes in attention-window statistics. Without this isolation, it remains unclear whether the reported outperformance stems from the claimed mechanism.

Authors: We acknowledge that an ablation isolating the self-supervised ordering loss is needed. In the revision we will add an experiment that trains the Point Sorter using only the downstream segmentation loss (removing the self-supervised term) while keeping the module architecture fixed, and we will report the resulting mIoU and locality metrics. This will separate the effect of the loss from the added parameters.
revision: yes
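The "intra-window point coherence" promised in the first response is not defined on this page. One plausible proxy, offered here only as an assumption, is the mean distance from each point to the centroid of its attention window, averaged over windows; smaller values indicate tighter windows. The names `window_spread` and `mean_intra_window_spread` are hypothetical, and the sketch assumes a non-empty ordering.

```python
import math

def window_spread(points, window):
    """Mean distance from each point in a window to the window centroid."""
    dims = len(points[0])
    centroid = [sum(points[i][d] for i in window) / len(window) for d in range(dims)]
    return sum(math.dist(points[i], centroid) for i in window) / len(window)

def mean_intra_window_spread(points, order, window_size):
    """Average window spread over consecutive windows of the serialized order."""
    windows = [order[i:i + window_size] for i in range(0, len(order), window_size)]
    return sum(window_spread(points, w) for w in windows) / len(windows)

# Two pairs of nearby points, far apart from each other.
points = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0), (12.0, 0.0, 0.0)]
tight = mean_intra_window_spread(points, [0, 1, 2, 3], window_size=2)
loose = mean_intra_window_spread(points, [0, 2, 1, 3], window_size=2)
print(tight, loose)  # 1.0 5.0
```

Reporting a number like this for the sorter trained with and without the self-supervised term, alongside mIoU, would show whether the ordering loss changes window geometry or only the final accuracy.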

Circularity Check

0 steps flagged

No significant circularity; the self-supervised loss is an independent training signal

The paper's core proposal is a learnable Point Sorter trained via a separate self-supervised ordering loss whose objective is to maximize attention locality; this loss is not defined in terms of the downstream segmentation accuracy, nor does any equation or claim reduce the reported performance gains to a re-expression of the input data or fitted parameters by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling appear in the abstract or method outline. The evaluation on the external 3DAeroRelief dataset supplies an independent benchmark, keeping the derivation chain self-contained against external falsification.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that a self-supervised locality loss can be optimized jointly with the segmentation task without destabilizing training, plus standard transformer assumptions about attention locality. No explicit free parameters or invented entities are named in the abstract.

axioms (1)

domain assumption: Static orderings such as Hilbert curves fail to capture complex geometry in disaster scenes.
Stated in the abstract as motivation for the learnable sorter.

pith-pipeline@v0.9.0 · 5701 in / 1302 out tokens · 35528 ms · 2026-05-20T14:02:58.987470+00:00 · methodology

Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages · 2 internal anchors

[1] Chen, J., Huang, B., Li, J., Wang, Y., Ren, M., Xu, T.: Learning spatio-temporal attention based siamese network for tracking uavs in the wild. Remote Sensing 14(8), 1797 (2022)
[2] Choy, C., Gwak, J., Savarese, S.: 4d spatio-temporal convnets: Minkowski convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 3075–3084 (2019)
[3] Gupta, R., Goodman, B., Patel, N., Hosfelt, R., Sajeev, S., Heim, E., Doshi, J., Lucas, K., Choset, H., Gaston, M.: Creating xbd: A dataset for assessing building damage from satellite imagery. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition workshops. pp. 10–17 (2019)
[4] Hu, Q., Yang, B., Xie, L., Rosa, S., Guo, Y., Wang, Z., Trigoni, N., Markham, A.: Randla-net: Efficient semantic segmentation of large-scale point clouds. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2020)
[5] Landrieu, L., Simonovsky, M.: Large-scale point cloud semantic segmentation with superpoint graphs. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
[6] Le, N., Karimi, E., Rahnemoonfar, M.: 3daerorelief: The first 3d benchmark uav dataset for post-disaster assessment (2025), https://arxiv.org/abs/2509.11097
[7] Le, N., Rahnemoonfar, M.: 3D semantic segmentation network for post-disaster assessment with unmanned aerial vehicles. In: Palaniappan, K., Seetharaman, G., Irvine, J.M. (eds.) Geospatial Informatics XV. vol. 13461, p. 134610B. International Society for Optics and Photonics, SPIE (2025). https://doi.org/10.1117/12.3053919
[8] Li, Y., Bu, R., Sun, M., Wu, W., Di, X., Chen, B.: Pointcnn: Convolution on x-transformed points. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems. vol. 31. Curran Associates, Inc. (2018)
[9] Park, C., Jeong, Y., Cho, M., Park, J.: Fast point transformer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 16949–16958 (June 2022)
[10] Peng, B., Wu, X., Jiang, L., Chen, Y., Zhao, H., Tian, Z., Jia, J.: Oa-cnns: Omni-adaptive sparse cnns for 3d semantic segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 21305–21315 (June 2024)
[11] Qi, C.R., Su, H., Mo, K., Guibas, L.J.: Pointnet: Deep learning on point sets for 3d classification and segmentation. arXiv preprint arXiv:1612.00593 (2016)
[12] Qi, C.R., Yi, L., Su, H., Guibas, L.J.: Pointnet++: Deep hierarchical feature learning on point sets in a metric space. arXiv preprint arXiv:1706.02413 (2017)
[13] Qian, G., Li, Y., Peng, H., Mai, J., Hammoud, H., Elhoseiny, M., Ghanem, B.: Pointnext: Revisiting pointnet++ with improved training and scaling strategies. Advances in neural information processing systems 35, 23192–23204 (2022)
[14] Rahnemoonfar, M., Chowdhury, T., Murphy, R.: Rescuenet: a high resolution uav semantic segmentation dataset for natural disaster damage assessment. Scientific data 10(1), 913 (2023)
[15] Rahnemoonfar, M., Chowdhury, T., Sarkar, A., Varshney, D., Yari, M., Murphy, R.R.: Floodnet: A high resolution aerial imagery dataset for post flood scene understanding. IEEE Access 9, 89644–89654 (2021)
[16] Robert, D., Raguet, H., Landrieu, L.: Efficient 3d semantic segmentation with superpoint transformer. Proceedings of the IEEE/CVF International Conference on Computer Vision (2023)
[17] Thomas, H., Qi, C.R., Deschaud, J.E., Marcotegui, B., Goulette, F., Guibas, L.J.: Kpconv: Flexible and deformable convolution for point clouds. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) (October 2019)
[18] Wu, X., Jiang, L., Wang, P.S., Liu, Z., Liu, X., Qiao, Y., Ouyang, W., He, T., Zhao, H.: Point transformer v3: Simpler, faster, stronger. In: CVPR (2024)
[19] Wu, X., Lao, Y., Jiang, L., Liu, X., Zhao, H.: Point transformer v2: Grouped vector attention and partition-based pooling. In: NeurIPS (2022)
[20] Yang, Y.Q., Guo, Y.X., Xiong, J.Y., Liu, Y., Pan, H., Wang, P.S., Tong, X., Guo, B.: Swin3d: A pretrained transformer backbone for 3d indoor scene understanding (2023)
[21] Zhao, F., Zhang, C., Zhang, R., Wang, T.: Visual prompt learning of foundation models for post-disaster damage evaluation. Remote Sensing 17(10) (2025). https://doi.org/10.3390/rs17101664, https://www.mdpi.com/2072-4292/17/10/1664
[22] Zhao, H., Jiang, L., Jia, J., Torr, P.H., Koltun, V.: Point transformer. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 16259–16268 (2021)
