GIBLy: Improving 3D Semantic Segmentation through an Architecture-Agnostic Lightweight Geometric Inductive Bias Layer

Alessandra Micheletti; Cláudia Soares; Diogo Lavado

arXiv: 2605.24243 · v1 · pith:UDKB6CQZnew · submitted 2026-05-22 · cs.CV · cs.AI · stat.ML


Pith reviewed 2026-06-30 15:33 UTC · model grok-4.3

classification cs.CV · cs.AI · stat.ML

keywords 3D semantic segmentation · geometric inductive bias · lightweight layer · point cloud processing · architecture-agnostic · learnable primitives · scene understanding


The pith

GIBLy adds a lightweight layer to any 3D segmentation architecture that supplies features aligned with simple geometric shapes to raise accuracy at low cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GIBLy as an add-on layer that injects learnable geometric priors into existing 3D semantic segmentation models. Current deep networks capture basic shapes only indirectly through scale and data volume, which raises training costs and can limit generalization. GIBLy supplies aligned features from simple geometric shapes in a form that plugs into MLP, convolution, or transformer backbones without architecture-specific changes. Experiments report consistent gains on multiple benchmarks, including an 11.5 percent mIoU increase on TS40K when paired with PTV3, while adding only 58K parameters. The core argument is that explicit encoding of geometric structure supports more accurate and efficient 3D scene understanding.

Core claim

GIBLy is a lightweight geometric inductive bias layer that integrates learnable geometric priors into 3D segmentation pipelines. It enhances existing architectures by providing features aligned with simple geometric shapes that improve segmentation performance with minimal computational overhead. Validation across multiple benchmarks shows consistent performance gains, including up to 11.5 percent mIoU on TS40K with PTV3 while adding only 58K extra parameters.
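For readers less familiar with the metric behind the headline number, here is a minimal sketch of how mean intersection-over-union (mIoU) is conventionally computed from a confusion matrix. The function name `mean_iou` and the toy matrix are illustrative assumptions, not taken from the paper, and the paper's exact evaluation protocol (ignored labels, averaging granularity) is not stated on this page.

```python
def mean_iou(conf):
    """Mean IoU from a square confusion matrix.

    conf[i][j] = number of points whose true class is i and predicted class is j.
    """
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                        # true c, predicted as something else
        fp = sum(conf[r][c] for r in range(n)) - tp   # predicted c, true class differs
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both labels and predictions
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2-class confusion matrix (illustrative numbers only, not from the paper).
toy = [[3, 1],
       [1, 3]]
toy_miou = mean_iou(toy)  # each class: 3 / (3 + 1 + 1)
```

Because mIoU averages per-class scores, a gain on a rare class (for example, support towers in TS40K) moves the metric as much as the same gain on a dominant class.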

What carries the argument

The GIBLy layer, which supplies features aligned with simple geometric shapes to the model backbone.

If this is right

The same layer produces gains when attached to MLP-based, convolution-based, and transformer-based backbones.
Consistent accuracy lifts appear across several standard 3D semantic segmentation datasets.
The added cost stays low at roughly 58K parameters regardless of the host architecture.
The supplied features remain aligned with human-interpretable geometric shapes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Smaller overall models could reach performance levels that currently require much larger networks in 3D tasks.
The same plug-in approach may transfer to other 3D problems such as object detection or instance segmentation.
The human-interpretable geometric features could support post-hoc inspection of model decisions on point clouds.

Load-bearing premise

The observed performance gains are produced by the geometric inductive bias rather than by the simple addition of new parameters or altered training dynamics.

What would settle it

Replace the geometric parameters inside GIBLy with random values of the same count, retrain the same backbones, and check whether the reported mIoU gains on TS40K and other benchmarks disappear.

Figures

Figures reproduced from arXiv: 2605.24243 by Alessandra Micheletti, Cláudia Soares, Diogo Lavado.

**Figure 1.** GIBLy injects learnable geometric priors to improve 3D understanding. Left: A Cylinder geometric inductive bias (GIB) aligns to a chair leg (neighborhood N_q), using learned orientation ϕ and radius r, producing an alignment score. Right: On TS40K [21], adding a GIB-Layer (GIBLy) single-handedly boosts mIoU across multiple backbones (up to +11.5% mIoU on PTV3 [49]) with only 58K extra parameters. …

**Figure 2.** Overview of our method. (a) Our GIBLy module injects geometric awareness into point features by applying a set of learnable geometric inductive biases (GIBs) at each query point. We adopt a multi-region design: all input points are treated as query points, and multiple neighborhood scales are considered per point. For a given query i, g_{i,N} denotes the GIB alignment scores computed at N neighborhoods. These…

**Figure 3.** A normalized bias ensures that neighbors aligning with …

**Figure 4.** Qualitative results on the TS40K dataset. Each row presents a different scene. From left to right: input point cloud, prediction from the baseline PointTransformerV3 (PTV3), and prediction from PTV3 augmented with GIBLy. In the three rows, the baseline fails to detect support towers, introduces spurious noise in vegetation regions, or misclassifies artifacts as vegetation. In contrast, the GIBLy-augmente…
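The Figure 1 caption describes a cylinder GIB that scores how well a query point's neighborhood matches a cylinder with learned orientation and radius. The sketch below illustrates one plausible form of such a score. It is an assumption for illustration only: the Gaussian falloff, the `sigma` width, the choice to pass the axis through the query point, and the name `cylinder_alignment` are all hypothetical and are not the paper's formulation, which is not given on this page.

```python
import math


def cylinder_alignment(neighbors, query, axis, radius, sigma=0.05):
    """Hypothetical alignment score in [0, 1] for a cylinder through `query`.

    Each neighbor contributes exp(-(d - radius)^2 / (2 sigma^2)), where d is its
    distance to the cylinder axis; the score is the mean contribution.
    (Assumed form, not the paper's definition.)
    """
    norm = math.sqrt(sum(a * a for a in axis))
    u = [a / norm for a in axis]  # unit vector along the cylinder axis
    total = 0.0
    for p in neighbors:
        d = [pi - qi for pi, qi in zip(p, query)]
        along = sum(di * ui for di, ui in zip(d, u))       # component along the axis
        perp_sq = sum(di * di for di in d) - along * along
        dist_axis = math.sqrt(max(perp_sq, 0.0))           # distance to the axis line
        total += math.exp(-((dist_axis - radius) ** 2) / (2 * sigma ** 2))
    return total / len(neighbors)


# Three points lying on a cylinder of radius 0.1 around the z-axis.
pts = [(0.1, 0.0, 0.0), (0.0, 0.1, 0.5), (-0.1, 0.0, 1.0)]
aligned = cylinder_alignment(pts, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 0.1)
misaligned = cylinder_alignment(pts, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), 0.1)
```

In this toy setup the score is close to 1 when the axis matches the point layout and drops sharply when the axis is rotated, which is the behavior the caption attributes to a learned orientation and radius.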

Original abstract

In 3D scene understanding, deep learning models rely on large models and extensive training to capture basic geometric structures that are present in the 3D data. However, existing methods lack explicit mechanisms to incorporate geometric information, such as learnable primitive shapes, often necessitating large models and more training data which in turn increases cost and can limit generalization. We introduce GIBLy, a lightweight geometric inductive bias layer that integrates learnable geometric priors into 3D segmentation pipelines. GIBLy enhances existing architectures -- whether MLP-based, convolution-based, or transformer-based -- by providing features aligned with simple geometric shapes (and thus human-interpretable) that improve segmentation performance with minimal computational overhead. We validate our approach across multiple 3D semantic segmentation benchmarks, demonstrating consistent performance gains, including up to +11.5% mIoU on TS40K with PTV3, while adding only 58K extra parameters. Our results highlight the benefit of explicitly encoding geometric structure to support accurate and efficient 3D scene understanding, with a lightweight add-on layer.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

GIBLy adds a small plug-in layer with learnable shape priors to 3D segmentation models and reports gains like +11.5% mIoU, but the abstract gives no controls to show the geometry itself is responsible rather than extra capacity.


The main point is that this paper introduces GIBLy, a lightweight layer meant to inject learnable priors based on simple geometric shapes into existing 3D segmentation networks. It claims to work across MLP, convolution, and transformer backbones, with the biggest lift being +11.5% mIoU on TS40K using PTV3 while adding just 58K parameters.

What the work does reasonably well is keep the overhead tiny and target a practical issue: many models end up large because they have to discover basic geometry from data alone. Framing the layer as architecture-agnostic and human-interpretable is a clear engineering choice, and the focus on minimal added cost makes sense for deployed systems.

The soft spot is the missing evidence that the geometric priors are doing the lifting. The abstract shows performance numbers but does not describe any comparison to a parameter-matched non-geometric module at the same insertion point, nor any ablation that freezes or randomizes the priors. Without those, it is difficult to separate the claimed inductive bias from the general effect of extra parameters on training. The numbers also lack error bars or details on splits and significance, which leaves the central result plausible but unconfirmed.

This paper is for applied researchers who already run 3D segmentation pipelines and want a low-cost add-on to try. A reader looking for incremental efficiency tricks could get value from the code if it ships, even if the attribution to geometry needs more work.

It deserves peer review because the overhead is small enough that the idea is easy to test, and the gap it targets is real. A referee could reasonably ask for the capacity-matched controls and then decide.

Referee Report

2 major / 2 minor

Summary. The paper introduces GIBLy, a lightweight add-on layer that injects learnable geometric priors (aligned with simple shapes) into existing 3D semantic segmentation architectures (MLP-, CNN-, or transformer-based). It claims consistent mIoU gains across benchmarks, including +11.5% on TS40K with PTV3, while adding only 58K parameters and negligible compute, by providing human-interpretable geometric features that reduce reliance on large models or extra data.

Significance. If the reported gains are shown to stem specifically from the geometric inductive bias (rather than added capacity), the method would offer a practical, architecture-agnostic way to encode basic 3D structure in segmentation pipelines. This could support more efficient training and better generalization on geometric scenes, with the small parameter count making it easy to adopt as a plug-in module.

major comments (2)

[Experiments] Experiments section (and abstract): the central attribution of mIoU gains (e.g., +11.5% on TS40K with PTV3) to the geometric inductive bias is not isolated from the effect of inserting any 58K-parameter module. No ablation compares GIBLy against a parameter-matched non-geometric control (random features, plain MLP, or frozen weights) at the identical insertion point, nor reports architecture-specific retuning ablations. This leaves open whether the improvement arises from the learnable priors or from optimization dynamics of the added capacity.
[Method and Results] Method and results sections: the claim that GIBLy is 'architecture-agnostic' and integrates 'cleanly' into arbitrary backbones lacks evidence on whether insertion requires per-architecture hyperparameter retuning or produces hidden conflicts in feature scales or gradient flow. Without such checks, the generality assertion remains untested.
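To make the requested parameter-matched control concrete, the following sketch sizes a plain two-layer MLP so that its parameter count stays within the 58K budget quoted in the abstract. The feature widths `D_IN` and `D_OUT` and both helper names are hypothetical, since the page does not give GIBLy's input or output dimensions; only the 58K figure comes from the source.

```python
def mlp_param_count(d_in, hidden, d_out):
    """Weights plus biases of a two-layer MLP: d_in -> hidden -> d_out."""
    return (d_in * hidden + hidden) + (hidden * d_out + d_out)


def matched_hidden_width(budget, d_in, d_out):
    """Largest hidden width whose parameter count does not exceed `budget`."""
    # count = hidden * (d_in + d_out + 1) + d_out, solved for hidden
    hidden = (budget - d_out) // (d_in + d_out + 1)
    return max(hidden, 0)


BUDGET = 58_000        # "58K extra parameters" from the abstract
D_IN, D_OUT = 64, 64   # hypothetical feature widths, not taken from the paper

control_width = matched_hidden_width(BUDGET, D_IN, D_OUT)
control_params = mlp_param_count(D_IN, control_width, D_OUT)
```

A control built this way, inserted at the same point in the backbone and trained with the same schedule, would separate the effect of extra capacity from the effect of the geometric parameterization.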

minor comments (2)

[Abstract] Abstract and §1: performance numbers are stated without reference to exact baselines, data splits, number of runs, or statistical significance; adding these details would strengthen the claims even if the ablations above are the primary concern.
[Method] Notation: the description of 'learnable geometric priors' and how they are aligned with 'simple geometric shapes' would benefit from an explicit equation or diagram showing the prior parameterization and the feature-alignment operation.
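The request for number of runs and statistical significance could be met with a per-seed summary along the following lines. This is a sketch under stated assumptions: the helper names are hypothetical, and the scores are synthetic placeholders chosen for easy arithmetic, not results from the paper.

```python
import statistics


def summarize_runs(scores):
    """Mean and sample standard deviation of per-seed mIoU scores."""
    return statistics.mean(scores), statistics.stdev(scores)


def paired_gain(baseline, augmented):
    """Mean and sample standard deviation of per-seed gains (augmented - baseline)."""
    if len(baseline) != len(augmented):
        raise ValueError("baseline and augmented must have one score per shared seed")
    diffs = [a - b for a, b in zip(augmented, baseline)]
    return statistics.mean(diffs), statistics.stdev(diffs)


# Synthetic placeholder scores for three seeds (NOT results from the paper).
baseline_runs = [60.0, 61.0, 62.0]
augmented_runs = [63.0, 65.0, 64.0]

base_mean, base_sd = summarize_runs(baseline_runs)
gain_mean, gain_sd = paired_gain(baseline_runs, augmented_runs)
```

Pairing runs by seed keeps shared sources of variance (initialization, data order) out of the gain estimate, which matters when the claimed improvement is compared against run-to-run noise.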

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation of our contributions. We address each major point below and will revise the manuscript accordingly to strengthen the evidence for our claims.


Referee: [Experiments] the central attribution of mIoU gains to the geometric inductive bias is not isolated from the effect of inserting any 58K-parameter module. No ablation compares GIBLy against a parameter-matched non-geometric control.

Authors: We agree that the current experiments do not include a parameter-matched non-geometric control (e.g., random features or plain MLP) at the identical insertion point. This is a valid concern for isolating the geometric prior's contribution. In the revised manuscript we will add such ablations across the reported backbones and datasets, along with frozen-weight controls, to directly address whether gains arise from the geometric alignment rather than added capacity. revision: yes
Referee: [Method and Results] the claim that GIBLy is 'architecture-agnostic' and integrates 'cleanly' lacks evidence on whether insertion requires per-architecture hyperparameter retuning or produces hidden conflicts in feature scales or gradient flow.

Authors: Our experiments already demonstrate integration into MLP-, CNN-, and transformer-based models with consistent gains using the same default hyperparameters and insertion strategy. However, we did not explicitly analyze retuning requirements or potential scale/gradient issues. We will add a dedicated subsection with gradient-norm statistics, feature-scale comparisons before/after insertion, and a note on whether any architecture-specific adjustments were needed, to better substantiate the agnostic claim. revision: yes

Circularity Check

0 steps flagged

No circularity: additive layer with empirical validation, no derivations or self-referential reductions


The paper presents GIBLy as an architecture-agnostic additive layer that injects learnable geometric priors into existing 3D segmentation backbones (MLP, conv, transformer). No equations, uniqueness theorems, or derivation chains appear in the provided text that would reduce the reported mIoU gains or feature alignments to quantities defined by the method's own fitted parameters. Performance claims rest on benchmark experiments rather than any self-definitional or fitted-input-called-prediction structure. Self-citations, if present, are not load-bearing for the central claim, which remains an independent empirical proposal rather than a closed loop.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach rests on one domain assumption about the utility of primitive geometric shapes and introduces learnable parameters inside the new layer; no invented physical entities are postulated.

free parameters (1)

learnable geometric priors
Parameters inside GIBLy are trained to align with simple shapes and are therefore fitted during optimization.

axioms (1)

domain assumption: Features aligned with simple geometric shapes improve 3D semantic segmentation when added to existing models
The abstract states that providing such features yields consistent gains across architectures.

pith-pipeline@v0.9.1-grok · 5732 in / 1315 out tokens · 41343 ms · 2026-06-30T15:33:43.205243+00:00 · methodology


Reference graph

Works this paper leans on

57 extracted references · 4 canonical work pages · 1 internal anchor

[1] Iro Armeni, Ozan Sener, Amir R Zamir, Helen Jiang, Ioannis Brilakis, Martin Fischer, and Silvio Savarese. 3d semantic parsing of large-scale indoor spaces. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 1534–1543, 2016.
[2] Jonathan Baxter. A model of inductive bias learning. Journal of artificial intelligence research, 12:149–198, 2000.
[3] Jens Behley, Martin Garbade, Andres Milioto, Jan Quenzel, Sven Behnke, Cyrill Stachniss, and Jurgen Gall. Semantickitti: A dataset for semantic scene understanding of lidar sequences. In Proceedings of the IEEE/CVF international conference on computer vision, pages 9297–9307,
[4] Arturs Berzins, Andreas Radler, Eric Volkmann, Sebastian Sanokowski, Sepp Hochreiter, and Johannes Brandstetter. Geometry-informed neural networks. arXiv preprint arXiv:2402.14009, 2024.
[5] Giovanni Bocchi, Patrizio Frosini, Alessandra Micheletti, Alessandro Pedretti, Carmen Gratteri, Filippo Lunghini, Andrea Rosario Beccari, and Carmine Talarico. Geneonet: A new machine learning paradigm based on group equivariant non-expansive operators. an application to protein pocket detection. arXiv preprint arXiv:2202.00451, 2022.
[6] Holger Caesar, Varun Bankiti, Alex H Lang, Sourabh Vora, Venice Erin Liong, Qiang Xu, Anush Krishnan, Yu Pan, Giancarlo Baldan, and Oscar Beijbom. nuscenes: A multimodal dataset for autonomous driving. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11621–11631, 2020.
[7] Scott Cameron, Arnu Pretorius, and Stephen Roberts. Nonparametric boundary geometry in physics informed deep learning. Advances in Neural Information Processing Systems, 36, 2024.
[8] Christopher Choy, JunYoung Gwak, and Silvio Savarese. 4d spatio-temporal convnets: Minkowski convolutional neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 3075–3084,
[9] Nadav Cohen and Amnon Shashua. Inductive bias of deep convolutional networks through pooling geometry. arXiv preprint arXiv:1605.06743, 2016.
[10] Angela Dai, Angel X Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Nießner. Scannet: Richly-annotated 3d reconstructions of indoor scenes. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5828–5839, 2017.
[11] Anirudh Goyal and Yoshua Bengio. Inductive biases for deep learning of higher-level cognition. Proceedings of the Royal Society A, 478(2266):20210068, 2022.
[12] Fabian Groh, Patrick Wieschollek, and Hendrik PA Lensch. Flex-convolution: Million-scale point-cloud learning beyond grid-worlds. In Asian Conference on Computer Vision, pages 105–122. Springer, 2018.
[13] Rana Hanocka, Amir Hertz, Noa Fish, Raja Giryes, Shachar Fleishman, and Daniel Cohen-Or. Meshcnn: a network with an edge. ACM Transactions on Graphics (ToG), 38(4):1–12,
[14] Pedro Hermosilla, Tobias Ritschel, Pere-Pau Vázquez, Àlvar Vinacua, and Timo Ropinski. Monte carlo convolution for learning on non-uniformly sampled point clouds. ACM Transactions On Graphics (TOG), 37(6):1–12, 2018.
[15] Lingdong Kong, Youquan Liu, Runnan Chen, Yuexin Ma, Xinge Zhu, Yikang Li, Yuenan Hou, Yu Qiao, and Ziwei Liu. Rethinking range view representation for lidar segmentation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 228–240, 2023.
[16] Xin Lai, Jianhui Liu, Li Jiang, Liwei Wang, Hengshuang Zhao, Shu Liu, Xiaojuan Qi, and Jiaya Jia. Stratified transformer for 3d point cloud segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8500–8509, 2022.
[17] Xin Lai, Yukang Chen, Fanbin Lu, Jianhui Liu, and Jiaya Jia. Spherical transformer for lidar-based 3d recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 17545–17555, 2023.
[18] Alex H Lang, Sourabh Vora, Holger Caesar, Lubing Zhou, Jiong Yang, and Oscar Beijbom. Pointpillars: Fast encoders for object detection from point clouds. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12697–12705, 2019.
[19] Diogo Lavado, Cláudia Soares, Alessandra Micheletti, Giovanni Bocchi, Alex Coronati, Manuel Silva, and Patrizio Frosini. Low-resource white-box semantic segmentation of supporting towers on 3d point clouds via signature shape identification. arXiv preprint arXiv:2306.07809, 2023.
[20] Diogo Lavado, Cláudia Soares, Alessandra Micheletti, et al. Scene-net v2: Interpretable multiclass 3d scene understanding with geometric priors. Proceedings of Machine Learning Research, 251:222–232, 2024.
[21] Diogo Lavado, Ricardo Santos, André Coelho, João Santos, Alessandra Micheletti, and Claudia Soares. Learning under noisy labels, spurious points, and diverse structures: Ts40k, a 3d point cloud dataset of rural terrain and electrical transmission systems. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 7326–7336. IE...
[22] Itay Lavie, Guy Gur-Ari, and Zohar Ringel. Towards understanding inductive bias in transformers: A view from infinity. In Proceedings of the 41st International Conference on Machine Learning, pages 26043–26069. PMLR, 2024.
[23] Felix Järemo Lawin, Martin Danelljan, Patrik Tosteberg, Goutam Bhat, Fahad Shahbaz Khan, and Michael Felsberg. Deep projective 3d semantic segmentation. In Computer Analysis of Images and Patterns: 17th International Conference, CAIP 2017, Ystad, Sweden, August 22-24, 2017, Proceedings, Part I 17, pages 95–107. Springer, 2017.
[24] Truc Le and Ye Duan. Pointgrid: A deep network for 3d shape understanding. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 9204–9214, 2018.
[25] Huan Lei, Naveed Akhtar, and Ajmal Mian. Octree guided cnn with spherical kernels for 3d point clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9631–9640, 2019.
[26] Huan Lei, Naveed Akhtar, and Ajmal Mian. Spherical kernel for efficient graph convolution on 3d point clouds. IEEE transactions on pattern analysis and machine intelligence, 43(10):3664–3680, 2020.
[27] Yangyan Li, Rui Bu, Mingchao Sun, Wei Wu, Xinhan Di, and Baoquan Chen. Pointcnn: Convolution on x-transformed points. Advances in neural information processing systems, 31, 2018.
[28] Zongyi Li, Nikola Kovachki, Chris Choy, Boyi Li, Jean Kossaifi, Shourya Otta, Mohammad Amin Nabian, Maximilian Stadler, Christian Hundt, Kamyar Azizzadenesheli, et al. Geometry-informed neural operator for large-scale 3d pdes. Advances in Neural Information Processing Systems, 36, 2024.
[29] Yecheng Lyu, Xinming Huang, and Ziming Zhang. Learning to segment 3d point clouds in 2d image space. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 12255–12264, 2020.
[30] Daniel Maturana and Sebastian Scherer. Voxnet: A 3d convolutional neural network for real-time object recognition. In 2015 IEEE/RSJ international conference on intelligent robots and systems (IROS), pages 922–928. IEEE, 2015.
[31] Hsien-Yu Meng, Lin Gao, Yu-Kun Lai, and Dinesh Manocha. Vv-net: Voxel vae net with group convolutions for point cloud segmentation. In Proceedings of the IEEE/CVF international conference on computer vision, pages 8500–8508, 2019.
[32] Jan Oldenburg, Finja Borowski, Alper Öner, Klaus-Peter Schmitz, and Michael Stiehm. Geometry aware physics informed neural network surrogate for solving navier–stokes equation (gapinn). Advanced Modeling and Simulation in Engineering Sciences, 9(1):8, 2022.
[33] Chunghyun Park, Yoonwoo Jeong, Minsu Cho, and Jaesik Park. Fast point transformer. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 16949–16958, 2022.
[34] Bohao Peng, Xiaoyang Wu, Li Jiang, Yukang Chen, Hengshuang Zhao, Zhuotao Tian, and Jiaya Jia. Oa-cnns: Omni-adaptive sparse cnns for 3d semantic segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 21305–21315, 2024.
[35] Charles R Qi, Hao Su, Kaichun Mo, and Leonidas J Guibas. Pointnet: Deep learning on point sets for 3d classification and segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 652–660,
[36] Charles Ruizhongtai Qi, Li Yi, Hao Su, and Leonidas J Guibas. Pointnet++: Deep hierarchical feature learning on point sets in a metric space. Advances in neural information processing systems, 30, 2017.
[37] Martin Simonovsky and Nikos Komodakis. Dynamic edge-conditioned filters in convolutional neural networks on graphs. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3693–3702, 2017.
[38] Hang Su, Subhransu Maji, Evangelos Kalogerakis, and Erik Learned-Miller. Multi-view convolutional neural networks for 3d shape recognition. In Proceedings of the IEEE international conference on computer vision, pages 945–953,
[39] Weiwei Sun, Andrea Tagliasacchi, Boyang Deng, Sara Sabour, Soroosh Yazdani, Geoffrey E Hinton, and Kwang Moo Yi. Canonical capsules: Self-supervised capsules in canonical pose. Advances in Neural information processing systems, 34:24993–25005, 2021.
[40] Hugues Thomas, Charles R Qi, Jean-Emmanuel Deschaud, Beatriz Marcotegui, François Goulette, and Leonidas J Guibas. Kpconv: Flexible and deformable convolution for point clouds. In Proceedings of the IEEE/CVF international conference on computer vision, pages 6411–6420, 2019.
[41] A Vaswani. Attention is all you need. Advances in Neural Information Processing Systems, 2017.
[42] Lei Wang, Yuchun Huang, Yaolin Hou, Shenman Zhang, and Jie Shan. Graph attention convolution for point cloud semantic segmentation. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10296–10305, 2019.
[43] Peng-Shuai Wang. Octformer: Octree-based transformers for 3d point clouds. ACM Transactions on Graphics (TOG), 42(4):1–11, 2023.
[44] Peng-Shuai Wang, Yang Liu, Yu-Xiao Guo, Chun-Yu Sun, and Xin Tong. O-cnn: Octree-based convolutional neural networks for 3d shape analysis. ACM Transactions On Graphics (TOG), 36(4):1–11, 2017.
[45] Shenlong Wang, Simon Suo, Wei-Chiu Ma, Andrei Pokrovsky, and Raquel Urtasun. Deep parametric continuous convolutional neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 2589–2597, 2018.
[46] Zihao Wang and Lei Wu. Theoretical analysis of the inductive biases in deep convolutional networks. Advances in Neural Information Processing Systems, 36:74289–74338, 2023.
[47] Wenxuan Wu, Zhongang Qi, and Li Fuxin. Pointconv: Deep convolutional networks on 3d point clouds. In Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition, pages 9621–9630, 2019.
[48] Xiaoyang Wu, Yixing Lao, Li Jiang, Xihui Liu, and Hengshuang Zhao. Point transformer v2: Grouped vector attention and partition-based pooling. In NeurIPS, 2022.
[49] Xiaoyang Wu, Li Jiang, Peng-Shuai Wang, Zhijian Liu, Xihui Liu, Yu Qiao, Wanli Ouyang, Tong He, and Hengshuang Zhao. Point transformer v3: Simpler faster stronger. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4840–4851, 2024.
[50] Hengyuan Xu, Liyao Xiang, Hangyu Ye, Dixi Yao, Pengzhi Chu, and Baochun Li. Permutation equivariance of transformers and its applications. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5987–5996, 2024.
[51] Yifan Xu, Tianqi Fan, Mingye Xu, Long Zeng, and Yu Qiao. Spidercnn: Deep learning on point sets with parameterized convolutional filters. In Proceedings of the European conference on computer vision (ECCV), pages 87–102, 2018.
[52] Yinshuang Xu, Jiahui Lei, and Kostas Daniilidis. $SE(3)$ equivariant convolution and transformer in ray space. In Thirty-seventh Conference on Neural Information Processing Systems, 2023.
[53] Ze Yang and Liwei Wang. Learning relationships for multi-view 3d object recognition. In Proceedings of the IEEE/CVF international conference on computer vision, pages 7505–7514, 2019.
[54]

Input-level inductive biases for 3d reconstruction

Wang Yifan, Carl Doersch, Relja Arandjelovi ´c, Joao Car- reira, and Andrew Zisserman. Input-level inductive biases for 3d reconstruction. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 6176–6186, 2022. 3

2022
[55]

Polarnet: An improved grid representation for online lidar point clouds se- mantic segmentation

Yang Zhang, Zixiang Zhou, Philip David, Xiangyu Yue, Ze- rong Xi, Boqing Gong, and Hassan Foroosh. Polarnet: An improved grid representation for online lidar point clouds se- mantic segmentation. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 9601–9610, 2020. 2

2020
[56]

Point transformer

Hengshuang Zhao, Li Jiang, Jiaya Jia, Philip HS Torr, and Vladlen Koltun. Point transformer. InProceedings of the IEEE/CVF international conference on computer vision, pages 16259–16268, 2021. 1, 2, 6, 7

2021
[57]

Cylindrical and asymmetrical 3d convolution networks for lidar-based perception.IEEE Transactions on Pattern Anal- ysis and Machine Intelligence, 44(10):6807–6822, 2021

Xinge Zhu, Hui Zhou, Tai Wang, Fangzhou Hong, Wei Li, Yuexin Ma, Hongsheng Li, Ruigang Yang, and Dahua Lin. Cylindrical and asymmetrical 3d convolution networks for lidar-based perception.IEEE Transactions on Pattern Anal- ysis and Machine Intelligence, 44(10):6807–6822, 2021. 2 11

2021
