pith · machine review for the scientific record

arxiv: 2604.10716 · v2 · submitted 2026-04-12 · 💻 cs.AR

Recognition: unknown

L-PCN: A Point Cloud Accelerator Exploiting Spatial Locality through Octree-based Islandization

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 15:24 UTC · model grok-4.3

classification 💻 cs.AR
keywords: point cloud networks · accelerator · spatial locality · octree · FPGA · data structuring · feature computation · islandization

The pith

L-PCN partitions point clouds into octree islands to reuse overlapping subset data and cut repetitive feature operations in PCNs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Point cloud networks repeat many feature fetches and computations because the data structuring step gathers overlapping subsets of points. L-PCN adds an Islandization Unit that first builds an octree to group points into islands whose subsets share strong spatial correlations, then applies hub-based scheduling to cache and reuse the repeated data inside each island. The unit plugs into existing PCN pipelines without altering their core steps. If the locality is real, this yields theoretical reductions in feature fetching of 55 to 94 percent and in feature computation of 45 to 81 percent. When added to prior accelerators on FPGA, it delivers 1.2x to 3.2x additional speedup across typical workloads.

Core claim

Octree-based Islandization partitions a point cloud so that point subsets inside the same island exhibit strong spatial correlation; Hub-based Scheduling then dynamically caches, updates, and reuses the repeated data within each island. Together these steps reduce feature fetching by 55.2 percent to 93.8 percent and feature computation by 45.4 percent to 80.6 percent during the full PCN process, and they deliver 1.2x to 3.2x additional speedup when the Islandization Unit is inserted as a plug-in into state-of-the-art PCN accelerators running on an Intel Arria 10 GX FPGA.
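
The partitioning step can be sketched in a few lines. This is an editorial illustration in software, not the paper's hardware Islandization Unit: `build_islands` is a hypothetical helper that simply treats each octree leaf as one island, whereas L-PCN additionally gathers adjacent octree nodes around hub points.

```python
# Sketch: recursive octree subdivision of a 3D point cloud, one island per leaf.
# Hypothetical helper, not the paper's implementation.

def build_islands(points, max_leaf=8, bounds=None, depth=0, max_depth=8):
    """Recursively subdivide space; each octree leaf becomes one island."""
    if bounds is None:  # root call: compute the bounding box once
        lo = tuple(min(p[i] for p in points) for i in range(3))
        hi = tuple(max(p[i] for p in points) for i in range(3))
        bounds = (lo, hi)
    lo, hi = bounds
    if len(points) <= max_leaf or depth == max_depth:
        return [points]  # leaf node -> one island
    mid = tuple((lo[i] + hi[i]) / 2.0 for i in range(3))
    octants = {}  # route each point to one of up to eight child octants
    for p in points:
        key = tuple(p[i] >= mid[i] for i in range(3))
        octants.setdefault(key, []).append(p)
    islands = []
    for key, pts in octants.items():
        clo = tuple(mid[i] if key[i] else lo[i] for i in range(3))
        chi = tuple(hi[i] if key[i] else mid[i] for i in range(3))
        islands.extend(build_islands(pts, max_leaf, (clo, chi), depth + 1, max_depth))
    return islands
```

Because leaves are spatially compact, point subsets gathered inside one island are the ones most likely to overlap, which is exactly the locality the scheduling step then exploits.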

What carries the argument

The Islandization Unit, which performs Octree-based Islandization to create spatially correlated islands and Hub-based Scheduling to exploit intra-island data reuse.
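
The reuse that Hub-based Scheduling depends on can be illustrated with a toy software cache. `fetch_with_reuse` is a hypothetical name; it shows only the caching half (dynamically caching and reusing repeated points), not hub selection or the inside-to-outside island ordering.

```python
# Sketch: process point subsets within one island, fetching each point's
# feature at most once. Overlapping points hit the cache instead of memory.

def fetch_with_reuse(subsets, fetch_feature):
    """Return per-subset feature lists and the number of real fetches."""
    cache = {}    # point id -> feature, kept for the lifetime of the island
    fetches = 0
    results = []
    for subset in subsets:            # e.g. scheduled inside-to-outside
        feats = []
        for pid in subset:
            if pid not in cache:      # cold point: pay one feature fetch
                cache[pid] = fetch_feature(pid)
                fetches += 1
            feats.append(cache[pid])  # warm point: reuse the cached copy
        results.append(feats)
    return results, fetches
```

For three overlapping subsets [[0,1,2], [1,2,3], [2,3,4]] a naive pipeline pays nine fetches while the cached version pays five, a 44 percent reduction from overlap alone; the paper's larger savings come from islands whose subsets overlap far more heavily.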

If this is right

  • Existing PCN accelerators gain 1.2x to 3.2x speedup simply by adding the Islandization Unit as a plug-in.
  • Feature fetching and computation volumes drop by tens of percent without changing the underlying PCN algorithms.
  • The same island-level reuse pattern applies to common PCN tasks such as shape classification and part segmentation.
  • Hardware implementations on FPGA confirm that the theoretical savings appear in real execution time.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same octree partitioning could reduce off-chip memory traffic for PCN inference on bandwidth-limited edge hardware.
  • Similar explicit spatial grouping might help accelerators for other irregular spatial data such as meshes or graphs.
  • If island size and overlap statistics vary widely across datasets, a dynamic island-size tuner could further stabilize gains.
  • The approach highlights that data-structuring locality in point clouds is a first-class target for co-designed accelerators.

Load-bearing premise

The spatial locality created by overlapping point subsets in data structuring is both large enough and stable enough that the added partitioning and scheduling overhead never offsets the reported savings.
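
That trade can be made concrete with a one-line model. This is our back-of-the-envelope framing, not a formula from the paper: treat the reduction r as the fraction of baseline DS/FC time saved and o as the islandization overhead expressed as a fraction of baseline time.

```python
# Break-even sketch (editorial assumption, not from the paper):
# net speedup = T_base / (T_base * (1 - reduction) + T_base * overhead).

def net_speedup(reduction, overhead):
    """Net speedup after subtracting partitioning/scheduling overhead."""
    return 1.0 / ((1.0 - reduction) + overhead)
```

At the paper's low end (45.4 percent computation saving), an overhead near 30 percent of baseline runtime already pulls the net gain below the reported 1.2x floor; at the high end (80.6 percent), even sizable overheads leave a multi-x win. This is why a per-stage overhead breakdown matters.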

What would settle it

Run the Islandization Unit on a point-cloud workload whose gathered subsets show far less overlap than the tested cases and check whether measured speedup drops below 1.2x or whether partitioning time dominates total runtime.
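
A minimal version of that diagnostic, assuming gathered subsets are available as lists of point indices (`overlap_fraction` is a hypothetical helper, not from the paper):

```python
# Sketch: measure how much the gathered subsets of a workload actually
# overlap before committing to islandization hardware.

def overlap_fraction(subsets):
    """Fraction of point occurrences that are repeats across subsets."""
    seen = set()
    total = repeats = 0
    for subset in subsets:
        for pid in subset:
            total += 1
            if pid in seen:   # this point was already gathered once
                repeats += 1
            seen.add(pid)
    return repeats / total if total else 0.0
```

A workload whose overlap fraction is near zero gives the Islandization Unit nothing to reuse, so partitioning overhead would dominate; measuring this statistic on an adversarial dataset is the cheapest way to probe the claimed 1.2x floor.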

Figures

Figures reproduced from arXiv: 2604.10716 by Bowen Jiang, Herman Lam, Jieming Yin, Jiliang Zhang, Xiangru Chen, Yiming Gao, Yuxiang Wang, Zhilei Chai.

Figure 1: Standard workflow and the two steps in the basic building block of …
Figure 2: (a) L-PCN workflow: adding Islandization Steps to exploit spatial …
Figure 3: Details of the two steps in a basic PCN Building Block.
Figure 4: Detailed breakdown and overlap analysis of two major Set Ab…
Figure 5: Overall architecture of L-PCN.
Figure 6: Input and output of Sampling Module.
Figure 5: To calculate the distance and rank the nearest …
Figure 8: L-PCN workflow with Islandization Unit. Between Point-subset A and …
Figure 7: Detailed operations in Pruning Module.
Figure 9: (a) An example of picking five central points as Hub points (Hub P1 to Hub P5) from the Sampled Point Cloud. (b) Searching adjacent Octree nodes …
Figure 11: Workflow of Hub-based Scheduling.
Figure 10: Architecture of Partitioning Module.
Figure 12: Detailed architecture and workflow of the Overlap Detection Module.
Figure 13: L-PCN architecture with active dataflow.
Figure 14: Example of Data Reusing Method with overlap detection. In this …
Figure 15: Theoretical workload optimization.
Figure 17: Feature Computation speedup of GDPCA, L-PCN, and Mesorasi.
Figure 18: Theoretical workload optimization for PointNeXt and PointVector.
Figure 19: Performance comparison of L-PCN prototype and FractalCloud.
Figure 20: Accuracy comparison among traditional method, L-PCN, and …
Figure 21: Empirical analysis shows that non-overlapping (usually boundary) …
Figure 23: Detailed specifications and area/power breakdown of the L-PCN …
Figure 22: Sensitivity study of Islandization hyperparameters.
read the original abstract

Existing Point Cloud Networks (PCNs) have proven to achieve great success in many point cloud tasks such as object part segmentation, shape classification, and so on. The most popular point-based PCNs are usually composed of two sequential steps: Data Structuring (DS) and Feature Computation (FC). In this paper, we first describe an important characteristic of the PCN-specific DS step that has not been addressed in existing PCN accelerators: the spatial locality resulting from overlapping points of the gathered point subsets. Using algorithm-hardware co-design, L-PCN (Locality-aware PCN) proposes two novel techniques to exploit this characteristic to reduce the large amount of repetitive operations in the overall PCN. The first of which is a point cloud partitioning technique, Octree-based Islandization. Using Octree-based adjacency gathering, a point cloud is partitioned into islands in L-PCN, where the point subsets inside the same island exhibit a strong spatial correlation. After partitioning, L-PCN performs the rest of PCN steps at the granularity of islands. The second method of L-PCN is scheduling the intra-island computation with a Hub-based Scheduling to exploit the intra-island data reuse by dynamically caching, updating, and reusing the repeated data. The two methods are implemented in an Islandization Unit, which can be seamlessly integrated into standard PCN workflow. Our evaluation shows that based on our methods for exploiting spatial locality, L-PCN achieves a theoretical reduction in feature fetching ranging from 55.2% to 93.8% and in feature computation ranging from 45.4% to 80.6% during the PCN process. For experimentation, prototype L-PCN accelerators are implemented on the Intel Arria 10 GX FPGA. Experimental results prove that with the Islandization Unit as a plug-in, state-of-the-art PCN accelerators can achieve an additional speedup ranging from 1.2x to 3.2x.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript presents L-PCN, a locality-aware accelerator for point cloud networks (PCNs) consisting of data structuring (DS) and feature computation (FC) steps. It identifies spatial locality from overlapping point subsets in DS and proposes Octree-based Islandization to partition the point cloud into correlated 'islands' and Hub-based Scheduling for intra-island reuse via dynamic caching. Implemented as a plug-in Islandization Unit, it claims theoretical reductions of 55.2%-93.8% in feature fetching and 45.4%-80.6% in feature computation, with FPGA experiments showing 1.2x-3.2x additional speedup on state-of-the-art PCN accelerators.

Significance. If the overheads of the proposed techniques prove smaller than the savings across typical workloads, this work could meaningfully advance hardware acceleration for PCNs by exploiting an under-addressed characteristic of the DS step. The co-design approach and plug-in compatibility with existing accelerators are positive aspects that could facilitate adoption. The FPGA implementation on Arria 10 GX provides a practical demonstration, though verification of net gains is needed.

major comments (3)
  1. [Abstract] The ranges for theoretical reductions in feature fetching (55.2%-93.8%) and feature computation (45.4%-80.6%) are stated without derivation details, assumptions (e.g., overlap factors, point density, neighbor radius, or cache hit rates), or workload statistics. These percentages are load-bearing for the central claim that locality exploitation yields net gains.
  2. [Evaluation] The reported speedups (1.2x-3.2x) lack a breakdown of execution time or resource usage for the Islandization Unit (octree construction, partitioning, hub scheduling, dynamic caching) versus the DS/FC savings. Without this, it cannot be verified that added costs do not offset benefits for varying point densities or PCN layers.
  3. [Evaluation] Workload details (point cloud sizes, datasets, PCN layer counts), baseline accelerator descriptions, and error bars/variance on speedup measurements are absent. These omissions undermine assessment of the robustness of the claimed speedups.
minor comments (1)
  1. [Abstract] The abstract introduces the 'Islandization Unit' without a concise description of its integration point in the standard PCN workflow or a forward reference to the relevant figure or section.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will incorporate revisions to provide the requested details and breakdowns, strengthening the presentation of our results.

read point-by-point responses
  1. Referee: [Abstract] The ranges for theoretical reductions in feature fetching (55.2%-93.8%) and feature computation (45.4%-80.6%) are stated without derivation details, assumptions (e.g., overlap factors, point density, neighbor radius, or cache hit rates), or workload statistics. These percentages are load-bearing for the central claim that locality exploitation yields net gains.

    Authors: We agree that the abstract would benefit from additional context on these ranges. The percentages are derived from an analytical model of point subset overlaps under octree-based partitioning, using standard PCN parameters (k-nearest neighbors with radius 0.2-0.5, point densities from ModelNet40 and ShapeNet, and cache hit rates based on intra-island correlation). In the revised version, we will add a concise derivation summary and key assumptions to the abstract while expanding the full formulas, workload statistics, and sensitivity analysis in Section 4 (Evaluation). revision: yes

  2. Referee: [Evaluation] The reported speedups (1.2x-3.2x) lack a breakdown of execution time or resource usage for the Islandization Unit (octree construction, partitioning, hub scheduling, dynamic caching) versus the DS/FC savings. Without this, it cannot be verified that added costs do not offset benefits for varying point densities or PCN layers.

    Authors: This is a fair point for verifying net gains. The current manuscript reports aggregate speedups on the Arria 10 GX but does not isolate Islandization Unit overheads. We will revise the evaluation section to include a detailed breakdown: cycle counts and resource utilization (LUTs, DSPs, BRAMs) for octree construction, partitioning, and hub scheduling; net speedup after overhead subtraction; and results across varying point densities (1024-8192 points) and PCN layers. This will confirm that DS/FC savings exceed the added costs in the evaluated cases. revision: yes

  3. Referee: [Evaluation] Workload details (point cloud sizes, datasets, PCN layer counts), baseline accelerator descriptions, and error bars/variance on speedup measurements are absent. These omissions undermine assessment of the robustness of the claimed speedups.

    Authors: We acknowledge these omissions limit robustness assessment. The revised evaluation will explicitly list: point cloud sizes and datasets (ModelNet40, ShapeNet with 1024-4096 points), PCN architectures and layer counts (PointNet, PointNet++, DGCNN), baseline accelerator configurations (e.g., prior FPGA designs from cited works), and error bars with standard deviation from 5-10 runs per configuration to demonstrate measurement variance and consistency. revision: yes

Circularity Check

0 steps flagged

No circularity: claims rest on proposed algorithms and direct FPGA measurements

full rationale

The paper introduces Octree-based Islandization and Hub-based Scheduling as algorithmic techniques to exploit spatial locality in the DS step of PCNs. The reported theoretical reductions (55.2%-93.8% fetching, 45.4%-80.6% computation) and measured speedups (1.2x-3.2x) are presented as outcomes of evaluation on FPGA prototypes with the Islandization Unit as a plug-in. No equations, fitted parameters, or self-citations are shown that reduce the central claims to tautological inputs by construction. The derivation chain is self-contained against external benchmarks (hardware implementation and workload measurements), with no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on one domain assumption about spatial locality in point gathering and introduces one new hardware component; no free parameters or additional invented entities are described in the abstract.

axioms (1)
  • domain assumption Point subsets gathered during PCN data structuring exhibit substantial spatial locality due to overlapping points.
    Described as an important unaddressed characteristic of the DS step.
invented entities (1)
  • Islandization Unit no independent evidence
    purpose: Seamless plug-in module that performs octree partitioning and hub scheduling inside existing PCN accelerators.
    New architectural component proposed to realize the two techniques.

pith-pipeline@v0.9.0 · 5687 in / 1371 out tokens · 33727 ms · 2026-05-10T15:24:32.734362+00:00 · methodology


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. FlashFPS: Efficient Farthest Point Sampling for Large-Scale Point Clouds via Pruning and Caching

    cs.LG 2026-04 accept novelty 7.0

    FlashFPS accelerates FPS via candidate/iteration pruning and inter-layer caching, delivering 5.16x GPU speedup and 2.69x on accelerators with negligible accuracy loss.

Reference graph

Works this paper leans on

55 extracted references · 4 canonical work pages · cited by 1 Pith paper · 1 internal anchor

  1. [1]

    Bit-pragmatic deep neural network computing,

    J. Albericio, A. Delm ´as, P. Judd, S. Sharify, G. O’Leary, R. Genov, and A. Moshovos, “Bit-pragmatic deep neural network computing,” in Proceedings of the 50th annual IEEE/ACM international symposium on microarchitecture, 2017, pp. 382–394

  2. [2]

    Semantickitti: A dataset for semantic scene understanding of lidar sequences,

    J. Behley, M. Garbade, A. Milioto, J. Quenzel, S. Behnke, C. Stachniss, and J. Gall, “Semantickitti: A dataset for semantic scene understanding of lidar sequences,” inProceedings of the IEEE/CVF international conference on computer vision, 2019, pp. 9297–9307

  3. [3]

    Efficient radius neighbor search in three-dimensional point clouds,

    J. Behley, V . Steinhage, and A. B. Cremers, “Efficient radius neighbor search in three-dimensional point clouds,” in2015 IEEE international conference on robotics and automation (ICRA). IEEE, 2015, pp. 3625– 3630

  4. [4]

    ShapeNet: An Information-Rich 3D Model Repository

    A. X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Suet al., “Shapenet: An information- rich 3d model repository,”arXiv preprint arXiv:1512.03012, 2015

  5. [5]

    Point cloud acceleration by exploiting geometric similarity,

    C. Chen, X. Zou, H. Shao, Y . Li, and K. Li, “Point cloud acceleration by exploiting geometric similarity,” inProceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 2023, pp. 1135–1147

  6. [6]

    Parallelnn: A parallel octree-based nearest neighbor search accelerator for 3d point clouds,

    F. Chen, R. Ying, J. Xue, F. Wen, and P. Liu, “Parallelnn: A parallel octree-based nearest neighbor search accelerator for 3d point clouds,” in2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 2023, pp. 403–414

  7. [7]

    Rubik: A hierarchical architecture for efficient graph learning,

    X. Chen, Y . Wang, X. Xie, X. Hu, A. Basak, L. Liang, M. Yan, L. Deng, Y . Ding, Z. Duet al., “Rubik: A hierarchical architecture for efficient graph learning,”arXiv preprint arXiv:2009.12495, 2020

  8. [8]

    Multi-view 3d object detection network for autonomous driving,

    X. Chen, H. Ma, J. Wan, B. Li, and T. Xia, “Multi-view 3d object detection network for autonomous driving,” inProceedings of the IEEE conference on Computer Vision and Pattern Recognition, 2017, pp. 1907–1915

  9. [9]

    Pointcim: A computing-in- memory architecture for accelerating deep point cloud analytics,

    X.-J. Chen, H.-P. Chen, and C.-L. Yang, “Pointcim: A computing-in- memory architecture for accelerating deep point cloud analytics,” in 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2024, pp. 1309–1322

  10. [10]

    Eyeriss: An energy- efficient reconfigurable accelerator for deep convolutional neural net- works,

    Y .-H. Chen, T. Krishna, J. S. Emer, and V . Sze, “Eyeriss: An energy- efficient reconfigurable accelerator for deep convolutional neural net- works,”IEEE journal of solid-state circuits, vol. 52, no. 1, pp. 127–138, 2016

  11. [11]

    Scannet: Richly-annotated 3d reconstructions of indoor scenes,

    A. Dai, A. X. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Nießner, “Scannet: Richly-annotated 3d reconstructions of indoor scenes,” inProc. Computer Vision and Pattern Recognition (CVPR), IEEE, 2017

  12. [12]

    Pointvector: a vector representation in point cloud analysis,

    X. Deng, W. Zhang, Q. Ding, and X. Zhang, “Pointvector: a vector representation in point cloud analysis,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 9455– 9465

  13. [13]

    Neural accel- eration for general-purpose approximate programs,

    H. Esmaeilzadeh, A. Sampson, L. Ceze, and D. Burger, “Neural accel- eration for general-purpose approximate programs,”Communications of the ACM, vol. 58, no. 1, pp. 105–115, 2014

  14. [14]

    Crescent: taming memory irregularities for accelerating deep point cloud analytics,

    Y . Feng, G. Hammonds, Y . Gan, and Y . Zhu, “Crescent: taming memory irregularities for accelerating deep point cloud analytics,” inProceedings of the 49th Annual International Symposium on Computer Architecture, 2022, pp. 962–977

  15. [15]

    Lumina: Real-time mobile neural rendering by exploiting computational redundancy,

    Y . Feng, W. Lin, Y . Cheng, Z. Liu, J. Leng, M. Guo, C. Chen, S. Sun, and Y . Zhu, “Lumina: Real-time mobile neural rendering by exploiting computational redundancy,”arXiv preprint arXiv:2506.05682, 2025

  16. [16]

    Mesorasi: Architecture support for point cloud analytics via delayed-aggregation,

    Y . Feng, B. Tian, T. Xu, P. Whatmough, and Y . Zhu, “Mesorasi: Architecture support for point cloud analytics via delayed-aggregation,” in2020 53rd Annual IEEE/ACM International Symposium on Microar- chitecture (MICRO). IEEE, 2020, pp. 1037–1050

  17. [17]

    Simple and efficient traversal methods for quadtrees and octrees,

    S. F. Frisken and R. N. Perry, “Simple and efficient traversal methods for quadtrees and octrees,”Journal of Graphics Tools, vol. 7, no. 3, pp. 1–11, 2002

  18. [18]

    Fractalcloud: A fractal-inspired architecture for efficient large-scale point cloud processing,

    Y . Fu, C. Zhou, H. Ye, B. Duan, Q. Huang, C. Wei, C. Guo, H. Li, Y . Chenet al., “Fractalcloud: A fractal-inspired architecture for efficient large-scale point cloud processing,”arXiv preprint arXiv:2511.07665, 2025

  19. [19]

    Hgpcn: A heterogeneous architecture for e2e embedded point cloud inference,

    Y . Gao, C. Jiang, W. Piard, X. Chen, B. Patel, and H. Lam, “Hgpcn: A heterogeneous architecture for e2e embedded point cloud inference,” in2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2024, pp. 1588–1600

  20. [20]

    I-gcn: A graph convolutional network accelerator with runtime locality enhancement through islandization,

    T. Geng, C. Wu, Y . Zhang, C. Tan, C. Xie, H. You, M. Herbordt, Y . Lin, and A. Li, “I-gcn: A graph convolutional network accelerator with runtime locality enhancement through islandization,” inMICRO-54: 54th annual IEEE/ACM international symposium on microarchitecture, 2021, pp. 1051–1063

  21. [21]

    3d semantic segmen- tation with submanifold sparse convolutional networks,

    B. Graham, M. Engelcke, and L. Van Der Maaten, “3d semantic segmen- tation with submanifold sparse convolutional networks,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 9224–9232

  22. [22]

    Deep learning for 3d point clouds: A survey,

    Y . Guo, H. Wang, Q. Hu, H. Liu, L. Liu, and M. Bennamoun, “Deep learning for 3d point clouds: A survey,”IEEE transactions on pattern analysis and machine intelligence, vol. 43, no. 12, pp. 4338–4364, 2020

  23. [23]

    Pointisa: Isa-extensions for efficient point cloud analytics via architecture and algorithm co-design,

    M. Han, L. Wang, L. Xiao, H. Zhang, B. Jiang, X. Xie, J. Zhu, S. Wei, and L. Liu, “Pointisa: Isa-extensions for efficient point cloud analytics via architecture and algorithm co-design,” inProceedings of the 58th IEEE/ACM International Symposium on Microarchitecture, 2025, pp. 1867–1881

  24. [24]

    Splatonic: Architectural support for 3d gaussian splatting slam via sparse processing,

    X. Huang, H. Zhu, T. Ma, Y . Xiong, F. Liu, Z. He, Y . Gan, Z. Liu, J. Leng, Y . Fenget al., “Splatonic: Architectural support for 3d gaussian splatting slam via sparse processing,” in2026 IEEE International Sym- posium on High Performance Computer Architecture (HPCA). IEEE, 2026, pp. 1–14

  25. [25]

    A 3d-deep-learning- based augmented reality calibration method for robotic environments using depth sensor data,

    L. K ¨astner, V . C. Frasineanu, and J. Lambrecht, “A 3d-deep-learning- based augmented reality calibration method for robotic environments using depth sensor data,” in2020 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2020, pp. 1135–1141

  26. [26]

    Maeri: Enabling flexible dataflow mapping over dnn accelerators via reconfigurable intercon- nects,

    H. Kwon, A. Samajdar, and T. Krishna, “Maeri: Enabling flexible dataflow mapping over dnn accelerators via reconfigurable intercon- nects,”ACM Sigplan Notices, vol. 53, no. 2, pp. 461–475, 2018

  27. [27]

    Pointpillars: Fast encoders for object detection from point clouds,

    A. H. Lang, S. V ora, H. Caesar, L. Zhou, J. Yang, and O. Beijbom, “Pointpillars: Fast encoders for object detection from point clouds,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2019, pp. 12 697–12 705

  28. [28]

    Pointgrid: A deep network for 3d shape under- standing,

    T. Le and Y . Duan, “Pointgrid: A deep network for 3d shape under- standing,” inProceedings of the IEEE conference on computer vision and pattern recognition, 2018, pp. 9204–9214

  29. [29]

    Simdiff: Point cloud acceleration by utilizing spatial similarity and differential execution,

    Y . Li, M. Li, C. Chen, X. Zou, H. Shao, F. Tang, and K. Li, “Simdiff: Point cloud acceleration by utilizing spatial similarity and differential execution,”IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 2, pp. 568–581, 2024

  30. [30]

    Metasapiens: Real-time neural rendering with efficiency-aware pruning and accelerated foveated rendering,

    W. Lin, Y . Feng, and Y . Zhu, “Metasapiens: Real-time neural rendering with efficiency-aware pruning and accelerated foveated rendering,” in Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, 2025, pp. 669–682

  31. [31]

    Pointacc: Efficient point cloud accelerator,

    Y . Lin, Z. Zhang, H. Tang, H. Wang, and S. Han, “Pointacc: Efficient point cloud accelerator,” inMICRO-54: 54th Annual IEEE/ACM Inter- national Symposium on Microarchitecture, 2021, pp. 449–461

  32. [32]

    Towards accurate and efficient 3d object detection for autonomous driving: A mixture of experts computing system on edge,

    L. Liu, B. Su, J. Jiang, G. Wu, C. Guo, C. Xu, and H. F. Yang, “Towards accurate and efficient 3d object detection for autonomous driving: A mixture of experts computing system on edge,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2025, pp. 25 903–25 913

  33. [33]

    Gpu octrees and optimized search,

    D. Madeira, A. Montenegro, E. Clua, and T. Lewiner, “Gpu octrees and optimized search,” inProceedings of VIII Brazilian Symposium on Games and Digital Entertainment, 2009, pp. 73–76

  34. [34]

    Diffy: A d ´ej`a vu-free differ- ential deep neural network accelerator,

    M. Mahmoud, K. Siu, and A. Moshovos, “Diffy: A d ´ej`a vu-free differ- ential deep neural network accelerator,” in2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2018, pp. 134–147

  35. [35]

    A computer oriented geodetic data base and a new technique in file sequencing,

    G. M. Morton, “A computer oriented geodetic data base and a new technique in file sequencing,” 1966

  36. [36]

    Exploiting locality in graph analytics through hardware-accelerated traversal scheduling,

    A. Mukkara, N. Beckmann, M. Abeydeera, X. Ma, and D. Sanchez, “Exploiting locality in graph analytics through hardware-accelerated traversal scheduling,” in2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE, 2018, pp. 1–14

  37. [37]

    Cache-guided scheduling: Exploiting caches to maximize locality in graph processing,

    A. Mukkara, N. Beckmann, and D. Sanchez, “Cache-guided scheduling: Exploiting caches to maximize locality in graph processing,”AGP’17, 2017

  38. [38]

    Quicknn: Memory and perfor- mance optimization of kd tree based nearest neighbor search for 3d point clouds,

    R. Pinkham, S. Zeng, and Z. Zhang, “Quicknn: Memory and perfor- mance optimization of kd tree based nearest neighbor search for 3d point clouds,” in2020 IEEE International symposium on high performance computer architecture (HPCA). IEEE, 2020, pp. 180–192

  39. [39]

    Pointnet++: Deep hierarchical feature learning on point sets in a metric space,

    C. R. Qi, L. Yi, H. Su, and L. J. Guibas, “Pointnet++: Deep hierarchical feature learning on point sets in a metric space,”Advances in neural information processing systems, vol. 30, 2017

  40. [40]

    Pointnext: Revisiting pointnet++ with improved training and scaling strategies,

    G. Qian, Y . Li, H. Peng, J. Mai, H. Hammoud, M. Elhoseiny, and B. Ghanem, “Pointnext: Revisiting pointnet++ with improved training and scaling strategies,”Advances in neural information processing systems, vol. 35, pp. 23 192–23 204, 2022

  41. [41]

    Octree-based indexing for 3d pointclouds within an oracle spatial dbms,

    B. Sch ¨on, A. S. M. Mosa, D. F. Laefer, and M. Bertolotto, “Octree-based indexing for 3d pointclouds within an oracle spatial dbms,”Computers & Geosciences, vol. 51, pp. 430–438, 2013

  42. [42]

    Deep learning 3d shape surfaces using geometry images,

    A. Sinha, J. Bai, and K. Ramani, “Deep learning 3d shape surfaces using geometry images,” inComputer Vision–ECCV 2016: 14th Euro- pean Conference, Amsterdam, The Netherlands, October 11-14, 2016, Proceedings, Part VI 14. Springer, 2016, pp. 223–240

  43. [43]

    Drq: dynamic region-based quantization for deep neural network ac- celeration,

    Z. Song, B. Fu, F. Wu, Z. Jiang, L. Jiang, N. Jing, and X. Liang, “Drq: dynamic region-based quantization for deep neural network ac- celeration,” in2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA). IEEE, 2020, pp. 1010–1021

  44. [44] J. D. Stets, Y. Sun, W. Corning, and S. W. Greenwald, “Visualization and labeling of point clouds in virtual reality,” in SIGGRAPH Asia 2017 Posters, 2017, pp. 1–2.

  45. [45] J. Sun, Q. Zhang, B. Kailkhura, Z. Yu, C. Xiao, and Z. M. Mao, “Modelnet40-c: A robustness benchmark for 3d point cloud recognition under corruption.”

  46. [46] D. Tripathy, A. Abdolrashidi, L. N. Bhuyan, L. Zhou, and D. Wong, “Paver: Locality graph-based thread block scheduling for gpus,” ACM Transactions on Architecture and Code Optimization (TACO), vol. 18, no. 3, pp. 1–26, 2021.

  47. [47] C. Wang, B. Samari, and K. Siddiqi, “Local spectral graph convolution for point set feature learning,” in Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 52–66.

  48. [48] Y. Wang, Y. Sun, Z. Liu, S. E. Sarma, M. M. Bronstein, and J. M. Solomon, “Dynamic graph cnn for learning on point clouds,” ACM Transactions on Graphics (TOG), vol. 38, no. 5, pp. 1–12, 2019.

  49. [49] Y. Wang, S. Zhang, B. Wan, W. He, and X. Bai, “Point cloud and visual feature-based tracking method for an augmented reality-aided mechanical assembly system,” The International Journal of Advanced Manufacturing Technology, vol. 99, no. 9, pp. 2341–2352, 2018.

  50. [50] T. Xu, B. Tian, and Y. Zhu, “Tigris: Architecture and algorithms for 3d perception in point clouds,” in Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, 2019, pp. 629–642.

  51. [51] J. Yan, H. Ito, Y. Nagahara, K. Kawamura, M. Motomura, T. Van Chu, and D. Fujiki, “Bingogcn: Towards scalable and efficient gnn acceleration with fine-grained partitioning and slt,” in Proceedings of the 52nd Annual International Symposium on Computer Architecture, 2025, pp. 1910–1924.

  52. [52] B. Yang, W. Luo, and R. Urtasun, “Pixor: Real-time 3d object detection from point clouds,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 7652–7660.

  53. [53] Z. Ying, S. Bhuyan, Y. Kang, Y. Zhang, M. T. Kandemir, and C. R. Das, “Edgepc: Efficient deep learning analytics for point clouds on edge devices,” in Proceedings of the 50th Annual International Symposium on Computer Architecture, 2023, pp. 1–14.

  54. [54] C. Zhou, Y. Fu, M. Liu, S. Qiu, G. Li, Y. He, and H. Jiao, “An energy-efficient 3d point cloud neural network accelerator with efficient filter pruning, mlp fusion, and dual-stream sampling,” in 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD). IEEE, 2023, pp. 1–9.

  55. [55] C. Zhou, T. Huang, Y. Ma, Y. Fu, X. Song, S. Qiu, J. Sun, M. Liu, G. Li, Y. He et al., “23.4 nebula: A 28nm 109.8 tops/w 3d pnn accelerator featuring adaptive partition, multi-skipping, and block-wise aggregation,” in 2025 IEEE International Solid-State Circuits Conference (ISSCC), vol. 68. IEEE, 2025, pp. 412–414.