pith.

arXiv: 2512.08089 · v2 · pith:AUTNV3EA · submitted 2025-12-08 · 💻 cs.AR

Efficient and Accurate Graph Classification with Hyperdimensional Computing on FPGA

Pith reviewed 2026-05-21 17:04 UTC · model grok-4.3

classification 💻 cs.AR
keywords: graph classification · hyperdimensional computing · FPGA accelerator · Nyström approximation · edge computing · sparse matrix-vector multiplication · determinantal point processes · hardware optimization

The pith

HyperX accelerates Nyström-based hyperdimensional graph classification on FPGA with major speed and energy gains plus higher accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that hyperdimensional computing for graph classification becomes practical on edge FPGAs once four specific bottlenecks are addressed. It does this in four ways:
- it pairs uniform sampling with determinantal point processes for landmark choice;
- it streams the projection matrix from off-chip memory;
- it uses a minimal perfect hash for fast codebook access;
- it statically balances sparse matrix-vector work.

A sympathetic reader would care because edge devices could then run graph tasks locally in real time with far less energy than CPUs or GPUs, while also gaining modestly in accuracy. If the approach holds, it would let battery-powered systems handle graph-structured data without constant cloud round-trips.
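Static load balancing for SpMV can be illustrated with a small sketch. The review does not give HyperX's partitioning rule, so this sketch swaps in a common offline heuristic: greedy longest-processing-time. Each CSR row, heaviest first, goes to the engine with the smallest nonzero load so far. The names `static_partition` and `spmv_partitioned` are hypothetical, and the engines are simulated sequentially.

```python
import heapq
from typing import List, Sequence


def static_partition(indptr: Sequence[int], num_engines: int) -> List[List[int]]:
    """Offline row-to-engine assignment that balances nonzeros (LPT heuristic).

    Assumption: stands in for HyperX's static balancer, whose exact rule
    is not described in the review.
    """
    nnz = [indptr[r + 1] - indptr[r] for r in range(len(indptr) - 1)]
    heap = [(0, e) for e in range(num_engines)]  # (current nonzero load, engine id)
    heapq.heapify(heap)
    assignment: List[List[int]] = [[] for _ in range(num_engines)]
    # Heaviest rows first; each goes to the least-loaded engine so far.
    for row in sorted(range(len(nnz)), key=lambda r: -nnz[r]):
        load, engine = heapq.heappop(heap)
        assignment[engine].append(row)
        heapq.heappush(heap, (load + nnz[row], engine))
    return assignment


def spmv_partitioned(indptr, indices, data, x, assignment) -> List[float]:
    """y = A @ x for a CSR matrix, with row groups processed per (simulated) engine."""
    y = [0.0] * (len(indptr) - 1)
    for rows in assignment:  # in hardware, each group would run on its own engine
        for r in rows:
            acc = 0.0
            for k in range(indptr[r], indptr[r + 1]):
                acc += data[k] * x[indices[k]]
            y[r] = acc
    return y


# Skewed 4x4 example: row 0 holds half of all nonzeros.
indptr = [0, 4, 5, 6, 8]
indices = [0, 1, 2, 3, 0, 1, 2, 3]
data = [1.0, 1.0, 1.0, 1.0, 2.0, 3.0, 1.0, 1.0]
x = [1.0, 2.0, 3.0, 4.0]
plan = static_partition(indptr, num_engines=2)
y = spmv_partitioned(indptr, indices, data, x, plan)
```

Here the heavy row 0 gets an engine to itself, and rows 1 to 3 share the other engine. Both engines therefore process four nonzeros. The plan depends only on the sparsity pattern, so it can be computed once before inference rather than adjusted at run time.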

Core claim

HyperX is the first end-to-end FPGA accelerator for Nyström-enhanced hyperdimensional graph classification at the edge. It combines four components:
- a hybrid landmark selection strategy that mixes uniform sampling with determinantal point processes;
- a streaming architecture that maximizes external memory bandwidth utilization for the projection matrix;
- a minimal-perfect-hash engine that maps keys to indices in constant time;
- sparsity-aware SpMV engines that use static load balancing.

On an AMD Zynq UltraScale+ ZCU104 device, the design reports the following results:
- versus optimized CPU code, a 6.85× speedup and 169× better energy efficiency;
- versus optimized GPU code, a 4.32× speedup and 314× better energy efficiency;
- a 3.4 % average accuracy improvement across TUDataset benchmarks.
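The reported speedups and energy-efficiency gains can be cross-checked with a back-of-the-envelope identity. It rests on three assumptions that the review does not confirm:
- "energy efficiency" is the ratio of energy per inference (baseline over FPGA);
- energy equals average power times latency;
- S denotes the speedup.

Under that reading, the two numbers imply how much lower the FPGA's average power draw is.

```latex
\[
\eta \;=\; \frac{E_{\text{base}}}{E_{\text{FPGA}}}
     \;=\; \frac{P_{\text{base}}\,t_{\text{base}}}{P_{\text{FPGA}}\,t_{\text{FPGA}}}
     \;=\; S\cdot\frac{P_{\text{base}}}{P_{\text{FPGA}}},
\qquad S = \frac{t_{\text{base}}}{t_{\text{FPGA}}}
\quad\Longrightarrow\quad
\frac{P_{\text{base}}}{P_{\text{FPGA}}} = \frac{\eta}{S}
\]
\[
\text{CPU: } \frac{169}{6.85} \approx 24.7, \qquad
\text{GPU: } \frac{314}{4.32} \approx 72.7
\]
```

The headline figures may be averages across datasets. In that case, a ratio of averages need not equal the average of per-dataset ratios, so these implied power ratios are indicative only.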

What carries the argument

HyperX's four co-designed optimizations: hybrid landmark selection via uniform sampling plus determinantal point processes, streaming Nyström projection, minimal-perfect-hash lookup, and static-load-balanced sparse matrix-vector engines.
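Constant-time key-to-index mapping with a minimal perfect hash can be sketched as follows. The review does not describe HyperX's hash construction, so this sketch swaps in a simplified BBHash-style scheme (Limasset et al., 2017). It works in three steps:
1. At each level, keys that land alone in a bit array are placed.
2. Colliding keys move on to the next level.
3. A key's index is the rank of its bit.

The class name, the splitmix64-style mixer and the `gamma` load factor are illustrative assumptions.

```python
from typing import Iterable, List

MASK64 = (1 << 64) - 1


def mix64(key: int, level: int) -> int:
    """splitmix64-style finalizer seeded by level (illustrative, not the paper's hash)."""
    z = (key + (level + 1) * 0x9E3779B97F4A7C15) & MASK64
    z = ((z ^ (z >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    z = ((z ^ (z >> 27)) * 0x94D049BB133111EB) & MASK64
    return z ^ (z >> 31)


class LevelledMPH:
    """Simplified BBHash-style minimal perfect hash over distinct non-negative ints."""

    def __init__(self, keys: Iterable[int], gamma: float = 2.0, max_levels: int = 64):
        remaining = list(keys)
        self.bits: List[List[bool]] = []
        self.prefix: List[List[int]] = []  # prefix[l][p] = set bits before position p
        self.base: List[int] = []          # keys placed in all earlier levels
        placed = 0
        for level in range(max_levels):
            if not remaining:
                break
            size = max(1, int(gamma * len(remaining)))
            counts = [0] * size
            for k in remaining:
                counts[mix64(k, level) % size] += 1
            bits = [c == 1 for c in counts]  # a key is placed only if it landed alone
            prefix = [0]
            for b in bits:
                prefix.append(prefix[-1] + b)
            self.bits.append(bits)
            self.prefix.append(prefix)
            self.base.append(placed)
            placed += prefix[-1]
            remaining = [k for k in remaining if counts[mix64(k, level) % size] != 1]
        if remaining:
            raise RuntimeError("construction did not converge; are keys distinct?")
        self.size = placed

    def index(self, key: int) -> int:
        """Index in [0, size) for stored keys; expected O(1) levels probed."""
        for level, bits in enumerate(self.bits):
            pos = mix64(key, level) % len(bits)
            if bits[pos]:
                return self.base[level] + self.prefix[level][pos]
        raise KeyError(key)


codebook_keys = [17, 42, 1001, 65536, 7, 123456789]
mph = LevelledMPH(codebook_keys)
slots = {k: mph.index(k) for k in codebook_keys}  # distinct slots 0..5 in some order
```

As with any minimal perfect hash, a key outside the original set can still map to a valid-looking slot. A hardware codebook that must reject unknown keys would therefore also store and compare the key itself.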

If this is right

  • Graph classification inference becomes feasible in real time on power-constrained edge FPGAs.
  • Energy use for the same task falls by two orders of magnitude relative to CPU or GPU baselines.
  • Classification accuracy rises by 3.4 % on average on standard graph benchmarks.
  • Load imbalance from irregular graph sparsity is reduced by the static balancing scheme.
  • Memory pressure from large Nyström matrices is eased by the streaming projection path.
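The streaming projection path can be pictured as computing h = P z while only one tile of P sits in on-chip buffers. In this sketch, a Python generator stands in for burst reads from off-chip DRAM. The function names and the `tile_rows` parameter are hypothetical. The sketch also omits double buffering, which a real design would likely use to overlap transfers with compute.

```python
from typing import Iterator, List, Sequence, Tuple

Tile = List[List[float]]


def dram_tiles(matrix: Sequence[Sequence[float]], tile_rows: int) -> Iterator[Tile]:
    """Stand-in for off-chip reads: yield `tile_rows` consecutive rows at a time."""
    for start in range(0, len(matrix), tile_rows):
        yield [list(row) for row in matrix[start:start + tile_rows]]


def streaming_project(tiles: Iterator[Tile], z: Sequence[float]) -> Tuple[List[float], int]:
    """Return (P @ z, peak rows held at once), consuming P one tile at a time."""
    out: List[float] = []
    peak_rows = 0
    for tile in tiles:  # one burst arrives; the previous tile is no longer referenced
        peak_rows = max(peak_rows, len(tile))
        for row in tile:  # rows are independent, so hardware lanes could run in parallel
            out.append(sum(p * v for p, v in zip(row, z)))
    return out, peak_rows


P = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0], [0.0, 3.0]]
h, peak = streaming_project(dram_tiles(P, tile_rows=2), z=[2.0, 5.0])
```

The buffered footprint is bounded by `tile_rows` times the row length, regardless of how many rows P has. That bound is the point of streaming when the full projection matrix does not fit in on-chip memory.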

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same streaming-plus-hash pattern could be reused for other kernel approximations on FPGA.
  • Hybrid sampling with determinantal point processes may improve accuracy in non-graph HDC tasks.
  • The design points toward future ASIC versions that could push energy efficiency still higher.
  • Extending the static load balancer to dynamic graphs would be a direct next hardware step.

Load-bearing premise

The hybrid landmark selection that mixes uniform sampling with determinantal point processes reduces redundancy in the chosen samples, raises accuracy, and does so without creating large extra memory or compute cost on the FPGA fabric.
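One plausible reading of the hybrid selector is sketched below: draw a uniform candidate pool, then pick a diverse subset of it with a DPP. The review does not give the exact recipe, so several pieces are assumptions:
- the RBF kernel;
- the `pool_factor` parameter;
- most importantly, greedy MAP inference (the incremental-Cholesky method of Chen et al., 2018), used here in place of exact DPP sampling.

The simulated rebuttal states that selection runs offline, so this step would not occupy FPGA fabric.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def rbf(a: Point, b: Point, gamma: float = 1.0) -> float:
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_dpp_map(points: Sequence[Point], k: int, gamma: float = 1.0) -> List[int]:
    """Greedy log-det maximisation for an RBF L-ensemble via incremental Cholesky."""
    n = len(points)
    gain = [rbf(p, p, gamma) for p in points]  # residual variance d_i^2 of each item
    chol: List[List[float]] = [[] for _ in range(n)]
    selected: List[int] = []
    taken = set()
    for _ in range(min(k, n)):
        j = max((i for i in range(n) if i not in taken), key=lambda i: gain[i])
        if gain[j] <= 1e-12:  # every remaining item is (numerically) redundant
            break
        selected.append(j)
        taken.add(j)
        dj = math.sqrt(gain[j])
        for i in range(n):
            if i in taken:
                continue
            dot = sum(a * b for a, b in zip(chol[j], chol[i]))
            e = (rbf(points[j], points[i], gamma) - dot) / dj
            chol[i].append(e)
            gain[i] -= e * e  # shrink i's gain by its overlap with the new pick
    return selected


def hybrid_landmarks(data: Sequence[Point], m: int, pool_factor: int = 4,
                     gamma: float = 1.0, seed: int = 0) -> List[int]:
    """Uniform pool of up to pool_factor*m points, then a DPP-style diverse pick of m."""
    rng = random.Random(seed)
    pool_idx = rng.sample(range(len(data)), min(len(data), pool_factor * m))
    chosen = greedy_dpp_map([data[i] for i in pool_idx], m, gamma)
    return [pool_idx[i] for i in chosen]


# Three tight clusters, each with five exact duplicates.
data = [(0.0, 0.0)] * 5 + [(5.0, 5.0)] * 5 + [(10.0, 0.0)] * 5
picked = hybrid_landmarks(data, m=3, pool_factor=5)
```

Once one copy of a point is chosen, its exact duplicates drop to zero residual gain. This is the redundancy reduction the paper attributes to the DPP stage. With plain uniform sampling, all three landmarks could come from a single cluster.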

What would settle it

A side-by-side run on the same ZCU104 FPGA and TUDataset graphs would settle it. The run would replace the hybrid landmark selector with plain uniform sampling and measure two things: how much of the reported 3.4 % accuracy gain disappears, and whether FPGA latency or power changes noticeably.

Figures

Figures reproduced from arXiv: 2512.08089 by Dhruv Parikh, Jebacyril Arockiaraj, Viktor Prasanna.

Figure 1: Overview of the NysX FPGA accelerator. NysX comprises …
Figure 2: Locality Sensitive Hashing Unit (LSHU). It integrates a DenseMV unit (for …
Figure 4: Streaming Nyström Encoding Engine architecture.
Figure 5: NysX compute flow across hops: input graph …
Figure 6: Speedup over CPU baseline (no DPP) across datasets. See …
Figure 7: Classification accuracy (%) on TU datasets.
Figure 8: Effect of static load balancing in SpMV stages (LSHU/KSE): …
Original abstract

Real-time, energy-efficient inference on edge devices is essential for graph classification across a range of applications. Hyperdimensional Computing (HDC) is a brain-inspired computing paradigm that encodes input features into low-precision, high-dimensional vectors with simple element-wise operations, making it well-suited for resource-constrained edge platforms. Recent work enhances HDC accuracy for graph classification via Nyström kernel approximations. Edge acceleration of such methods faces several challenges: (i) redundancy among (landmark) samples selected via uniform sampling, (ii) storing the Nyström projection matrix under limited on-chip memory, (iii) expensive, contention-prone codebook lookups, and (iv) load imbalance due to irregular sparsity in SpMV. To address these challenges, we propose HyperX, the first end-to-end FPGA accelerator for Nyström-based HDC graph classification at the edge. HyperX integrates four key optimizations: (i) a hybrid landmark selection strategy combining uniform sampling with determinantal point processes (DPPs) to reduce redundancy while improving accuracy; (ii) a streaming architecture for Nyström projection matrix maximizing external memory bandwidth utilization; (iii) a minimal-perfect-hash lookup engine enabling O(1) key-to-index mapping; and (iv) sparsity-aware SpMV engines with static load balancing. Implemented on an AMD Zynq UltraScale+ (ZCU104) FPGA, HyperX achieves 6.85× (4.32×) speedup and 169× (314×) energy efficiency gains over optimized CPU (GPU) baselines, while improving classification accuracy by 3.4% on average across TUDataset benchmarks, a widely used standard for graph classification.
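The abstract describes HDC as encoding features into high-dimensional vectors with simple element-wise operations. The sketch below makes that concrete on bipolar hypervectors. It shows three operations:
- binding, as element-wise multiplication;
- bundling, as element-wise majority;
- classification, by similarity to class prototypes.

It is a generic record-based encoder for categorical feature/value pairs, not HyperX's Nyström-based graph encoder. `D`, the tie rule and the class name `ToyHDC` are illustrative choices.

```python
import random
from typing import Dict, List, Sequence

D = 10_000  # hypervector dimensionality (illustrative)


def random_hv(rng: random.Random) -> List[int]:
    """i.i.d. bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(D)]


def bind(a: Sequence[int], b: Sequence[int]) -> List[int]:
    """Binding: element-wise product (associates a feature with a value)."""
    return [x * y for x, y in zip(a, b)]


def bundle(hvs: Sequence[Sequence[int]]) -> List[int]:
    """Bundling: element-wise majority; ties (possible with an even count) go to +1."""
    return [1 if sum(col) >= 0 else -1 for col in zip(*hvs)]


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Normalised dot product in [-1, 1]."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


class ToyHDC:
    """Record-based HDC classifier over categorical features (not HyperX's encoder)."""

    def __init__(self, features: Sequence[str], values: Sequence[str], seed: int = 0):
        rng = random.Random(seed)
        self.feature_hv = {f: random_hv(rng) for f in features}
        self.value_hv = {v: random_hv(rng) for v in values}
        self.prototypes: Dict[str, List[int]] = {}

    def encode(self, sample: Dict[str, str]) -> List[int]:
        return bundle([bind(self.feature_hv[f], self.value_hv[v]) for f, v in sample.items()])

    def fit(self, samples: Sequence[Dict[str, str]], labels: Sequence[str]) -> None:
        grouped: Dict[str, List[List[int]]] = {}
        for sample, label in zip(samples, labels):
            grouped.setdefault(label, []).append(self.encode(sample))
        # One prototype per class: the bundle of its training encodings.
        self.prototypes = {label: bundle(hvs) for label, hvs in grouped.items()}

    def predict(self, sample: Dict[str, str]) -> str:
        query = self.encode(sample)
        return max(self.prototypes, key=lambda c: similarity(query, self.prototypes[c]))


feats = ["a", "b", "c", "d"]
model = ToyHDC(feats, ["lo", "hi"], seed=1)
model.fit([{f: "lo" for f in feats}, {f: "hi" for f in feats}], ["class0", "class1"])
query = {"a": "lo", "b": "lo", "c": "hi", "d": "lo"}  # shares 3 of 4 feature values with class0
pred = model.predict(query)
```

The whole pipeline uses only additions, multiplications and sign operations on small integers, which is why HDC suits low-power hardware. Robustness comes from the large dimension D rather than from precise arithmetic.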

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents HyperX, the first end-to-end FPGA accelerator for Nyström-based Hyperdimensional Computing (HDC) for graph classification on edge devices. It addresses challenges in landmark redundancy, memory for the projection matrix, codebook lookups, and SpMV load imbalance via four optimizations: hybrid uniform+DPP landmark selection, streaming Nyström projection, minimal-perfect-hash lookup, and sparsity-aware SpMV with static balancing. Implemented on AMD Zynq UltraScale+ ZCU104, it reports 6.85× (4.32×) speedup and 169× (314×) energy efficiency over optimized CPU (GPU) baselines, plus 3.4% average accuracy gain across TUDataset benchmarks.

Significance. If the performance and accuracy claims hold under the reported conditions, the work demonstrates a concrete, measured path to energy-efficient edge inference for graph classification using HDC with Nyström approximations. The end-to-end FPGA implementation with external baselines and standard TUDataset benchmarks provides a useful data point for hardware acceleration of approximate kernel methods in resource-constrained settings.

major comments (2)
  1. [Section 3.2 (Hybrid Landmark Selection)] The hybrid landmark selection (uniform sampling + DPP) is presented as reducing redundancy while improving accuracy without prohibitive on-FPGA overhead, and the 3.4% average accuracy gain is attributed in part to this choice. No ablation isolating DPP versus uniform-only is reported for either accuracy on TUDataset or for on-chip resource usage (LUT/BRAM) and latency; if DPP computation is entirely off-chip, its contribution to inference efficiency is limited to training. This is load-bearing for the accuracy and overall efficiency claims.
  2. [Section 5 (Evaluation)] The experimental results section reports concrete speed-up, energy, and accuracy deltas but omits full baseline implementation details (exact CPU/GPU libraries, optimization flags, and code versions), dataset splits or cross-validation protocol for TUDataset, and error bars or run-to-run variance for the reported metrics. These details are needed to assess reproducibility of the 6.85×/169× and 4.32×/314× gains.
minor comments (2)
  1. [Abstract] The abstract states TUDataset is 'a widely used standard'; adding a short citation or reference to the dataset paper would improve clarity for readers unfamiliar with the benchmark.
  2. [Section 4 (Architecture)] Notation for the Nyström projection matrix and the minimal-perfect-hash function could be introduced earlier with a small table of symbols to aid readers following the streaming architecture description.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments on our manuscript. We address each of the major comments in detail below, offering clarifications and outlining the revisions we will make to enhance the paper.

Point-by-point responses
  1. Referee: [Section 3.2 (Hybrid Landmark Selection)] The hybrid landmark selection (uniform sampling + DPP) is presented as reducing redundancy while improving accuracy without prohibitive on-FPGA overhead, and the 3.4% average accuracy gain is attributed in part to this choice. No ablation isolating DPP versus uniform-only is reported for either accuracy on TUDataset or for on-chip resource usage (LUT/BRAM) and latency; if DPP computation is entirely off-chip, its contribution to inference efficiency is limited to training. This is load-bearing for the accuracy and overall efficiency claims.

    Authors: We appreciate the referee pointing out the need for an ablation study. The hybrid landmark selection strategy, which combines uniform sampling with DPP, is applied during the training phase to select a more diverse set of landmarks. This selection process occurs off-chip prior to inference, and the resulting landmarks are used to construct the Nyström projection matrix for on-FPGA execution. Consequently, the DPP computation does not contribute to on-chip resource consumption or inference latency. The accuracy improvement stems from the higher quality of the selected landmarks, which enhances the kernel approximation. To address this comment, we will add an ablation study in the revised version of the manuscript. Specifically, we will report the classification accuracy on TUDataset benchmarks using uniform sampling alone compared to the hybrid approach, thereby isolating the contribution of DPP to the observed 3.4% average accuracy gain. We will also explicitly state in Section 3.2 that landmark selection is an offline process. revision: yes

  2. Referee: [Section 5 (Evaluation)] The experimental results section reports concrete speed-up, energy, and accuracy deltas but omits full baseline implementation details (exact CPU/GPU libraries, optimization flags, and code versions), dataset splits or cross-validation protocol for TUDataset, and error bars or run-to-run variance for the reported metrics. These details are needed to assess reproducibility of the 6.85×/169× and 4.32×/314× gains.

    Authors: We agree that providing comprehensive implementation details is essential for reproducibility. In the revised manuscript, we will augment Section 5 with the following information: detailed descriptions of the CPU and GPU baseline implementations, including the specific libraries (e.g., optimized BLAS implementations for CPU and cuBLAS for GPU), compiler optimization flags, and software versions used; the exact dataset splitting and cross-validation protocol employed for the TUDataset benchmarks, which adheres to standard practices; and error bars or standard deviations for the reported performance and accuracy metrics, obtained by repeating experiments over multiple random seeds. These additions will allow readers to better evaluate the reported speedups and efficiency gains. revision: yes
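The first response above says landmarks are chosen offline and then used to build the Nyström projection matrix that the FPGA consumes. A minimal sketch of that offline step follows, assuming an RBF kernel and a Cholesky-based projection:
1. Factor K_mm + εI = L Lᵀ.
2. Store P = L⁻¹.
3. Map a sample x to φ(x) = P k_m(x).

Then φ(x)·φ(y) = k_m(x)ᵀ (K_mm + εI)⁻¹ k_m(y), which is the Nyström approximation of k(x, y). A symmetric (K_mm + εI)^(-1/2) projection yields the same inner products. The review does not say which form, if either, HyperX uses. All function names and the jitter ε are assumptions.

```python
import math
from typing import Callable, List, Sequence

Vec = Sequence[float]
Matrix = List[List[float]]


def rbf(a: Vec, b: Vec, gamma: float = 1.0) -> float:
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def cholesky(a: Matrix) -> Matrix:
    """Lower-triangular L with L @ L.T == a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def lower_inverse(L: Matrix) -> Matrix:
    """Invert a lower-triangular matrix column by column via forward substitution."""
    n = len(L)
    inv = [[0.0] * n for _ in range(n)]
    for col in range(n):
        for i in range(n):
            rhs = (1.0 if i == col else 0.0) - sum(L[i][k] * inv[k][col] for k in range(i))
            inv[i][col] = rhs / L[i][i]
    return inv


def nystrom_projection(landmarks: Sequence[Vec], kernel: Callable[[Vec, Vec], float],
                       jitter: float = 1e-8) -> Matrix:
    """Offline step: P = L^{-1}, where K_mm + jitter*I = L L^T."""
    m = len(landmarks)
    K = [[kernel(landmarks[i], landmarks[j]) + (jitter if i == j else 0.0)
          for j in range(m)] for i in range(m)]
    return lower_inverse(cholesky(K))


def nystrom_features(x: Vec, landmarks: Sequence[Vec],
                     kernel: Callable[[Vec, Vec], float], P: Matrix) -> List[float]:
    """Online step: phi(x) = P @ k_m(x); an accelerator would consume P row by row."""
    kx = [kernel(x, l) for l in landmarks]
    return [sum(p * k for p, k in zip(row, kx)) for row in P]


landmarks = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
P = nystrom_projection(landmarks, rbf)
phi = [nystrom_features(l, landmarks, rbf, P) for l in landmarks]
```

On the landmark points themselves, the approximation reproduces the kernel up to the jitter, which makes a quick sanity check. For other points, φ(x)·φ(y) only approximates k(x, y). Its quality depends on how well the landmarks cover the data, which is where landmark selection matters.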

Circularity Check

0 steps flagged

No significant circularity: claims rest on direct hardware measurements and external benchmarks

Full rationale

The paper's core results—6.85×/4.32× speedups, 169×/314× energy gains, and 3.4% accuracy lift—are presented as direct empirical measurements from FPGA implementation on ZCU104 against separate CPU/GPU baselines and TUDataset graphs. The hybrid landmark selection (uniform + DPP) is described as an engineering optimization to address redundancy, not as a mathematical derivation or fitted parameter that is then renamed as a prediction. No equations, self-definitional loops, or load-bearing self-citations appear in the provided text that would reduce the reported gains to the inputs by construction. The work is self-contained against external baselines and standard datasets.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The design rests on standard FPGA memory-bandwidth and sparsity assumptions plus prior HDC and Nyström literature; it introduces no new physical constants, ad-hoc fitted scalars, or invented entities.

pith-pipeline@v0.9.0 · 5852 in / 1297 out tokens · 105338 ms · 2026-05-21T17:04:27.636091+00:00 · methodology

