ImageHD: Energy-Efficient On-Device Continual Learning of Visual Representations via Hyperdimensional Computing
Pith reviewed 2026-05-09 22:31 UTC · model grok-4.3
The pith
ImageHD uses hyperdimensional computing on an FPGA to deliver up to 40.4x speedup and 383x energy savings for on-device continual learning of visual streams under tight memory limits.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ImageHD implements a streaming dataflow architecture on the AMD Zynq ZCU104 that integrates HDC encoding, similarity search, and bounded cluster management using word-packed binary hypervectors for massively parallel bitwise computation. Combined with a compact quantized CNN for feature extraction and a unified exemplar memory plus hardware-efficient merging strategy, the system supports non-iterative online updates while staying inside strict on-chip memory and latency budgets. On the CORe50 dataset this yields up to 40.4x speedup and 383x energy efficiency over optimized CPU baselines, and 4.84x speedup with 105.1x better energy efficiency over GPU baselines.
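The claim rests on one low-level primitive: packing binary hypervectors into machine words so that similarity search reduces to XOR and popcount. The sketch below illustrates that primitive under stated assumptions; the dimension D = 8192, the 64-bit word width, and all names are illustrative, not values taken from the paper.

```cpp
#include <array>
#include <bit>
#include <cstddef>
#include <cstdint>

constexpr std::size_t D     = 8192;    // hypervector dimension (assumed)
constexpr std::size_t WORDS = D / 64;  // 64 bits packed per machine word

using HV = std::array<std::uint64_t, WORDS>;

// Hamming distance: XOR exposes differing bits, popcount tallies them.
// On an FPGA each word lane's XOR+popcount is a small block of LUTs, and
// all WORDS lanes can run concurrently -- the "massively parallel bitwise
// computation" the dataflow pipeline exploits.
std::size_t hamming(const HV& a, const HV& b) {
    std::size_t dist = 0;
    for (std::size_t w = 0; w < WORDS; ++w)
        dist += static_cast<std::size_t>(std::popcount(a[w] ^ b[w]));
    return dist;
}

// Similarity search: the stored hypervector with the smallest Hamming
// distance to the query wins.
std::size_t nearest(const HV& query, const HV* stored, std::size_t n) {
    std::size_t best = 0, best_dist = D + 1;  // D is the max possible distance
    for (std::size_t c = 0; c < n; ++c) {
        const std::size_t d = hamming(query, stored[c]);
        if (d < best_dist) { best_dist = d; best = c; }
    }
    return best;
}
```

Because classification is only this nearest-neighbor scan over a handful of stored hypervectors, no gradient step or iterative optimization appears anywhere on the inference or update path.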
What carries the argument
The hardware-aware cluster merging strategy together with a fixed unified exemplar memory bound, executed via word-packed binary hypervectors that enable parallel bitwise operations inside the FPGA dataflow pipeline.
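The abstract does not spell the merging strategy out, so the following is a minimal sketch of one way a bounded, unified exemplar memory with merge-on-overflow could behave, reusing the HV type, WORDS, D, and hamming() from the sketch above. The capacity, the nearest-pair merge policy, the same-label preference, and the random tie-break are all assumptions, not the paper's method.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <random>
#include <vector>

struct Slot {
    HV hv;              // word-packed cluster hypervector
    std::size_t label;  // class this cluster represents
};

struct ExemplarMemory {
    std::size_t capacity;     // fixed on-chip bound (assumed; must be >= 2)
    std::vector<Slot> slots;  // unified store shared by all classes
    std::mt19937_64 rng{42};  // tie-break source (seed is illustrative)

    // Majority-of-two merge: agreed bits survive, disagreeing bits are
    // settled by a random mask, so the result stays binary and the whole
    // operation is O(WORDS) bitwise ops -- cheap to lay out in LUTs.
    HV merge(const HV& a, const HV& b) {
        HV out;
        for (std::size_t w = 0; w < WORDS; ++w) {
            const std::uint64_t tie = rng();
            out[w] = (a[w] & b[w]) | ((a[w] ^ b[w]) & tie);
        }
        return out;
    }

    void insert(const HV& hv, std::size_t label) {
        if (slots.size() < capacity) {
            slots.push_back({hv, label});
            return;
        }
        // Memory full: merge the most similar pair of clusters, preferring
        // a same-label pair via a distance penalty on cross-label candidates.
        std::size_t bi = 0, bj = 1;
        std::size_t best = std::numeric_limits<std::size_t>::max();
        for (std::size_t i = 0; i < slots.size(); ++i)
            for (std::size_t j = i + 1; j < slots.size(); ++j) {
                std::size_t d = hamming(slots[i].hv, slots[j].hv);
                if (slots[i].label != slots[j].label) d += D;
                if (d < best) { best = d; bi = i; bj = j; }
            }
        slots[bi].hv = merge(slots[bi].hv, slots[bj].hv);
        slots.erase(slots.begin() + bj);
        slots.push_back({hv, label});
    }
};
```

With capacity fixed at synthesis time, the store never grows, which is what would let the design stay inside a single on-chip memory budget regardless of stream length.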
If this is right
- Continual learning becomes feasible for real-time visual streams on hardware that cannot store large exemplar sets or run gradient steps.
- Energy consumption drops enough to enable always-on adaptation in battery-powered edge cameras and sensors.
- The non-iterative HDC update path removes the latency spikes typical of backpropagation-based continual learners.
- A single on-chip memory budget suffices for both feature extraction and class representation, simplifying hardware design.
- Binary hypervector operations map directly to efficient FPGA bitwise logic, keeping resource use low.
Where Pith is reading between the lines
- The same bounded-memory HDC pattern could be tested on other sensor streams such as audio or IMU data if appropriate encoding functions are supplied.
- Larger on-chip memory in future FPGAs would reduce the frequency of merges and potentially raise accuracy without changing the algorithm.
- Replacing the current quantized CNN with an even lighter extractor might trade a small accuracy drop for further energy gains on the most constrained devices.
Load-bearing premise
The cluster merging and exemplar bounding will keep enough representative samples to maintain usable classification accuracy as new visual data arrives, without forcing extra off-chip accesses or post-hoc fixes.
What would settle it
Measure top-1 accuracy on CORe50 after a long sequence of new classes or distribution shifts while enforcing the stated on-chip memory ceiling; if accuracy falls sharply below the reported levels or the design requires off-chip traffic, the central claim does not hold.
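A minimal harness for that test could be a prequential (test-then-train) loop over the stream under the fixed bound, building on the two sketches above. The stream itself (CORe50 sessions run through the quantized CNN and HDC encoder) is a placeholder and is not reproduced here.

```cpp
#include <cstddef>
#include <vector>

struct Sample { HV hv; std::size_t label; };  // pre-encoded stream item

double prequential_accuracy(ExemplarMemory& mem,
                            const std::vector<Sample>& stream) {
    std::size_t correct = 0;
    for (const Sample& s : stream) {
        // Predict with the current bounded memory first...
        if (!mem.slots.empty()) {
            std::size_t pred = 0, best = D + 1;
            for (std::size_t c = 0; c < mem.slots.size(); ++c) {
                const std::size_t d = hamming(s.hv, mem.slots[c].hv);
                if (d < best) { best = d; pred = c; }
            }
            if (mem.slots[pred].label == s.label) ++correct;
        }
        // ...then update; any off-chip spill at this step would
        // falsify the central claim.
        mem.insert(s.hv, s.label);
    }
    return static_cast<double>(correct) / stream.size();
}
```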
Original abstract
On-device continual learning (CL) is critical for edge AI systems operating on non-stationary data streams, but most existing methods rely on backpropagation or exemplar-heavy classifiers, incurring substantial compute, memory, and latency overheads. Hyperdimensional computing (HDC) offers a lightweight alternative through fast, non-iterative online updates. Combined with a compact convolutional neural network (CNN) feature extractor, HDC enables efficient on-device adaptation with strong visual representations. However, prior HDC-based CL systems often depend on multi-tier memory hierarchies and complex cluster management, limiting deployability on resource-constrained hardware. We present ImageHD, an FPGA accelerator for on-device continual learning of visual data based on HDC. ImageHD targets streaming CL under strict latency and on-chip memory constraints, avoiding costly iterative optimization. At the algorithmic level, we introduce a hardware-aware CL method that bounds class exemplars through a unified exemplar memory and a hardware-efficient cluster merging strategy, while incorporating a quantized CNN front-end to reduce deployment overhead without sacrificing accuracy. At the system level, ImageHD is implemented as a streaming dataflow architecture on the AMD Zynq ZCU104 FPGA, integrating HDC encoding, similarity search, and bounded cluster management using word-packed binary hypervectors for massively parallel bitwise computation within tight on-chip resource budgets. On CORe50, ImageHD achieves up to 40.4x (4.84x) speedup and 383x (105.1x) energy efficiency over optimized CPU (GPU) baselines, demonstrating the practicality of HDC-enabled continual learning for real-time edge AI.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces ImageHD, an FPGA accelerator for on-device continual learning of visual representations. It combines a quantized CNN front-end with hyperdimensional computing (HDC) for fast, non-iterative updates, proposing a hardware-aware continual learning method that uses a unified exemplar memory and hardware-efficient cluster merging to bound on-chip memory usage under streaming non-stationary data. The system is implemented as a streaming dataflow architecture on the AMD Zynq ZCU104 FPGA using word-packed binary hypervectors. On the CORe50 dataset, the work claims up to 40.4x speedup and 383x energy efficiency over optimized CPU baselines (and 4.84x / 105.1x over GPU), while satisfying strict latency and on-chip memory constraints for real-time edge AI.
Significance. If the accuracy and forgetting metrics under the bounded-exemplar strategy are shown to be competitive with unbounded HDC or standard CL baselines, the result would be significant for resource-constrained edge devices. The working FPGA implementation, the use of massively parallel bitwise operations, and the quantified efficiency gains over both CPU and GPU baselines are concrete engineering contributions that could be directly useful for deploying continual learning on FPGAs.
Major comments (2)
- [Abstract] Abstract and experimental claims: the central performance numbers (40.4x speedup, 383x energy) are presented without any accompanying classification accuracy, average forgetting rate, or ablation results comparing the unified exemplar memory + cluster merging strategy against unbounded HDC or standard replay-based CL methods. Because the hardware-aware bounding strategy is load-bearing for the on-device practicality claim, the absence of these metrics leaves the strongest claim unsupported.
- [Experimental Results] Experimental section (inferred from abstract): no error bars, standard deviations, or multiple-run statistics are referenced for the reported speedups and energy figures; likewise, no description of the exact CPU/GPU baseline implementations (e.g., whether they use the same quantized CNN or full-precision models) is provided, preventing assessment of whether the efficiency gains are robust or baseline-dependent.
Minor comments (2)
- [Abstract] The abstract states concrete speedup and energy numbers yet provides no forward reference to the table or figure containing the corresponding accuracy results; adding such a pointer would improve readability.
- Notation for hypervector operations and cluster merging could be clarified with a small pseudocode block or an explicit definition of the merging threshold, as the current description leaves the exact hardware-efficient strategy somewhat underspecified; one illustrative formulation is sketched below.
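To make that request concrete, here is one plausible way to pin down a merging threshold, reusing hamming() and D from the earlier sketches. Neither the normalized-similarity formula nor tau = 0.9 comes from the paper; both are illustrative guesses.

```cpp
// Normalized Hamming similarity: 1.0 for identical hypervectors,
// about 0.5 for random, unrelated ones.
inline double similarity(const HV& a, const HV& b) {
    return 1.0 - static_cast<double>(hamming(a, b)) / static_cast<double>(D);
}

constexpr double TAU = 0.9;  // hypothetical merge threshold

// Merge two clusters only when they are close enough to be redundant.
inline bool should_merge(const HV& a, const HV& b) {
    return similarity(a, b) >= TAU;
}
```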
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and outline the revisions we will make to strengthen the presentation of our results while preserving the core contributions of the work.
Point-by-point responses
- Referee: [Abstract] Abstract and experimental claims: the central performance numbers (40.4x speedup, 383x energy) are presented without any accompanying classification accuracy, average forgetting rate, or ablation results comparing the unified exemplar memory + cluster merging strategy against unbounded HDC or standard replay-based CL methods. Because the hardware-aware bounding strategy is load-bearing for the on-device practicality claim, the absence of these metrics leaves the strongest claim unsupported.
Authors: We agree that pairing the efficiency claims with accuracy and forgetting metrics in the abstract would better support the on-device practicality argument. The full experimental evaluation in the manuscript reports classification accuracy and average forgetting on CORe50 for the bounded-exemplar configuration. To address the concern directly, we will revise the abstract to include representative accuracy and forgetting figures alongside the speedup and energy numbers. We will also add a short statement summarizing the ablation comparing the unified exemplar memory and cluster-merging strategy to unbounded HDC, confirming that the bounded approach preserves competitive accuracy under the reported memory constraints.
Revision: yes
- Referee: [Experimental Results] Experimental section (inferred from abstract): no error bars, standard deviations, or multiple-run statistics are referenced for the reported speedups and energy figures; likewise, no description of the exact CPU/GPU baseline implementations (e.g., whether they use the same quantized CNN or full-precision models) is provided, preventing assessment of whether the efficiency gains are robust or baseline-dependent.
Authors: We acknowledge that the absence of error bars and explicit baseline details reduces the ability to assess robustness. In the revised manuscript we will report standard deviations obtained from multiple runs for all speedup and energy figures. We will also expand the experimental setup section with a precise description of the CPU and GPU baselines, clarifying that both use the identical quantized CNN front-end and HDC encoding pipeline as the FPGA implementation to ensure a fair comparison.
Revision: yes
Circularity Check
No circularity in empirical FPGA benchmarks or implementation claims
Full rationale
The paper presents an FPGA accelerator implementation for HDC-based continual learning with direct experimental measurements on the CORe50 dataset, reporting concrete speedups (40.4x/4.84x) and energy gains (383x/105.1x) versus CPU/GPU baselines. These results are obtained from hardware execution rather than any derivation, prediction, or first-principles result that reduces to fitted parameters or self-referential definitions by construction. No equations, uniqueness theorems, or ansatzes are invoked that equate outputs to inputs; the central claims rest on measured on-chip performance under bounded memory, with no load-bearing self-citations or renaming of known results. The work is self-contained as a systems contribution.