Surrogate Neural Architecture Codesign Package (SNAC-Pack)
Pith reviewed 2026-05-20 19:29 UTC · model grok-4.3
The pith
SNAC-Pack automates the search for neural networks that match baseline accuracy on physics tasks while using fewer FPGA resources.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SNAC-Pack discovers compact architectures that match or exceed strong baselines on the task metric while reducing FPGA resource utilization. In the qubit readout case, it also shortens design space exploration from months of manual fine-tuning to hours of automated search.
What carries the argument
A hardware surrogate model that supplies per-trial estimates of FPGA resource utilization and latency, allowing multi-objective search to explore many architectures without repeated full synthesis.
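To make the role of the surrogate concrete, the sketch below scores a handful of candidate networks with a cheap stand-in cost function and keeps only the non-dominated ones. It is a minimal illustration, not SNAC-Pack's code: `toy_surrogate`, the `Trial` record, the candidate list, and all numbers are assumptions made up for this example, and the plain Pareto filter shown here is only the dominance test that NSGA-II builds on, not NSGA-II itself.

```python
from typing import NamedTuple


class Trial(NamedTuple):
    name: str
    error: float  # task error, lower is better (for example 1 - accuracy)
    luts: float   # surrogate-estimated LUT count, lower is better


def toy_surrogate(width: int, depth: int) -> float:
    """Hypothetical stand-in for a learned surrogate: cost grows with size."""
    return 50.0 * width * depth


def dominates(a: Trial, b: Trial) -> bool:
    """True if a is no worse than b on both objectives and better on one."""
    no_worse = a.error <= b.error and a.luts <= b.luts
    better = a.error < b.error or a.luts < b.luts
    return no_worse and better


def pareto_front(trials: list[Trial]) -> list[Trial]:
    """Keep the trials that no other trial dominates."""
    return [t for t in trials if not any(dominates(o, t) for o in trials)]


# Made-up (width, depth, error) candidates, for illustration only.
candidates = [(8, 2, 0.30), (16, 2, 0.26), (32, 2, 0.25), (16, 4, 0.27), (64, 4, 0.24)]
trials = [Trial(f"w{w}d{d}", err, toy_surrogate(w, d)) for w, d, err in candidates]
front = pareto_front(trials)
print([t.name for t in front])  # w16d4 is dropped: w16d2 is both cheaper and more accurate
```

Because each call to the stand-in surrogate is a single arithmetic expression, every candidate can be scored without a synthesis run, which is the property the argument depends on.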
If this is right
- Selected architectures can undergo additional quantization-aware training and pruning to meet tighter resource budgets before synthesis.
- Parallel search across multiple compute nodes becomes practical because the surrogate removes the dominant synthesis cost from the loop.
- The same pipeline applies to new datasets through configuration files without changes to the core code.
- Final FPGA firmware achieves the target accuracy within a multi-dimensional budget of lookup tables, DSPs, flip-flops, BRAM, and latency.
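The parallel-search point above depends on several workers writing trials to one shared store. The sketch below shows that pattern with Python's built-in `sqlite3` module. It is an assumption-laden illustration: the table layout, the `open_store` and `record_trial` helpers, and the recorded values are hypothetical and do not reproduce the schema that Optuna or SNAC-Pack actually use.

```python
import os
import sqlite3
import tempfile


def open_store(path: str) -> sqlite3.Connection:
    """Open the shared trial table, creating it on first use."""
    # The timeout makes a worker wait if another worker holds the write lock.
    conn = sqlite3.connect(path, timeout=30.0)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trials ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, worker TEXT, width INTEGER, "
        "error REAL, est_luts REAL)"
    )
    conn.commit()
    return conn


def record_trial(conn: sqlite3.Connection, worker: str, width: int,
                 error: float, est_luts: float) -> None:
    """Append one finished trial; the with-block commits or rolls back."""
    with conn:
        conn.execute(
            "INSERT INTO trials (worker, width, error, est_luts) VALUES (?, ?, ?, ?)",
            (worker, width, error, est_luts),
        )


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "trials.db")
    # Two connections stand in for two worker processes.
    worker_a = open_store(db_path)
    worker_b = open_store(db_path)
    record_trial(worker_a, "a", 16, 0.26, 1600.0)  # made-up values
    record_trial(worker_b, "b", 32, 0.25, 3200.0)
    # Either worker can read every trial committed so far.
    rows = worker_a.execute("SELECT worker, width FROM trials ORDER BY id").fetchall()
    worker_a.close()
    worker_b.close()

print(rows)
```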
Where Pith is reading between the lines
- The same surrogate-guided search could extend to other hardware targets where full evaluation is costly, such as ASICs or custom accelerators.
- Adding power or thermal predictions to the surrogate would allow optimization for a broader set of deployment constraints.
- Running the method on additional high-energy physics or sensing tasks could test whether the time savings generalize beyond the two demonstrated cases.
Load-bearing premise
The hardware surrogate model produces sufficiently accurate per-trial estimates of FPGA resource utilization and latency that correlate well with post-synthesis results.
What would settle it
Synthesize the top architectures selected by the search and measure their actual FPGA resource counts and latency; large mismatches with the surrogate predictions or worse final performance would falsify the approach.
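The check described above can be sketched as a per-resource comparison for one synthesized architecture. The function names, the 20% tolerance, and every number below are assumptions chosen for illustration; the paper reports no such threshold.

```python
def relative_errors(predicted: dict[str, float],
                    measured: dict[str, float]) -> dict[str, float]:
    """Per-resource |predicted - measured| / measured."""
    return {k: abs(predicted[k] - measured[k]) / measured[k] for k in measured}


def flag_mismatches(predicted: dict[str, float], measured: dict[str, float],
                    tolerance: float = 0.20) -> list[str]:
    """Names of the resources whose relative error exceeds the tolerance."""
    errs = relative_errors(predicted, measured)
    return sorted(k for k, e in errs.items() if e > tolerance)


# Made-up surrogate predictions and post-synthesis values for one design.
predicted = {"lut": 9000.0, "dsp": 40.0, "ff": 5000.0, "latency_ns": 100.0}
measured = {"lut": 10000.0, "dsp": 64.0, "ff": 5200.0, "latency_ns": 105.0}
print(flag_mismatches(predicted, measured))  # only the DSP estimate is off by more than 20%
```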
Original abstract
Neural architecture search (NAS) is a powerful approach for automating model design, but existing methods often optimize for accuracy alone or rely on proxy metrics such as bit operations (BOPs) that correlate poorly with hardware cost. This gap is particularly large for FPGA deployment, where cost is dominated by a multi-dimensional budget of lookup tables, DSPs, flip-flops, BRAM, and latency. We present the Surrogate Neural Architecture Codesign Package (SNAC-Pack), an open-source AutoML framework for hardware-aware neural architecture codesign and end-to-end FPGA deployment. SNAC-Pack runs a multi-objective global search with Optuna and NSGA-II, logging trials to a shared SQLite store that enables parallel workers across compute nodes. A hardware surrogate model outputs per-trial resource and latency estimates, avoiding the synthesis cost that would otherwise dominate the search loop. A local search stage then applies quantization-aware training (QAT) together with iterative magnitude pruning in a combined compression loop, after which the final model is synthesized to FPGA firmware via the hls4ml Python library. A YAML configuration and an optional agentic frontend let users run the pipeline on new datasets without modifying the framework. We demonstrate SNAC-Pack on jet classification at the Large Hadron Collider and superconducting qubit readout, discovering compact architectures that match or exceed strong baselines on the task metric while reducing FPGA resource utilization and, in the qubit readout case, reducing the design space exploration process from months of manual fine-tuning to hours of automated search.
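The local search stage mentioned in the abstract combines QAT with iterative magnitude pruning. The sketch below covers only the pruning schedule, on a flat list of weights. It is a simplified illustration under stated assumptions: the linear sparsity ramp, the function names, and the identity `retrain` placeholder (which stands in for the retraining or QAT step between pruning rounds) are all hypothetical and are not taken from SNAC-Pack.

```python
from typing import Callable


def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Zero the smallest-magnitude fraction `sparsity` of the weights."""
    n_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


def iterative_prune(weights: list[float], target_sparsity: float, steps: int,
                    retrain: Callable[[list[float]], list[float]] = lambda w: w,
                    ) -> list[float]:
    """Raise sparsity linearly to the target, retraining after each round."""
    for step in range(1, steps + 1):
        sparsity = target_sparsity * step / steps
        weights = magnitude_prune(weights, sparsity)
        # Placeholder: a real loop would fine-tune the surviving weights here.
        weights = retrain(weights)
    return weights


# Made-up weights; prune to 50% sparsity in two rounds (25%, then 50%).
result = iterative_prune([0.9, -0.05, 0.4, 0.01, -0.7, 0.2, -0.03, 0.6], 0.5, 2)
print(result)
```

Pruning in rounds rather than in one shot gives the retraining step a chance to recover accuracy before more weights are removed, which is the usual motivation for the iterative variant.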
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents SNAC-Pack, an open-source AutoML framework for hardware-aware neural architecture codesign targeting FPGA deployment. It performs multi-objective global search using Optuna and NSGA-II over a shared SQLite store for parallel execution, employs a hardware surrogate model to estimate per-trial FPGA resources (LUTs, DSPs, BRAM, FFs) and latency without repeated synthesis, applies a local compression stage combining quantization-aware training and iterative magnitude pruning, and generates final firmware via hls4ml. YAML configuration and an optional agentic frontend support reuse on new datasets. Demonstrations on LHC jet classification and superconducting qubit readout claim compact architectures that match or exceed baselines on task metrics while reducing resource utilization and, for the qubit case, shrinking manual design exploration from months to hours of automated search.
Significance. If the surrogate predictions prove reliable, SNAC-Pack could meaningfully accelerate hardware-software codesign for scientific edge applications by replacing dominant synthesis costs with fast surrogate estimates inside a multi-objective loop. The open-source release, parallel worker support via SQLite, end-to-end pipeline from search to hls4ml firmware, and user-configurable YAML interface are concrete strengths that aid reproducibility and adoption. The work usefully bridges accuracy-only NAS with multi-dimensional FPGA budgeting on two practically relevant tasks.
major comments (2)
- [Hardware Surrogate Model] Hardware Surrogate Model section: no table or figure reports quantitative validation metrics (MAE, R², or Spearman rank correlation) between surrogate predictions of LUT/DSP/BRAM/FF/latency and post-synthesis results on a held-out set of architectures or on the final Pareto front. This validation is load-bearing for the central claim that surrogate-guided search yields genuine resource reductions rather than artifacts of surrogate bias.
- [Qubit Readout Experiments] Qubit readout results subsection: the reduction of design-space exploration from months of manual fine-tuning to hours of automated search is stated without reporting search-space cardinality, number of trials executed, wall-clock time per trial, or a quantified baseline manual process, preventing assessment of the claimed time savings.
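The validation metrics requested in the first major comment are standard and can be computed directly from paired surrogate and post-synthesis values. The sketch below implements MAE, R², and Spearman rank correlation in plain Python. The sample LUT counts are made up for illustration and are not results from the manuscript.

```python
def mae(pred: list[float], true: list[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def r_squared(pred: list[float], true: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(true) / len(true)
    ss_res = sum((t - p) ** 2 for p, t in zip(pred, true))
    ss_tot = sum((t - mean_t) ** 2 for t in true)
    return 1.0 - ss_res / ss_tot


def ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(pred: list[float], true: list[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(pred), ranks(true))


# Made-up surrogate predictions and post-synthesis LUT counts.
pred = [1000.0, 2000.0, 3000.0, 4000.0]
true_vals = [1100.0, 1900.0, 3300.0, 3900.0]
print(mae(pred, true_vals), r_squared(pred, true_vals), spearman(pred, true_vals))
```

Spearman correlation is the most relevant of the three for search, since a surrogate that ranks architectures correctly can still guide selection even when its absolute estimates are biased.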
minor comments (2)
- [Abstract] Abstract: the statement that the discovered architectures "match or exceed strong baselines on the task metric" would be strengthened by naming the concrete task metrics (e.g., accuracy or AUC) and the magnitude of resource or latency improvements.
- [Figures] Figure captions for Pareto fronts: adding a second set of points or error bands showing surrogate versus post-synthesis values for the selected architectures would improve interpretability.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments on our manuscript. We address each major comment below and indicate the revisions planned for the next version of the paper.
- Referee: Hardware Surrogate Model section: no table or figure reports quantitative validation metrics (MAE, R², or Spearman rank correlation) between surrogate predictions of LUT/DSP/BRAM/FF/latency and post-synthesis results on a held-out set of architectures or on the final Pareto front. This validation is load-bearing for the central claim that surrogate-guided search yields genuine resource reductions rather than artifacts of surrogate bias.
  Authors: We agree that quantitative validation metrics for the hardware surrogate are necessary to substantiate the central claims. In the revised manuscript we will add a table (or figure) in the Hardware Surrogate Model section reporting MAE, R², and Spearman rank correlation for each predicted quantity (LUTs, DSPs, BRAM, FFs, and latency). Metrics will be shown both on an independent held-out set of architectures and on the final Pareto-front models after post-synthesis verification. This addition will directly address the concern about possible surrogate bias.
  Revision: yes
- Referee: Qubit readout results subsection: the reduction of design-space exploration from months of manual fine-tuning to hours of automated search is stated without reporting search-space cardinality, number of trials executed, wall-clock time per trial, or a quantified baseline manual process, preventing assessment of the claimed time savings.
  Authors: We acknowledge that the time-savings statement requires additional quantitative context. In the revision we will report the cardinality of the explored search space, the exact number of trials executed, and the average wall-clock time per trial (including the surrogate evaluation time). We will also expand the description of the manual baseline to list the typical steps and approximate effort required in our prior manual design process for the qubit readout task. While a precise side-by-side measurement of the manual effort was not recorded, the added details will make the efficiency comparison more transparent and assessable.
  Revision: partial
Circularity Check
No circularity: empirical framework description
The paper describes an open-source AutoML framework (SNAC-Pack) that combines Optuna/NSGA-II search, a hardware surrogate for resource/latency estimates, QAT+pruning, and hls4ml synthesis. All claims rest on empirical demonstration across two tasks (jet classification and qubit readout), with reported outcomes (compact architectures, resource reductions, search-time speedup) obtained by running the pipeline rather than by any equation or fitted parameter that reduces to its own inputs by construction. No self-definitional loops, fitted-input predictions, or load-bearing self-citations appear in the derivation chain; the work is self-contained against external benchmarks via post-synthesis verification and manual-tuning comparisons.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "SNAC-Pack runs a multi-objective global search with Optuna and NSGA-II... A hardware surrogate model outputs per-trial resource and latency estimates"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear
  Paper passage: "discovering compact architectures that match or exceed strong baselines on the task metric while reducing FPGA resource utilization"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.