
arXiv: 2605.16138 · v1 · pith:W2Q2SF6Znew · submitted 2026-05-15 · 💻 cs.LG · cs.AI · hep-ex

Surrogate Neural Architecture Codesign Package (SNAC-Pack)

Pith reviewed 2026-05-20 19:29 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · hep-ex
keywords: neural architecture search · hardware-aware optimization · FPGA deployment · surrogate model · jet classification · qubit readout · model compression · multi-objective search

The pith

SNAC-Pack automates the search for neural networks that match or exceed baseline accuracy on physics tasks while using fewer FPGA resources.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an automated framework for finding neural network designs that perform well on tasks like particle classification while fitting within the limited resources of FPGAs. Standard neural architecture search often optimizes accuracy alone or relies on rough proxies for hardware cost, such as bit operations, that do not reflect real FPGA requirements like lookup tables, DSP blocks, and latency. A surrogate model supplies quick estimates of those costs for each candidate design, so the search can evaluate many options without running full synthesis on every trial. After the search identifies promising architectures, quantization-aware training and iterative magnitude pruning compress them further before final conversion to FPGA firmware with hls4ml. Tests on jet classification and qubit readout produce models that perform at or above baseline levels with lower hardware demands and, for qubit readout, far less manual design effort than the traditional process.

Core claim

SNAC-Pack discovers compact architectures that match or exceed strong baselines on the task metric while reducing FPGA resource utilization and, in the qubit readout case, reducing the design space exploration process from months of manual fine-tuning to hours of automated search.

What carries the argument

A hardware surrogate model that supplies per-trial estimates of FPGA resource utilization and latency, allowing multi-objective search to explore many architectures without repeated full synthesis.
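To make the mechanism concrete, here is a minimal, dependency-free sketch of surrogate-scored multi-objective search. All names are hypothetical: `toy_surrogate` is an invented cost formula standing in for the learned surrogate, `toy_accuracy` stands in for training, and uniform random sampling stands in for Optuna's NSGA-II sampler. Only the Pareto-dominance filter is the standard technique.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Trial:
    widths: tuple    # hidden-layer widths of a candidate MLP (hypothetical search space)
    accuracy: float  # task metric, to maximize
    luts: float      # surrogate LUT estimate, to minimize
    latency: float   # surrogate latency estimate in cycles, to minimize


def toy_surrogate(widths, n_in=16, n_out=5):
    """Hypothetical stand-in for a learned hardware surrogate:
    returns (LUT estimate, latency estimate) from layer shapes alone."""
    dims = [n_in, *widths, n_out]
    macs = sum(a * b for a, b in zip(dims, dims[1:]))
    return 8.0 * macs, 2.0 * len(dims)  # made-up coefficients


def toy_accuracy(widths):
    """Hypothetical stand-in for training: accuracy saturates with capacity."""
    capacity = sum(widths)
    return 0.80 * capacity / (capacity + 20)


def dominates(a, b):
    """a dominates b if it is no worse on every objective and better on at least one."""
    no_worse = a.accuracy >= b.accuracy and a.luts <= b.luts and a.latency <= b.latency
    better = a.accuracy > b.accuracy or a.luts < b.luts or a.latency < b.latency
    return no_worse and better


def pareto_front(trials):
    """Keep only trials that no other trial dominates."""
    return [t for t in trials if not any(dominates(o, t) for o in trials)]


def search(n_trials=200, seed=0):
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        depth = rng.randint(1, 3)
        widths = tuple(rng.choice([4, 8, 16, 32, 64]) for _ in range(depth))
        luts, latency = toy_surrogate(widths)  # cheap estimate, no synthesis in the loop
        trials.append(Trial(widths, toy_accuracy(widths), luts, latency))
    return pareto_front(trials)


if __name__ == "__main__":
    for t in sorted(search(), key=lambda t: t.luts):
        print(t.widths, round(t.accuracy, 3), t.luts, t.latency)
```

The loop body never calls a synthesis tool; replacing the toy functions with a real surrogate and real training changes only those two functions, not the selection logic.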

If this is right

  • Selected architectures can undergo additional quantization-aware training and pruning to meet tighter resource budgets before synthesis.
  • Parallel search across multiple compute nodes becomes practical because the surrogate removes the dominant synthesis cost from the loop.
  • The same pipeline applies to new datasets through configuration files without changes to the core code.
  • Final FPGA firmware achieves the target accuracy within a multi-dimensional budget of lookup tables, DSPs, flip-flops, BRAM, and latency.
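Because the budget is multi-dimensional, a design that is small in LUTs can still fail on DSPs. A minimal sketch of such a feasibility check follows; the `Resources` record and all budget and usage numbers are hypothetical, not taken from the paper or from any specific device.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Resources:
    """One point in the multi-dimensional FPGA budget named in the text."""
    luts: float
    ffs: float
    dsps: float
    brams: float
    latency_ns: float


def utilization(used: Resources, budget: Resources) -> dict:
    """Fraction of the budget consumed along each dimension."""
    return {f.name: getattr(used, f.name) / getattr(budget, f.name) for f in fields(Resources)}


def check_budget(used: Resources, budget: Resources):
    """A design fits only if every dimension is within budget.
    Returns (fits, binding_dimension, worst_fraction)."""
    util = utilization(used, budget)
    binding = max(util, key=util.get)
    return util[binding] <= 1.0, binding, util[binding]


# Hypothetical numbers, for illustration only.
budget = Resources(luts=50_000, ffs=100_000, dsps=200, brams=100, latency_ns=100.0)
design = Resources(luts=42_000, ffs=30_000, dsps=250, brams=10, latency_ns=60.0)
print(check_budget(design, budget))  # → (False, 'dsps', 1.25)
```

Reporting the binding dimension, rather than a single pass/fail flag, shows which resource a compression step such as pruning or lower bit widths should target next.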

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same surrogate-guided search could extend to other hardware targets where full evaluation is costly, such as ASICs or custom accelerators.
  • Adding power or thermal predictions to the surrogate would allow optimization for a broader set of deployment constraints.
  • Running the method on additional high-energy physics or sensing tasks could test whether the time savings generalize beyond the two demonstrated cases.

Load-bearing premise

The hardware surrogate model produces sufficiently accurate per-trial estimates of FPGA resource utilization and latency that correlate well with post-synthesis results.

What would settle it

Synthesize the top architectures selected by the search and measure their actual FPGA resource counts and latency; large mismatches with the surrogate predictions or worse final performance would falsify the approach.
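This check can be made quantitative with standard agreement metrics between surrogate predictions and post-synthesis values: mean absolute error (MAE), the coefficient of determination R², and Spearman rank correlation. Rank correlation matters most for the search itself, because selection by Pareto dominance is unchanged under any order-preserving distortion of an objective; MAE and R² indicate whether absolute budget checks based on the surrogate can be trusted. A dependency-free sketch follows; the LUT numbers are invented for illustration.

```python
def mae(pred, true):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def r2(pred, true):
    """Coefficient of determination of predictions against measured values."""
    mean_t = sum(true) / len(true)
    ss_res = sum((t - p) ** 2 for p, t in zip(pred, true))
    ss_tot = sum((t - mean_t) ** 2 for t in true)
    return 1.0 - ss_res / ss_tot


def ranks(xs):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    """Pearson correlation (undefined if either input is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


if __name__ == "__main__":
    predicted_luts = [1000, 2000, 3000, 4000]     # hypothetical surrogate outputs
    synthesized_luts = [1100, 1900, 3300, 3900]   # hypothetical post-synthesis counts
    print(mae(predicted_luts, synthesized_luts), spearman(predicted_luts, synthesized_luts))  # → 150.0 1.0
```

In this invented example the surrogate is off by 150 LUTs on average yet orders the four designs perfectly, which would be enough for Pareto selection but not for a tight absolute budget check.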

Figures

Figures reproduced from arXiv: 2605.16138 by Aaron Wang, Benjamin Hawks, Dmitri Demler, Jason Weitz, Javier Duarte, Nhan Tran.

Figure 1. YAML (or optional MCP) drives global multi-objective search with surrogate hardware scores, local QAT and iterative magnitude pruning, and hls4ml synthesis.
… scores global-search trials with learned surrogate estimates of utilization and latency (Rahimifar et al., 2025) instead of relying on BOPs alone, while preserving multi-objective Optuna (Akiba et al., 2019) search with NSGA-II (Deb et al., 2002), loca…
Figure 2. Pareto fronts obtained during jet classification global search. SNAC-Pack optimizes estimated hardware-aware objectives and accuracy, while NAC optimizes BOPs and accuracy. Each point represents a sampled architecture.
Figure 3. Pareto fronts obtained during qubit readout global search. SNAC-Pack optimizes estimated hardware-aware objectives, readout fidelity, and BOPs. Each point represents a sampled architecture.
From Section 5 (Conclusion): This work introduced SNAC-Pack, an open-source AutoML pipeline for hardware-aware neural architecture search, model compression, and FPGA deployment. Built on Optuna and parallel trial workers, SNAC-Pack ex…
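Figure 3 lists BOPs among the qubit-readout search objectives, and the abstract argues that BOPs correlate poorly with FPGA cost. For reference, here is a sketch of one commonly used per-layer BOPs estimate for a dense layer with n inputs, m outputs, weight and activation bit widths b_w and b_a, and pruned fraction f_p: m · n · ((1 − f_p) · b_a · b_w + b_a + b_w + log2 n), where the log2 n term accounts for accumulator growth. The exact formula SNAC-Pack uses is not given on this page, so treat this one as an assumption.

```python
import math


def dense_bops(n_in, n_out, b_w, b_a, pruned_fraction=0.0):
    """Assumed BOPs estimate for one dense layer:
    n_out * n_in * ((1 - f_p) * b_a * b_w + b_a + b_w + log2(n_in))."""
    return n_out * n_in * ((1.0 - pruned_fraction) * b_a * b_w + b_a + b_w + math.log2(n_in))


def mlp_bops(widths, b_w=8, b_a=8, pruned_fraction=0.0):
    """Sum over consecutive layer pairs; widths includes the input and output sizes."""
    return sum(dense_bops(a, b, b_w, b_a, pruned_fraction) for a, b in zip(widths, widths[1:]))


if __name__ == "__main__":
    print(mlp_bops([16, 64, 32, 5]))        # 8-bit weights and activations → 275744.0
    print(mlp_bops([16, 64, 32, 5], 4, 4))  # 4-bit → 94752.0
```

The result is a single scalar: it cannot say, for example, whether multiplications map to DSP blocks or to LUTs, or how many pipeline stages the design needs, which is the multi-dimensional information the hardware surrogate is meant to supply.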
Original abstract

Neural architecture search (NAS) is a powerful approach for automating model design, but existing methods often optimize for accuracy alone or rely on proxy metrics such as bit operations (BOPs) that correlate poorly with hardware cost. This gap is particularly large for FPGA deployment, where cost is dominated by a multi-dimensional budget of lookup tables, DSPs, flip-flops, BRAM, and latency. We present the Surrogate Neural Architecture Codesign Package (SNAC-Pack), an open-source AutoML framework for hardware-aware neural architecture codesign and end-to-end FPGA deployment. SNAC-Pack runs a multi-objective global search with Optuna and NSGA-II, loading trials to a shared SQLite store that enables parallel workers across compute nodes. A hardware surrogate model outputs per-trial resource and latency estimates, avoiding the synthesis cost that would otherwise dominate the search loop. A local search stage then applies quantization-aware training (QAT) together with iterative magnitude pruning in a combined compression loop, after which the final model is synthesized to FPGA firmware via the hls4ml Python library. A YAML configuration and an optional agentic frontend let users run the pipeline on new datasets without modifying the framework. We demonstrate SNAC-Pack on jet classification at the Large Hadron Collider and superconducting qubit readout, discovering compact architectures that match or exceed strong baselines on the task metric while reducing FPGA resource utilization and, in the qubit readout case, reducing the design space exploration process from months of manual fine-tuning to hours of automated search.
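In Optuna, workers share one study by passing a database URL (for example the hypothetical `"sqlite:///snac.db"`) as the `storage` argument of `optuna.create_study`, together with `load_if_exists=True`. The sketch below does not use Optuna; it illustrates, with the standard-library `sqlite3` module and a hypothetical one-table schema, why a shared database file lets independent workers append trials and read each other's results. Threads stand in for separate processes or nodes; note that the SQLite documentation warns that file locking can be unreliable on some network filesystems.

```python
import json
import os
import sqlite3
import tempfile
import threading

# Hypothetical schema; Optuna's real storage uses its own tables.
SCHEMA = """CREATE TABLE IF NOT EXISTS trials (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    worker TEXT NOT NULL,
    params TEXT NOT NULL,
    objectives TEXT NOT NULL
)"""


def _connect(path):
    # timeout: wait for locks held by other writers instead of failing immediately
    conn = sqlite3.connect(path, timeout=30)
    conn.execute(SCHEMA)
    return conn


def record_trial(path, worker, params, objectives):
    """Append one finished trial to the shared store."""
    conn = _connect(path)
    try:
        with conn:  # one transaction: commit on success, roll back on error
            conn.execute(
                "INSERT INTO trials (worker, params, objectives) VALUES (?, ?, ?)",
                (worker, json.dumps(params), json.dumps(objectives)),
            )
    finally:
        conn.close()


def load_trials(path):
    """Any worker can read every trial recorded so far, by any worker."""
    conn = _connect(path)
    try:
        rows = conn.execute("SELECT worker, params, objectives FROM trials ORDER BY id").fetchall()
    finally:
        conn.close()
    return [(w, json.loads(p), json.loads(o)) for w, p, o in rows]


def worker(path, name, n):
    for i in range(n):
        params = {"widths": [8 * (i + 1)]}                     # hypothetical architecture
        objectives = {"accuracy": 0.7, "luts": 100 * (i + 1)}  # hypothetical scores
        record_trial(path, name, params, objectives)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "trials.db")
    threads = [threading.Thread(target=worker, args=(path, f"w{k}", 5)) for k in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(len(load_trials(path)))  # → 20
```

Keeping each write to one short transaction keeps lock hold times small, which is what lets many workers share one file without coordinating with each other directly.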

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents SNAC-Pack, an open-source AutoML framework for hardware-aware neural architecture codesign targeting FPGA deployment. It performs multi-objective global search using Optuna and NSGA-II over a shared SQLite store for parallel execution, employs a hardware surrogate model to estimate per-trial FPGA resources (LUTs, DSPs, BRAM, FFs) and latency without repeated synthesis, applies a local compression stage combining quantization-aware training and iterative magnitude pruning, and generates final firmware via hls4ml. YAML configuration and an optional agentic frontend support reuse on new datasets. Demonstrations on LHC jet classification and superconducting qubit readout claim compact architectures that match or exceed baselines on task metrics while reducing resource utilization and, for the qubit case, shrinking manual design exploration from months to hours of automated search.
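The local compression stage summarized above can be pictured with a minimal, dependency-free sketch: iterative magnitude pruning removes a fixed fraction of the surviving weights each round, a retraining hook runs between rounds, and surviving weights are finally snapped to a fixed-point grid loosely modelled on `ap_fixed<W,I>` (W total bits, I integer bits including the sign). The `retrain` hook is a hypothetical placeholder for quantization-aware training, and the rate, round count, bit widths, and round-to-nearest saturating behavior are assumptions, not SNAC-Pack's settings.

```python
import random


def fake_quantize(x, total_bits=6, int_bits=1):
    """Snap x to a signed fixed-point grid (assumed round-to-nearest, saturating)."""
    step = 2.0 ** -(total_bits - int_bits)
    lo, hi = -(2.0 ** (int_bits - 1)), 2.0 ** (int_bits - 1) - step
    return min(max(round(x / step) * step, lo), hi)


def prune_step(weights, mask, rate):
    """One IMP round: drop the `rate` fraction of surviving weights with smallest |w|."""
    alive = [i for i, keep in enumerate(mask) if keep]
    n_prune = int(len(alive) * rate)
    new_mask = list(mask)
    for i in sorted(alive, key=lambda i: abs(weights[i]))[:n_prune]:
        new_mask[i] = False
    return new_mask


def compress(weights, rounds=3, rate=0.2, total_bits=6, int_bits=1, retrain=None):
    """Iterative magnitude pruning followed by fixed-point quantization.
    `retrain(weights, mask)` is a hypothetical hook; real QAT would apply the
    quantizer inside the forward pass during retraining, which is not modelled here."""
    mask = [True] * len(weights)
    weights = list(weights)
    for _ in range(rounds):
        mask = prune_step(weights, mask, rate)
        weights = [w if keep else 0.0 for w, keep in zip(weights, mask)]
        if retrain is not None:
            weights = retrain(weights, mask)
    quantized = [fake_quantize(w, total_bits, int_bits) if keep else 0.0
                 for w, keep in zip(weights, mask)]
    return quantized, mask


if __name__ == "__main__":
    rng = random.Random(0)
    w = [rng.gauss(0.0, 0.5) for _ in range(100)]
    q, mask = compress(w)
    print(sum(mask), "of", len(mask), "weights kept")  # → 52 of 100 weights kept
```

With rate 0.2 over three rounds the surviving fraction is 0.8³ ≈ 0.51 in exact arithmetic; rounding the per-round counts down (20, 16, 12) leaves 52 of 100 weights here.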

Significance. If the surrogate predictions prove reliable, SNAC-Pack could meaningfully accelerate hardware-software codesign for scientific edge applications by replacing dominant synthesis costs with fast surrogate estimates inside a multi-objective loop. The open-source release, parallel worker support via SQLite, end-to-end pipeline from search to hls4ml firmware, and user-configurable YAML interface are concrete strengths that aid reproducibility and adoption. The work usefully bridges accuracy-only NAS with multi-dimensional FPGA budgeting on two practically relevant tasks.

major comments (2)
  1. [Hardware Surrogate Model] Hardware Surrogate Model section: no table or figure reports quantitative validation metrics (MAE, R², or Spearman rank correlation) between surrogate predictions of LUT/DSP/BRAM/FF/latency and post-synthesis results on a held-out set of architectures or on the final Pareto front. This validation is load-bearing for the central claim that surrogate-guided search yields genuine resource reductions rather than artifacts of surrogate bias.
  2. [Qubit Readout Experiments] Qubit readout results subsection: the reduction of design-space exploration from months of manual fine-tuning to hours of automated search is stated without reporting search-space cardinality, number of trials executed, wall-clock time per trial, or a quantified baseline manual process, preventing assessment of the claimed time savings.
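One way to see why those numbers matter is a simple cost model, offered as an assumption rather than something the paper states. With N_trials trials, per-trial training time t_train, per-trial hardware-evaluation time t_hw, and W workers scaling perfectly:

```latex
T_{\mathrm{search}} \approx \frac{N_{\mathrm{trials}}\,\bigl(t_{\mathrm{train}} + t_{\mathrm{hw}}\bigr)}{W},
\qquad
t_{\mathrm{hw}} =
\begin{cases}
t_{\mathrm{synth}} & \text{(full synthesis of every trial)},\\
t_{\mathrm{surr}} \ll t_{\mathrm{synth}} & \text{(surrogate estimate)}.
\end{cases}
```

Reporting N_trials, the per-trial times, and W would let readers evaluate both branches and compare the result with the manual baseline.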
minor comments (2)
  1. [Abstract] Abstract: the claim that the discovered architectures "match or exceed strong baselines on the task metric while reducing FPGA resource utilization" would be strengthened by naming the concrete task metrics (e.g., accuracy or AUC) and the magnitude of the resource or latency improvements.
  2. [Figures] Figure captions for Pareto fronts: adding a second set of points or error bands showing surrogate versus post-synthesis values for the selected architectures would improve interpretability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments on our manuscript. We address each major comment below and indicate the revisions planned for the next version of the paper.

  1. Referee: Hardware Surrogate Model section: no table or figure reports quantitative validation metrics (MAE, R², or Spearman rank correlation) between surrogate predictions of LUT/DSP/BRAM/FF/latency and post-synthesis results on a held-out set of architectures or on the final Pareto front. This validation is load-bearing for the central claim that surrogate-guided search yields genuine resource reductions rather than artifacts of surrogate bias.

    Authors: We agree that quantitative validation metrics for the hardware surrogate are necessary to substantiate the central claims. In the revised manuscript we will add a table (or figure) in the Hardware Surrogate Model section reporting MAE, R², and Spearman rank correlation for each predicted quantity (LUTs, DSPs, BRAM, FFs, and latency). Metrics will be shown both on an independent held-out set of architectures and on the final Pareto-front models after post-synthesis verification. This addition will directly address the concern about possible surrogate bias. revision: yes

  2. Referee: Qubit readout results subsection: the reduction of design-space exploration from months of manual fine-tuning to hours of automated search is stated without reporting search-space cardinality, number of trials executed, wall-clock time per trial, or a quantified baseline manual process, preventing assessment of the claimed time savings.

    Authors: We acknowledge that the time-savings statement requires additional quantitative context. In the revision we will report the cardinality of the explored search space, the exact number of trials executed, and the average wall-clock time per trial (including the surrogate evaluation time). We will also expand the description of the manual baseline to list the typical steps and approximate effort required in our prior manual design process for the qubit readout task. While a precise side-by-side measurement of the manual effort was not recorded, the added details will make the efficiency comparison more transparent and assessable. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical framework description


The paper describes an open-source AutoML framework (SNAC-Pack) that combines Optuna/NSGA-II search, a hardware surrogate for resource/latency estimates, QAT+pruning, and hls4ml synthesis. All claims rest on empirical demonstration across two tasks (jet classification and qubit readout), with reported outcomes (compact architectures, resource reductions, search-time speedup) obtained by running the pipeline rather than by any equation or fitted parameter that reduces to its own inputs by construction. No self-definitional loops, fitted-input predictions, or load-bearing self-citations appear in the derivation chain; the work is self-contained against external benchmarks via post-synthesis verification and manual-tuning comparisons.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an engineering framework paper rather than a derivation; the central claims rest on implementation choices and empirical results rather than unstated mathematical axioms or new postulated entities.

pith-pipeline@v0.9.0 · 5813 in / 1235 out tokens · 35887 ms · 2026-05-20T19:29:00.380881+00:00 · methodology



Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 2 internal anchors

  1. Jason Weitz, Dmitri Demler, Luke McDermott, Nhan Tran, Javier Duarte. Machine Learning: Science and Technology, 2025. doi:10.1088/2632-2153/adede1
  2. Mohammad Mehdi Rahimifar, Hamza Ezzaoui Rahali, Audrey C. Therrien. Machine Learning: Science and Technology, 2025. doi:10.1088/2632-2153/ada71c
  3. wa-hls4ml: A Benchmark and Surrogate Models for hls4ml Resource and Latency Estimation. Preprint, 2025.
  4. Patrick Odagiu, Zhiqiang Que, Javier Duarte, Johannes Haller, Gregor Kasieczka, Artur Lobanov, Vladimir Loncar, Wayne Luk, Jennifer Ngadiuba, Maurizio Pierini, Philipp Rincke, Arpita Seksaria, Sioni Summers, Andre Sznajder, Alexander Tapper, Thea K. Årrestad. Mach. Learn.: Sci. Technol. …
  5. K. Deb, A. Pratap, S. Agarwal, T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. 2002.
  6. Maurizio Pierini, Javier Mauricio Duarte, Nhan Tran, Marat Freytsis. HLS4ML LHC jet dataset (150 particles). doi:10.5281/zenodo.3602260
  7. Neural network quantization for efficient inference: A survey. arXiv:2112.06126.
  8. Yuqiao Liu, Yanan Sun, Bing Xue, Mengjie Zhang, Gary G. Yen, Kay Chen Tan. A Survey on Evolutionary Neural Architecture Search.
  9. Barret Zoph, Quoc V. Le. Neural Architecture Search with Reinforcement Learning. arXiv:1611.01578.
  10. Nan Wu, Hang Yang, Yuan Xie, Pan Li, Cong Hao. Proceedings of the 59th ACM/IEEE Design Automation Conference, 2022. doi:10.1145/3489517.3530408
  11. Kenneth O'Neal, Mitch Liu, Hans Tang, Amin Kalantar, Kennen DeRenard, Philip Brisk. Proceedings of the International Conference on Computer-Aided Design, 2018.
  12. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. Preprint, 2016.
  13. Batch normalization: Accelerating deep network training by reducing internal covariate shift. International Conference on Machine Learning, 2015.
  14. doi:10.1145/3801979
  15. fastmachinelearning/hls4ml, 2024. doi:10.5281/zenodo.1201549
  16. Dmitri Demler, Jason Weitz, Javier Duarte, Daniel Cummings, Luke McDermott. doi:10.5281/zenodo.19202843
  17. Jonathan Frankle, Gintare Karolina Dziugaite, Daniel M. Roy, Michael Carbin. International Conference on Machine Learning. arXiv:1912.05671.
  18. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. arXiv:1803.03635.
  19. Hanxiao Liu, Karen Simonyan, Yiming Yang. International Conference on Learning Representations.
  20. Efficient neural architecture search via parameters sharing. International Conference on Machine Learning, 2018.
  21. Zhichao Lu, Ian Whalen, Vishnu Boddeti, Yashesh Dhebar, Kalyanmoy Deb, Erik Goodman, Wolfgang Banzhaf.
  22. Thomas Elsken, Jan Hendrik Metzen, Frank Hutter. Efficient multi-objective neural architecture search via … International Conference on Learning Representations.
  23. Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le.
  24. Han Cai, Ligeng Zhu, Song Han. International Conference on Learning Representations.
  25. Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, Kurt Keutzer.
  26. Once-for-all: Train one network and specialize it for efficient deployment. arXiv:1908.09791.
  27. Accuracy vs. efficiency: Achieving both through FPGA-implementation aware neural architecture search. Proceedings of the 56th Annual Design Automation Conference, 2019.
  28. APQ: Joint search for network architecture, pruning and quantization policy. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  29. Multi-objective hardware-aware neural architecture search with Pareto rank-preserving surrogate models. ACM Trans. Archit. Code Optim.
  30. Chaojian Li, Zhongzhi Yu, Yonggan Fu, Yongan Zhang, Yang Zhao, Haoran You, Qixuan Yu, Yue Wang, Yingyan Celine Lin. International Conference on Learning Representations.
  31. Multi-objective hardware aware neural architecture search using hardware cost diversity. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
  32. Li Lyna Zhang, Shihao Han, Jianyu Wei, Ningxin Zheng, Ting Cao, Yuqing Yang, Yunxin Liu. 2021.
  33. Lukasz Dudziak, Thomas Chau, Mohamed Abdelfattah, Royson Lee, Hyeji Kim, Nicholas Lane.
  34. On latency predictors for neural architecture search. Proceedings of Machine Learning and Systems.
  35. Efficient and robust automated machine learning. Advances in Neural Information Processing Systems.
  36. Haifeng Jin, François Chollet, … J. Mach. Learn. Res., 2023.
  37. Auto-PyTorch: Multi-fidelity metalearning for efficient and robust AutoDL. IEEE Trans. Pattern Anal. Mach. Intell., 2021.
  38. FLAML: A fast and lightweight AutoML library. Proceedings of Machine Learning and Systems.
  39. Optuna: A next-generation hyperparameter optimization framework. Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining.
  40. Javier Duarte et al. Fast inference of deep neural networks in FPGAs for particle physics. J. Instrum., 2018. doi:10.1088/1748-0221/13/07/P07027. arXiv:1804.06913.
  41. Zhiqiang Tang, Haoyang Fang, Su Zhou, Taojiannan Yang, Zihan Zhong, Tony Hu, Katrin Kirchhoff, George Karypis. 2024.
  42. Zekang Yang, Wang Zeng, Sheng Jin, Chen Qian, Ping Luo, Wentao Liu.
  43. End-to-end workflow for machine learning-based qubit readout with … IEEE Trans. Quantum Eng., 2025. doi:10.1109/TQE.2025.3604712. arXiv:2501.14663.