Surrogate Neural Architecture Codesign Package (SNAC-Pack)

Aaron Wang; Benjamin Hawks; Dmitri Demler; Jason Weitz; Javier Duarte; Nhan Tran

arXiv: 2605.16138 · v1 · pith:W2Q2SF6Znew · submitted 2026-05-15 · cs.LG · cs.AI · hep-ex

Pith reviewed 2026-05-20 19:29 UTC · model grok-4.3

classification cs.LG · cs.AI · hep-ex

keywords neural architecture search · hardware-aware optimization · FPGA deployment · surrogate model · jet classification · qubit readout · model compression · multi-objective search

The pith

SNAC-Pack automates search for neural networks that match accuracy on physics tasks while using fewer FPGA resources.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an automated framework for finding neural network designs that work well on tasks like particle classification while fitting within the limited resources of FPGAs. Standard neural architecture search often focuses only on accuracy or uses rough proxies for hardware cost that do not reflect real FPGA requirements such as lookup tables, DSP blocks, and latency. A surrogate model supplies quick estimates of those costs for each candidate design, letting the search evaluate many options without running expensive full synthesis on every trial. After the search identifies promising architectures, further compression steps refine them before final conversion to FPGA firmware. Tests on jet classification and qubit readout produce models that perform at or above baseline levels with lower hardware demands and far less manual effort than traditional design processes.

Core claim

SNAC-Pack discovers compact architectures that match or exceed strong baselines on the task metric while reducing FPGA resource utilization and, in the qubit readout case, reducing the design space exploration process from months of manual fine-tuning to hours of automated search.

What carries the argument

A hardware surrogate model that supplies per-trial estimates of FPGA resource utilization and latency, allowing multi-objective search to explore many architectures without repeated full synthesis.
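To make the role of the surrogate concrete, the sketch below filters a handful of candidate architectures down to their Pareto front using estimated costs only. It is a minimal illustration, not SNAC-Pack's code: the `Trial` fields, the objective set, and the numbers are all hypothetical, and the paper's search uses Optuna with NSGA-II rather than this brute-force filter.

```python
from typing import List, NamedTuple, Tuple


class Trial(NamedTuple):
    """One candidate architecture with estimated costs (hypothetical fields)."""
    name: str
    error: float    # 1 - accuracy, so lower is better
    lut: float      # estimated lookup tables
    dsp: float      # estimated DSP blocks
    latency: float  # estimated latency in clock cycles


def objectives(t: Trial) -> Tuple[float, ...]:
    """All objectives are expressed so that smaller values are better."""
    return (t.error, t.lut, t.dsp, t.latency)


def dominates(a: Trial, b: Trial) -> bool:
    """True if a is no worse than b on every objective and better on at least one."""
    pairs = list(zip(objectives(a), objectives(b)))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def pareto_front(trials: List[Trial]) -> List[Trial]:
    """Keep the trials that no other trial dominates."""
    return [t for t in trials if not any(dominates(o, t) for o in trials)]


# Made-up surrogate estimates for three candidates (illustration only).
trials = [
    Trial("A", error=0.24, lut=9000, dsp=40, latency=12),
    Trial("B", error=0.25, lut=5000, dsp=10, latency=9),
    Trial("C", error=0.26, lut=9500, dsp=45, latency=13),
]
front_names = [t.name for t in pareto_front(trials)]
print(front_names)
```

Candidate C is worse than A on every objective and drops out, while A and B trade accuracy against resources and both survive. Because every objective here is a cheap estimate, only the surviving points would need real synthesis.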

If this is right

Selected architectures can undergo additional quantization-aware training and pruning to meet tighter resource budgets before synthesis.
Parallel search across multiple compute nodes becomes practical because the surrogate removes the dominant synthesis cost from the loop.
The same pipeline applies to new datasets through configuration files without changes to the core code.
Final FPGA firmware achieves the target accuracy within a multi-dimensional budget of lookup tables, DSPs, flip-flops, BRAM, and latency.
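The first item refers to iterative magnitude pruning. A minimal sketch of that idea follows, assuming a flat list of weights and a fixed per-round pruning fraction. Both are hypothetical choices, and the retraining that would normally happen between rounds is left as a comment.

```python
from typing import List


def magnitude_prune(weights: List[float], mask: List[bool], fraction: float) -> List[bool]:
    """Return a new mask that drops the smallest-magnitude share of surviving weights."""
    alive = [i for i, keep in enumerate(mask) if keep]
    n_drop = int(len(alive) * fraction)
    # Rank the surviving indices by |w| and mark the smallest ones for removal.
    to_drop = set(sorted(alive, key=lambda i: abs(weights[i]))[:n_drop])
    return [keep and i not in to_drop for i, keep in enumerate(mask)]


def iterative_prune(weights: List[float], rounds: int, fraction: float) -> List[bool]:
    """Apply magnitude pruning repeatedly, starting from a dense mask."""
    mask = [True] * len(weights)
    for _ in range(rounds):
        # A real loop would fine-tune the masked model here (for example with
        # quantization-aware training) before pruning again.
        mask = magnitude_prune(weights, mask, fraction)
    return mask


# Made-up weights for illustration.
weights = [0.9, -0.05, 0.4, 0.01, -0.7, 0.2, -0.3, 0.08]
mask = iterative_prune(weights, rounds=2, fraction=0.5)
sparsity = 1 - sum(mask) / len(mask)
print(mask, sparsity)
```

Two rounds at 50% leave only the two largest-magnitude weights, for a sparsity of 0.75. Pruning in rounds rather than in one step gives the model a chance to recover between cuts.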

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same surrogate-guided search could extend to other hardware targets where full evaluation is costly, such as ASICs or custom accelerators.
Adding power or thermal predictions to the surrogate would allow optimization for a broader set of deployment constraints.
Running the method on additional high-energy physics or sensing tasks could test whether the time savings generalize beyond the two demonstrated cases.

Load-bearing premise

The hardware surrogate model produces sufficiently accurate per-trial estimates of FPGA resource utilization and latency that correlate well with post-synthesis results.

What would settle it

Synthesize the top architectures selected by the search and measure their actual FPGA resource counts and latency; large mismatches with the surrogate predictions or worse final performance would falsify the approach.
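One way to quantify a mismatch is to compare surrogate estimates with post-synthesis values, using mean absolute error for magnitude and Spearman rank correlation for ordering. The sketch below is an assumption about how such a check could be scripted. The function names and the LUT counts are made up for illustration and are not results from the paper.

```python
from typing import List, Sequence


def mean_absolute_error(pred: Sequence[float], true: Sequence[float]) -> float:
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(pred: Sequence[float], true: Sequence[float]) -> float:
    """Spearman rank correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(pred), ranks(true))


# Hypothetical LUT counts for four architectures (not from the paper).
surrogate_lut = [5000, 7000, 9000, 12000]
synthesized_lut = [5600, 6800, 9900, 11500]
mae = mean_absolute_error(surrogate_lut, synthesized_lut)
rho = spearman(surrogate_lut, synthesized_lut)
print(mae, rho)
```

Rank correlation is worth reporting alongside absolute error because a multi-objective search depends mainly on candidates being ordered correctly, so a surrogate with a consistent offset could still produce a useful Pareto front.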

Figures

Figures reproduced from arXiv: 2605.16138 by Aaron Wang, Benjamin Hawks, Dmitri Demler, Jason Weitz, Javier Duarte, Nhan Tran.

**Figure 1.** YAML (or optional MCP) drives global multi-objective search with surrogate hardware scores, local QAT and iterative magnitude pruning, and hls4ml synthesis. … scores global-search trials with learned surrogate estimates of utilization and latency (Rahimifar et al., 2025) instead of relying on BOPs alone, while preserving multi-objective Optuna (Akiba et al., 2019) search with NSGA-II (Deb et al., 2002), loca…

**Figure 2.** Pareto fronts obtained during jet classification global search. SNAC-Pack optimizes estimated hardware-aware objectives and accuracy, while NAC optimizes BOPs and accuracy. Each point represents a sampled architecture.

**Figure 3.** Pareto fronts obtained during qubit readout global search. SNAC-Pack optimizes estimated hardware-aware objectives, readout fidelity, and BOPs. Each point represents a sampled architecture.

5 Conclusion. This work introduced SNAC-Pack, an open-source AutoML pipeline for hardware-aware neural architecture search, model compression, and FPGA deployment. Built on Optuna and parallel trial workers, SNAC-Pack ex…

Original abstract

Neural architecture search (NAS) is a powerful approach for automating model design, but existing methods often optimize for accuracy alone or rely on proxy metrics such as bit operations (BOPs) that correlate poorly with hardware cost. This gap is particularly large for FPGA deployment, where cost is dominated by a multi-dimensional budget of lookup tables, DSPs, flip-flops, BRAM, and latency. We present the Surrogate Neural Architecture Codesign Package (SNAC-Pack), an open-source AutoML framework for hardware-aware neural architecture codesign and end-to-end FPGA deployment. SNAC-Pack runs a multi-objective global search with Optuna and NSGA-II, loading trials to a shared SQLite store that enables parallel workers across compute nodes. A hardware surrogate model outputs per-trial resource and latency estimates, avoiding the synthesis cost that would otherwise dominate the search loop. A local search stage then applies quantization-aware training (QAT) together with iterative magnitude pruning in a combined compression loop, after which the final model is synthesized to FPGA firmware via the hls4ml Python library. A YAML configuration and an optional agentic frontend let users run the pipeline on new datasets without modifying the framework. We demonstrate SNAC-Pack on jet classification at the Large Hadron Collider and superconducting qubit readout, discovering compact architectures that match or exceed strong baselines on the task metric while reducing FPGA resource utilization and, in the qubit readout case, reducing the design space exploration process from months of manual fine-tuning to hours of automated search.
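The shared-store idea in the abstract can be illustrated with Python's standard `sqlite3` module. This is not Optuna's storage layer or SNAC-Pack's schema: the `trials` table, its columns, and the helper names are hypothetical, and two connections in one process stand in for workers on separate nodes.

```python
import os
import sqlite3
import tempfile


def open_store(path: str) -> sqlite3.Connection:
    """Open (and if needed create) the shared trial store."""
    # The timeout makes a writer wait for a lock held by another worker.
    conn = sqlite3.connect(path, timeout=30)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trials ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "worker TEXT, error REAL, lut REAL)"
    )
    conn.commit()
    return conn


def record_trial(conn: sqlite3.Connection, worker: str, error: float, lut: float) -> None:
    """Append one finished trial; the context manager commits or rolls back."""
    with conn:
        conn.execute(
            "INSERT INTO trials (worker, error, lut) VALUES (?, ?, ?)",
            (worker, error, lut),
        )


# Two connections to the same file play the role of two workers.
db_path = os.path.join(tempfile.mkdtemp(), "search.db")
worker_a = open_store(db_path)
worker_b = open_store(db_path)
record_trial(worker_a, "a", 0.25, 5000.0)  # made-up values
record_trial(worker_b, "b", 0.24, 9000.0)

# Either worker sees every committed trial, which is what lets a sampler
# propose new candidates from the combined history.
rows = worker_a.execute("SELECT worker, error, lut FROM trials ORDER BY id").fetchall()
worker_a.close()
worker_b.close()
print(rows)
```

A single file keeps setup simple, but SQLite serializes writers, so this pattern suits searches where each trial takes much longer than a database write.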

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

SNAC-Pack packages Optuna, NSGA-II, a hardware surrogate, and hls4ml into a parallel end-to-end FPGA NAS pipeline that can cut manual tuning time, but the resource savings rest on unshown surrogate accuracy.

SNAC-Pack ties together existing pieces: Optuna for the search loop, NSGA-II for multi-objective tradeoffs on accuracy versus FPGA resources, a surrogate to skip synthesis on every trial, and hls4ml for the final firmware step. A shared SQLite store lets workers run in parallel across nodes. After the global search it adds a local stage of quantization-aware training and iterative pruning before deployment. The YAML setup and optional frontend make it straightforward to point at a new dataset without rewriting code. On the jet classification and qubit readout examples it reports compact models that hold task performance while lowering LUT, DSP, BRAM, and latency use, and in the qubit readout case it turns what had been months of manual iteration into hours of automated search. That integration and the parallel execution detail are the concrete additions over prior separate tools.

The main soft spot is the surrogate. The speedup and resource claims only hold if its per-trial estimates of hardware cost track actual post-synthesis numbers closely enough that the Pareto front survives real implementation. The abstract gives no correlation plots, error statistics, or validation against synthesized designs, so it is hard to judge how much the reported gains might shrink once the surrogate is checked.

This is aimed at people already doing ML deployment on FPGAs in high-energy physics or quantum readout who want a configurable starting point rather than building the loop from scratch. A reader who needs a working open-source workflow for hardware-aware search would get usable code and a clear pipeline. It has enough practical substance and domain focus to merit a serious referee rather than a desk reject, though the authors should be asked to add surrogate validation numbers in revision.

Referee Report

2 major / 2 minor

Summary. The manuscript presents SNAC-Pack, an open-source AutoML framework for hardware-aware neural architecture codesign targeting FPGA deployment. It performs multi-objective global search using Optuna and NSGA-II over a shared SQLite store for parallel execution, employs a hardware surrogate model to estimate per-trial FPGA resources (LUTs, DSPs, BRAM, FFs) and latency without repeated synthesis, applies a local compression stage combining quantization-aware training and iterative magnitude pruning, and generates final firmware via hls4ml. YAML configuration and an optional agentic frontend support reuse on new datasets. Demonstrations on LHC jet classification and superconducting qubit readout claim compact architectures that match or exceed baselines on task metrics while reducing resource utilization and, for the qubit case, shrinking manual design exploration from months to hours of automated search.

Significance. If the surrogate predictions prove reliable, SNAC-Pack could meaningfully accelerate hardware-software codesign for scientific edge applications by replacing dominant synthesis costs with fast surrogate estimates inside a multi-objective loop. The open-source release, parallel worker support via SQLite, end-to-end pipeline from search to hls4ml firmware, and user-configurable YAML interface are concrete strengths that aid reproducibility and adoption. The work usefully bridges accuracy-only NAS with multi-dimensional FPGA budgeting on two practically relevant tasks.

major comments (2)

[Hardware Surrogate Model] Hardware Surrogate Model section: no table or figure reports quantitative validation metrics (MAE, R², or Spearman rank correlation) between surrogate predictions of LUT/DSP/BRAM/FF/latency and post-synthesis results on a held-out set of architectures or on the final Pareto front. This validation is load-bearing for the central claim that surrogate-guided search yields genuine resource reductions rather than artifacts of surrogate bias.
[Qubit Readout Experiments] Qubit readout results subsection: the reduction of design-space exploration from months of manual fine-tuning to hours of automated search is stated without reporting search-space cardinality, number of trials executed, wall-clock time per trial, or a quantified baseline manual process, preventing assessment of the claimed time savings.

minor comments (2)

[Abstract] Abstract: the claim that the discovered architectures "match or exceed strong baselines on the task metric" would be strengthened by naming the concrete task metrics (e.g., accuracy or AUC) and the magnitude of resource or latency improvements.
[Figures] Figure captions for Pareto fronts: adding a second set of points or error bands showing surrogate versus post-synthesis values for the selected architectures would improve interpretability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments on our manuscript. We address each major comment below and indicate the revisions planned for the next version of the paper.

Referee: Hardware Surrogate Model section: no table or figure reports quantitative validation metrics (MAE, R², or Spearman rank correlation) between surrogate predictions of LUT/DSP/BRAM/FF/latency and post-synthesis results on a held-out set of architectures or on the final Pareto front. This validation is load-bearing for the central claim that surrogate-guided search yields genuine resource reductions rather than artifacts of surrogate bias.

Authors: We agree that quantitative validation metrics for the hardware surrogate are necessary to substantiate the central claims. In the revised manuscript we will add a table (or figure) in the Hardware Surrogate Model section reporting MAE, R², and Spearman rank correlation for each predicted quantity (LUTs, DSPs, BRAM, FFs, and latency). Metrics will be shown both on an independent held-out set of architectures and on the final Pareto-front models after post-synthesis verification. This addition will directly address the concern about possible surrogate bias. revision: yes

Referee: Qubit readout results subsection: the reduction of design-space exploration from months of manual fine-tuning to hours of automated search is stated without reporting search-space cardinality, number of trials executed, wall-clock time per trial, or a quantified baseline manual process, preventing assessment of the claimed time savings.

Authors: We acknowledge that the time-savings statement requires additional quantitative context. In the revision we will report the cardinality of the explored search space, the exact number of trials executed, and the average wall-clock time per trial (including the surrogate evaluation time). We will also expand the description of the manual baseline to list the typical steps and approximate effort required in our prior manual design process for the qubit readout task. While a precise side-by-side measurement of the manual effort was not recorded, the added details will make the efficiency comparison more transparent and assessable. revision: partial
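The quantities the authors promise to report relate to each other through simple arithmetic. The sketch below assumes a hypothetical discrete search space and made-up trial counts and timings purely to show the calculation; none of the numbers come from the paper.

```python
from math import prod
from typing import Dict, Sequence


def search_space_cardinality(space: Dict[str, Sequence]) -> int:
    """Number of distinct configurations in a fully discrete search space."""
    return prod(len(options) for options in space.values())


def search_wall_clock_hours(n_trials: int, seconds_per_trial: float, n_workers: int) -> float:
    """Idealized wall-clock time, assuming trials split evenly across workers."""
    return n_trials * seconds_per_trial / n_workers / 3600


# Hypothetical search space; the parameter names are illustrative only.
space = {
    "n_layers": [1, 2, 3, 4],
    "units": [8, 16, 32, 64],
    "weight_bits": [4, 6, 8],
    "activation": ["relu", "tanh"],
}
cardinality = search_space_cardinality(space)

# Hypothetical budget: 1000 trials at 36 s each on 10 parallel workers.
hours = search_wall_clock_hours(n_trials=1000, seconds_per_trial=36, n_workers=10)
print(cardinality, hours)
```

Reporting cardinality next to the number of trials shows what fraction of the space was actually sampled, and the per-trial time multiplied out gives a figure that can be set against an estimate of manual effort.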

Circularity Check

0 steps flagged

No circularity: empirical framework description

The paper describes an open-source AutoML framework (SNAC-Pack) that combines Optuna/NSGA-II search, a hardware surrogate for resource/latency estimates, QAT+pruning, and hls4ml synthesis. All claims rest on empirical demonstration across two tasks (jet classification and qubit readout), with reported outcomes (compact architectures, resource reductions, search-time speedup) obtained by running the pipeline rather than by any equation or fitted parameter that reduces to its own inputs by construction. No self-definitional loops, fitted-input predictions, or load-bearing self-citations appear in the derivation chain; the work is self-contained against external benchmarks via post-synthesis verification and manual-tuning comparisons.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an engineering framework paper rather than a derivation; the central claims rest on implementation choices and empirical results rather than unstated mathematical axioms or new postulated entities.

pith-pipeline@v0.9.0 · 5813 in / 1235 out tokens · 35887 ms · 2026-05-20T19:29:00.380881+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "SNAC-Pack runs a multi-objective global search with Optuna and NSGA-II... A hardware surrogate model outputs per-trial resource and latency estimates"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "discovering compact architectures that match or exceed strong baselines on the task metric while reducing FPGA resource utilization"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · 2 internal anchors

[1] Weitz, Jason; Demler, Dmitri; McDermott, Luke; Tran, Nhan; Duarte, Javier. Machine Learning: Science and Technology, 2025. doi:10.1088/2632-2153/adede1
[2] Rahimifar, Mohammad Mehdi; Rahali, Hamza Ezzaoui; Therrien, Audrey C. Machine Learning: Science and Technology, 2025. doi:10.1088/2632-2153/ada71c
[3] wa-hls4ml: A Benchmark and Surrogate Models for hls4ml Resource and Latency Estimation. 2025.
[4] Odagiu, Patrick; Que, Zhiqiang; Duarte, Javier; Haller, Johannes; Kasieczka, Gregor; Lobanov, Artur; Loncar, Vladimir; Luk, Wayne; Ngadiuba, Jennifer; Pierini, Maurizio; Rincke, Philipp; Seksaria, Arpita; Summers, Sioni; Sznajder, Andre; Tapper, Alexander; Årrestad, Thea K. Mach. Learn.: Sci. Technol. ... doi:10.1088/2632-2153/ad5f10
[5] Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A fast and elitist multiobjective genetic algorithm: … 2002.
[6] Pierini, Maurizio; Duarte, Javier Mauricio; Tran, Nhan; Freytsis, Marat. HLS4ML LHC jet dataset (150 particles). doi:10.5281/zenodo.3602260
[7] Neural network quantization for efficient inference: A survey. arXiv:2112.06126
[8] Liu, Yuqiao; Sun, Yanan; Xue, Bing; Zhang, Mengjie; Yen, Gary G.; Tan, Kay Chen. A Survey on Evolutionary Neural Architecture Search.
[9] Zoph, Barret; Le, Quoc V. Neural Architecture Search with Reinforcement Learning. arXiv:1611.01578
[10] Wu, Nan; Yang, Hang; Xie, Yuan; Li, Pan; Hao, Cong. Proceedings of the 59th ACM/IEEE Design Automation Conference, 2022. doi:10.1145/3489517.3530408
[11] O'Neal, Kenneth; Liu, Mitch; Tang, Hans; Kalantar, Amin; DeRenard, Kennen; Brisk, Philip. Proceedings of the International Conference on Computer-Aided Design, 2018.
[12] Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. 2016.
[13] Batch normalization: Accelerating deep network training by reducing internal covariate shift. International conference on machine learning, 2015.
[14] doi:10.1145/3801979
[15] fastmachinelearning/hls4ml. 2024. doi:10.5281/zenodo.1201549
[16] Dmitri Demler; Jason Weitz; Javier Duarte; Daniel Cummings; Luke McDermott. doi:10.5281/zenodo.19202843
[17] Jonathan Frankle; Gintare Karolina Dziugaite; Daniel M. Roy; Michael Carbin. International Conference on Machine Learning. arXiv:1912.05671
[18] The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. arXiv:1803.03635
[19] Liu, Hanxiao; Simonyan, Karen; Yang, Yiming. International Conference on Learning Representations.
[20] Efficient neural architecture search via parameters sharing. International Conference on Machine Learning, 2018.
[21] Lu, Zhichao; Whalen, Ian; Boddeti, Vishnu; Dhebar, Yashesh; Deb, Kalyanmoy; Goodman, Erik; Banzhaf, Wolfgang.
[22] Elsken, Thomas; Metzen, Jan Hendrik; Hutter, Frank. Efficient multi-objective neural architecture search via … International Conference on Learning Representations.
[23] Tan, Mingxing; Chen, Bo; Pang, Ruoming; Vasudevan, Vijay; Sandler, Mark; Howard, Andrew; Le, Quoc V.
[24] Cai, Han; Zhu, Ligeng; Han, Song. International Conference on Learning Representations.
[25] Wu, Bichen; Dai, Xiaoliang; Zhang, Peizhao; Wang, Yanghan; Sun, Fei; Wu, Yiming; Tian, Yuandong; Vajda, Peter; Jia, Yangqing; Keutzer, Kurt.
[26] Once-for-all: Train one network and specialize it for efficient deployment. arXiv:1908.09791
[27] Accuracy vs. efficiency: Achieving both through fpga-implementation aware neural architecture search. Proceedings of the 56th Annual Design Automation Conference 2019.
[28] Apq: Joint search for network architecture, pruning and quantization policy. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[29] Multi-objective hardware-aware neural architecture search with Pareto rank-preserving surrogate models. ACM Trans. Archit. Code Optim.
[30] Li, Chaojian; Yu, Zhongzhi; Fu, Yonggan; Zhang, Yongan; Zhao, Yang; You, Haoran; Yu, Qixuan; Wang, Yue; Lin, Yingyan Celine. International Conference on Learning Representations.
[31] Multi-objective hardware aware neural architecture search using hardware cost diversity. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[32] Zhang, Li Lyna; Han, Shihao; Wei, Jianyu; Zheng, Ningxin; Cao, Ting; Yang, Yuqing; Liu, Yunxin. 2021.
[33] Dudziak, Lukasz; Chau, Thomas; Abdelfattah, Mohamed; Lee, Royson; Kim, Hyeji; Lane, Nicholas.
[34] On latency predictors for neural architecture search. Proceedings of Machine Learning and Systems.
[35] Efficient and robust automated machine learning. Advances in neural information processing systems.
[36] Jin, Haifeng; Chollet, Fran. J. Mach. Learn. Res., 2023.
[37] Auto-pytorch: Multi-fidelity metalearning for efficient and robust autodl. IEEE Trans. Pattern Anal. Mach. Intell., 2021.
[38] Flaml: A fast and lightweight automl library. Proceedings of machine learning and systems.
[39] Optuna: A next-generation hyperparameter optimization framework. Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining.
[40] Duarte, Javier and others. Fast inference of deep neural networks in FPGAs for particle physics. J. Instrum. 2018. doi:10.1088/1748-0221/13/07/P07027. arXiv:1804.06913
[41] Tang, Zhiqiang; Fang, Haoyang; Zhou, Su; Yang, Taojiannan; Zhong, Zihan; Hu, Tony; Kirchhoff, Katrin; Karypis, George. 2024.
[42] Yang, Zekang; Zeng, Wang; Jin, Sheng; Qian, Chen; Luo, Ping; Liu, Wentao.
[43] End-to-end workflow for machine learning-based qubit readout with … IEEE Trans. Quantum Eng. 2025. doi:10.1109/TQE.2025.3604712. arXiv:2501.14663
