Aquas: Enhancing Domain Specialization through Holistic Hardware-Software Co-Optimization based on MLIR

Chenyun Yin; Renze Chen; Ruifan Xu; Yansong Xu; Yitian Sun; Youwei Xiao; Yuhao Luo; Yun Liang; Yuyang Zou

arxiv: 2511.22267 · v2 · submitted 2025-11-27 · 💻 cs.AR

Pith reviewed 2026-05-17 05:04 UTC · model grok-4.3

classification 💻 cs.AR

keywords ASIP · RISC-V · hardware-software co-design · MLIR · custom instructions · domain acceleration · memory interface modeling · e-graph compiler

The pith

Aquas offers an MLIR-based co-design framework that models memory interfaces with cache awareness and uses e-graph compilation to automate custom instruction offloading for RISC-V ASIPs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that existing ASIP design flows struggle with memory bottlenecks and complex custom instructions, limiting automation in emerging domains. Aquas addresses this through a joint memory interface model and an e-graph retargetable compiler that maps and offloads instructions effectively. If the approach holds, it would enable substantial performance gains across applications while keeping hardware costs low and frequency intact. A sympathetic reader would care because this reduces reliance on manual tuning and fixed extension interfaces, making domain specialization more practical as applications grow in complexity.

Core claim

Aquas proposes a memory interface model that jointly considers interface characteristics and cache effects, along with an interface-aware synthesis flow that optimizes the input specification and generates efficient hardware. It also introduces an e-graph-based retargetable compiler with a novel matching engine for robust instruction mapping and offloading. Case studies in four domains demonstrate up to 15.61x speedup with 14.5 percent area overhead and zero frequency degradation, remaining competitive against stronger general-purpose cores and vector extensions.

What carries the argument

The memory interface model that accounts for both interface traits and cache effects, paired with an e-graph-based retargetable compiler featuring a novel matching engine for automated instruction offloading.

If this is right

Memory access can be optimized progressively during synthesis rather than treated as an afterthought.
Custom instructions with non-trivial control and memory behavior become viable for automated offloading.
Domain accelerators achieve high speedups while using less area than more powerful general cores.
The framework supports multiple diverse applications through a single retargetable compilation approach.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This style of co-optimization might shorten the iteration cycle between hardware generation and software adaptation in RISC-V ecosystems.
If the matching engine proves robust, similar e-graph techniques could apply to other compiler targets beyond MLIR dialects.
The low area overhead suggests the approach could scale to systems with multiple specialized extensions without compounding hardware costs.

Load-bearing premise

The memory interface model and e-graph compiler will generalize to new domains without needing extensive manual tuning or post-hoc fixes.

What would settle it

Applying Aquas to a fifth domain with highly irregular memory patterns and complex control logic, then measuring whether speedups remain above 5x without developer adjustments to the models or engine.

Figures

Figures reproduced from arXiv: 2511.22267 by Chenyun Yin, Renze Chen, Ruifan Xu, Yansong Xu, Yitian Sun, Youwei Xiao, Yuhao Luo, Yun Liang, Yuyang Zou.

**Figure 1.** Overview of the unified toolchain in Aquas. Adjacent text in the paper: "… to build the cross-level flow for both the hardware synthesis and compiler support. Specifically, Aquas introduces the aquas dialect, which extends existing features of [25] with operations for fast memory access and optimization directives (Section 4.1). At the microarchitecture level, Aquas introduces a burst DMA engine based on TileLink-UH for data transfers …"

**Figure 2.** Synthesis flow of gemv using Aquas. It consists of (a) CADL input, (b) MLIR parsed from CADL including aquas dialect, and (c) synthesized hardware including DMA engine, scratchpad memory, and main execution pipeline. Adjacent text in the paper: "Optimization Directives. To exploit data-level parallelism enabled by efficient memory access, we introduce optimization directives to guide hardware synthesis. Available directives include …"

**Figure 3.** End-to-end workflow of the Aquas retargetable compiler. ❶ to ❽ correspond to the steps for compiling an application. Adjacent text in the paper: "… expressions, which blocks MLIR’s loop transformations. An internal rewrite first canonicalizes the index computation (i << 2 ⇒ i * 4). Subsequently, the compiler leverages a cost model to extract a program variant that avoids non-affine (e.g., "<<") access within loops, enabling more aggressive aff…"
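
The index canonicalization mentioned in the Figure 3 excerpt can be illustrated with a toy sketch. This is not the Aquas matching engine and not a real e-graph: a plain set of equivalent expressions stands in for an e-class, a single rewrite (shift-left by a constant equals multiplication by a power of two) populates it, and a cost function that penalizes non-affine operators picks the variant to extract. The tuple encoding, the names `rewrite_shl`, `variants`, `cost`, and `extract`, and the cost values are all assumptions made for illustration.

```python
# Toy illustration only: a set of equivalent expressions stands in for an e-class.
# Expressions are 3-tuples (op, lhs, rhs) or leaves (a str name or an int constant).

NON_AFFINE_OPS = {"shl"}  # assumption: shifts are opaque to affine loop analysis


def rewrite_shl(expr):
    """Rewrite (x << k) into (x * 2**k) when k is a constant; otherwise return None."""
    if isinstance(expr, tuple) and expr[0] == "shl" and isinstance(expr[2], int):
        return ("mul", expr[1], 2 ** expr[2])
    return None


def variants(expr):
    """Enumerate every form of expr reachable by applying rewrite_shl at any position."""
    if not isinstance(expr, tuple):
        return {expr}
    op, lhs, rhs = expr
    forms = {(op, left, right) for left in variants(lhs) for right in variants(rhs)}
    for form in list(forms):
        rewritten = rewrite_shl(form)
        if rewritten is not None:
            forms.add(rewritten)
    return forms


def cost(expr):
    """Hypothetical cost: 1 per operator, plus a large penalty per non-affine operator."""
    if not isinstance(expr, tuple):
        return 0
    op, lhs, rhs = expr
    penalty = 100 if op in NON_AFFINE_OPS else 0
    return 1 + penalty + cost(lhs) + cost(rhs)


def extract(expr):
    """Pick the cheapest equivalent form, mimicking cost-based extraction."""
    return min(variants(expr), key=cost)


index = ("add", "base", ("shl", "i", 2))  # base + (i << 2)
best = extract(index)
print(best)  # ('add', 'base', ('mul', 'i', 4))
```

The point of keeping both forms until extraction is that the choice is deferred to the cost function; swapping in a different (again hypothetical) cost would change which variant is selected without touching the rewrite.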

Original abstract

Application-Specific Instruction-Set Processors (ASIPs) built on the RISC-V architecture offer specialization opportunities for various applications. Existing frameworks are largely designed around fixed instruction extension interfaces and rely on manual software adaptation. However, as emerging domains scale up in complexity, two major challenges arise. First, memory access remains a primary bottleneck as existing design flows lack architectural awareness of memory interfaces, leading to suboptimal interface selection and orchestration. Second, the semantic complexity of custom instruction extensions, characterized by non-trivial control logic and irregular memory behaviors, hinders the ability of conventional compilers to perform automated and comprehensive offloading. We present Aquas, a holistic hardware-software co-design framework built upon MLIR. Aquas proposes a memory interface model that jointly considers interface characteristics and cache effects, along with an interface-aware synthesis flow guided by this model that progressively optimizes the input specification and generates efficient hardware implementations. We also propose an e-graph-based retargetable compiler approach with a novel matching engine for efficient instruction mapping and offloading, enabling robust and effective utilization of custom instruction capabilities. Case studies across four diverse domains show that Aquas delivers substantial acceleration, achieving up to 15.61x speedup with 14.5% area overhead and zero frequency degradation, proving highly competitive in domain acceleration against more powerful general-purpose cores and vector extensions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Aquas adds a cache-aware memory model and e-graph matcher to MLIR for RISC-V ASIP flows, with case-study speedups that look useful but rest on unshown details about how much manual tuning drove the numbers.

The core of this paper is a memory interface model that factors in both interface traits and cache behavior, paired with an interface-guided synthesis flow and an e-graph retargetable compiler that includes a new matching engine. Those pieces let them move beyond fixed-extension interfaces and handle more irregular custom instructions for ASIPs. The four-domain case studies report up to 15.61x speedup, 14.5% area overhead, and no frequency loss, which puts the results in a competitive range against stronger general-purpose cores and vector extensions.

Referee Report

2 major / 2 minor

Summary. The manuscript presents Aquas, an MLIR-based holistic hardware-software co-design framework for RISC-V ASIPs. It introduces a memory interface model that jointly accounts for interface characteristics and cache effects, an interface-aware synthesis flow for hardware generation, and an e-graph-based retargetable compiler with a novel matching engine for automated instruction offloading. Case studies in four domains report up to 15.61x speedup, 14.5% area overhead, and zero frequency degradation, claiming competitiveness versus general-purpose cores and vector extensions.

Significance. If the central performance claims are substantiated by detailed ablations and reproducible methodology, the work could advance automated domain specialization by unifying memory-aware synthesis with compiler retargeting in a single MLIR infrastructure. The e-graph matching approach for irregular custom instructions represents a technically interesting direction that could reduce manual effort in ASIP flows.

major comments (2)

[Evaluation / Case Studies] Evaluation section (case studies): the headline results (up to 15.61x speedup, 14.5% area, zero frequency loss) are presented without ablations that isolate the contribution of the joint memory-interface/cache model versus the novel e-graph matcher, nor any count of manually added patterns per domain. This leaves open whether the reported gains derive from the automated co-optimization framework or from domain-specific manual tuning in the synthesis and matching rules.
[Memory Interface Model] Memory interface model description: the claim that the model 'jointly considers interface characteristics and cache effects' is central to addressing the stated memory bottleneck, yet the manuscript provides no quantitative comparison against prior interface-only models or sensitivity analysis on cache-effect parameters, making it impossible to verify that the model itself drives the observed interface selection improvements.

minor comments (2)

[Abstract / Introduction] The abstract and introduction use 'zero frequency degradation' without specifying the synthesis tool, target process node, or timing constraints under which this holds; add a sentence clarifying the experimental setup.
[Compiler Approach] Notation for the e-graph matching engine (e.g., cost functions or rewrite rules) is introduced without a compact summary table; a small table listing the novel matching heuristics would improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the major comments point by point below and describe the revisions we will incorporate to improve the manuscript.

Referee: [Evaluation / Case Studies] Evaluation section (case studies): the headline results (up to 15.61x speedup, 14.5% area, zero frequency loss) are presented without ablations that isolate the contribution of the joint memory-interface/cache model versus the novel e-graph matcher, nor any count of manually added patterns per domain. This leaves open whether the reported gains derive from the automated co-optimization framework or from domain-specific manual tuning in the synthesis and matching rules.

Authors: We agree that explicit ablations would strengthen the evaluation by isolating component contributions. In the revised manuscript we will add ablations that separately disable the joint memory-interface/cache model and the e-graph matcher to quantify their individual effects on the reported speedups. We will also include a table reporting the number of manually added patterns per domain; these are limited to a small set of domain-specific edge cases, as the majority of instruction patterns are automatically discovered and matched by the e-graph engine. revision: yes
Referee: [Memory Interface Model] Memory interface model description: the claim that the model 'jointly considers interface characteristics and cache effects' is central to addressing the stated memory bottleneck, yet the manuscript provides no quantitative comparison against prior interface-only models or sensitivity analysis on cache-effect parameters, making it impossible to verify that the model itself drives the observed interface selection improvements.

Authors: We acknowledge the value of direct quantitative validation. The revised version will include a new subsection with comparisons of the joint model against prior interface-only models on the same benchmarks, plus sensitivity analysis varying cache-effect parameters (e.g., hit rates and latency multipliers) to show their influence on interface selection and overall performance. revision: yes
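
To make the kind of sensitivity analysis discussed above more concrete, here is a minimal sketch of how an interface choice could react to a cache-effect parameter. It is not the Aquas model, whose equations are not shown in this review. The two interface kinds (a cached per-word port and a burst DMA path), every latency constant, and the names `cached_port_cycles`, `burst_dma_cycles`, and `pick_interface` are hypothetical.

```python
# Hypothetical toy model; all constants are illustrative, not measured values.

def cached_port_cycles(words, hit_rate, hit_latency=2, miss_latency=40):
    """Expected cycles to fetch `words` one by one through a cached port."""
    per_word = hit_rate * hit_latency + (1.0 - hit_rate) * miss_latency
    return words * per_word


def burst_dma_cycles(words, setup=30, words_per_beat=2):
    """Cycles for a burst transfer: fixed setup cost plus one cycle per beat."""
    beats = -(-words // words_per_beat)  # ceiling division
    return setup + beats


def pick_interface(words, hit_rate):
    """Choose the interface with the lower estimated cycle count."""
    port = cached_port_cycles(words, hit_rate)
    dma = burst_dma_cycles(words)
    return ("cached_port", port) if port <= dma else ("burst_dma", dma)


# Sweep the hit rate for a small and a large transfer.
for words in (8, 256):
    for hit_rate in (0.5, 0.9, 1.0):
        choice, cycles = pick_interface(words, hit_rate)
        print(f"words={words:4d} hit_rate={hit_rate:.1f} -> {choice} ({cycles:.0f} cycles)")
```

With these made-up constants, only the small transfer with a perfect hit rate favors the cached port. A real analysis would replace the constants with measured latencies and report how the selected interface and end-to-end performance shift across the sweep.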

Circularity Check

0 steps flagged

No significant circularity; framework and results are independent of each other.

The paper describes a hardware-software co-design framework using MLIR, a memory interface model, and an e-graph retargetable compiler, then reports empirical speedups from case studies on four domains. No equations, fitted parameters, or self-citations are shown that would make the performance numbers reduce to the inputs by construction. The claimed acceleration is presented as an outcome of applying the described components rather than a tautological renaming or load-bearing self-reference. The derivation chain for the models and compiler remains self-contained against the external benchmarks of the evaluated domains.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the memory interface model and e-graph engine are described at high level without stated assumptions or new postulated components.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "Aquas proposes a memory interface model that jointly considers interface characteristics and cache effects, along with an interface-aware synthesis flow... e-graph-based retargetable compiler approach with a novel matching engine"

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "up to 9.27× speedup... 34.41% area overhead"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

34 extracted references · 34 canonical work pages

[1] AMD. 2025. AMD Vitis™ HLS. https://www.amd.com/en/products/software/adaptive-socs-and-fpgas/vitis/vitis-hls.html

[2] Giorgos Armeniakos, Alexis Maras, Sotirios Xydis, and Dimitrios Soudris. 2025. Mixed-precision Neural Networks on RISC-V Cores: ISA extensions for Multi-Pumped Soft SIMD Operations. In Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design. Association for Computing Machinery, New York, NY, USA, 1–9. https://doi.org/10.1145/367... (work page doi:10.1145/3676536.3676840)

[3] Krste Asanović, Rimas Avižienis, and Jonathan Bachrach. 2016. The Rocket Chip Generator. Technical Report UCB/EECS-2016-17. Berkeley, CA. http://www.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-17.html

[4] Yaohui Cai, Kaixin Yang, Chenhui Deng, Cunxi Yu, and Zhiru Zhang. 2025. SmoothE: Differentiable E-Graph Extraction. In Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1 (ASPLOS ’25). Association for Computing Machinery, New York, NY, USA, 1020–1034. doi:10.1145/3669940.3707262

[5] Hao Cheng, Georgios Fotiadis, Johann Großschädl, Daniel Page, Thinh H. Pham, and Peter Y. A. Ryan. 2024. RISC-V Instruction Set Extensions for Multi-Precision Integer Arithmetic: A Case Study on Post-Quantum Key Exchange Using CSIDH-

[6] In Proceedings of the 61st ACM/IEEE Design Automation Conference (DAC ’24). Association for Computing Machinery, New York, NY, USA, 1–6. doi:10.1145/3649329.3657347

[7] Codasip. 2025. CodAL RISC-V architecture description language - Codasip. https://codasip.com/products/codasip-studio/codal/

[8] Schuyler Eldridge, Prithayan Barua, Aliaksei Chapyzhenka, Adam Izraelevitz, Jack Koenig, Chris Lattner, Andrew Lenharth, George Leontiev, Fabian Schuiki, Ram Sunder, Andrew Young, and Richard Xia. 2021. MLIR as Hardware Compiler Infrastructure. In Workshop on Open-Source EDA Technology (WOSET)

[9] Gerasimos Gerogiannis, Stijn Eyerman, Evangelos Georganas, Wim Heirman, and Josep Torrellas. 2025. DECA: A Near-Core LLM Decompression Accelerator Grounded on a 3D Roofline Model. In Proceedings of the 58th IEEE/ACM International Symposium on Microarchitecture (MICRO ’25). Association for Computing Machinery, New York, NY, USA, 184–200. doi:10.1145/37258... (work page doi:10.1145/3725843.3756073)

[10] James C. Hoe and Arvind. 2000. Synthesis of operation-centric hardware descriptions. In Proceedings of the 2000 IEEE/ACM international conference on Computer-aided design (ICCAD ’00). IEEE Press, San Jose, California, 511–519

[11] Lana Josipović. 2025. dynamatic: DHLS (Dynamic High-Level Synthesis) compiler based on MLIR. https://github.com/EPFL-LAP/dynamatic

[12] Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasilache, and Oleksandr Zinenko. 2021. MLIR: Scaling Compiler Infrastructure for Domain Specific Computation. In 2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). 2–14. doi:10.1109/CGO51591.2021.9370308

[13] Huimin Li, Nele Mentens, and Stjepan Picek. 2022. A scalable SIMD RISC-V based processor with customized vector extensions for CRYSTALS-kyber. In Proceedings of the 59th ACM/IEEE Design Automation Conference (DAC ’22). Association for Computing Machinery, New York, NY, USA, 733–738. doi:10.1145/3489517.3530552

[14] Jules Merckx, Alexandre Lopoukhine, Samuel Coward, Jianyi Cheng, Bjorn De Sutter, and Tobias Grosser. 2025. eqsat: An Equality Saturation Dialect for Non-destructive Rewriting. doi:10.48550/arXiv.2505.09363 arXiv:2505.09363 [cs]

[15] Julian Oppermann, Brindusa Mihaela Damian-Kosterhon, Florian Meisel, Tammo Mürmann, Eyck Jentzsch, and Andreas Koch. 2024. Longnail: High-Level Synthesis of Portable Custom Instruction Set Extensions for RISC-V Processors from Descriptions in the Open-Source CoreDSL Language. In Proceedings of the 29th ACM International Conference on Architectural Suppor... (work page doi:10.1145/3620666.3651375)

[16] Tianwei Pan, Tianao Dai, Jianlei Yang, Hongbin Jing, Yang Su, Zeyu Hao, Xiaotao Jia, Chunming Hu, and Weisheng Zhao. 2025. Finesse: An Agile Design Framework for Pairing-based Cryptography via Software/Hardware Co-Design. In Proceedings of the 52nd Annual International Symposium on Computer Architecture. ACM, Tokyo Japan, 65–77. doi:10.1145/3695053.3731033

[17] Radu Bogdan Rusu and Steve Cousins. 2011. 3D is here: Point Cloud Library (PCL). In IEEE International Conference on Robotics and Automation (ICRA). IEEE, Shanghai, China

[18] Paul Scheffler, Luca Colagrande, and Luca Benini. 2024. SARIS: Accelerating Stencil Computations on Energy-Efficient RISC-V Compute Clusters with Indirect Stream Registers. In Proceedings of the 61st ACM/IEEE Design Automation Conference (DAC ’24). Association for Computing Machinery, New York, NY, USA, 1–6. doi:10.1145/3649329.3658494

[19] Synopsys, Inc. 2025. ASIP Designer. https://www.synopsys.com/dw/ipdir.php?ds=asip-designer

[20] Blaise Tine, Varun Saxena, Santosh Srivatsan, Joshua R. Simpson, Fadi Alzammar, Liam Cooper, and Hyesoon Kim. 2023. Skybox: Open-Source Graphic Rendering on Programmable RISC-V GPUs. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (ASPLOS 2023). Association for Compu... (work page doi:10.1145/3582016.3582024)

[21] Blaise Tine, Krishna Praveen Yalamarthy, Fares Elsabbagh, and Kim Hyesoon

[22] Vortex: Extending the RISC-V ISA for GPGPU and 3D-Graphics. In MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO ’21). Association for Computing Machinery, New York, NY, USA, 754–766. doi:10.1145/3466752.3480128

[23] Johan Van Praet, Dirk Lanneer, Werner Geurts, and Gert Goossens. 2008. Chapter 4 - nML: A Structural Processor Modeling Language for Retargetable Compilation and ASIP Design. In Processor Description Languages. Systems on Silicon, Vol. 1. Morgan Kaufmann, Burlington, 65–93. doi:10.1016/B978-012374287-2.50007-0

[24] Veripool. 2025. Veripool. https://www.veripool.org/verilator/

[25] Andrew Waterman, Yunsup Lee, David A. Patterson, and Krste Asanović. 2014. The RISC-V Instruction Set Manual, Volume I: User-Level ISA, Version 2.0. Technical Report UCB/EECS-2014-54. EECS Department, University of California, Berkeley, Berkeley, CA. http://www2.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-54.html

[26] Max Willsey, Chandrakana Nandi, Yisu Remy Wang, Oliver Flatt, Zachary Tatlock, and Pavel Panchekha. 2021. egg: Fast and extensible equality saturation. Artifact for "Fast and Extensible Equality Saturation" 5, POPL (Jan. 2021), 23:1–23:29. doi:10.1145/3434304

[27] Youwei Xiao, Yuyang Zou, Yansong Xu, Yuhao Luo, Yitian Sun, Chenyun Yin, Ruifan Xu, Renze Chen, and Yun Liang. 2025. APS: Open-Source Hardware-Software Co-Design Framework for Agile Processor Specialization. In 2025 IEEE/ACM International Conference On Computer Aided Design (ICCAD). Munich, Germany

[28] Ruifan Xu, Youwei Xiao, Jin Luo, and Yun Liang. 2022. HECTOR: A Multi-level Intermediate Representation for Hardware Synthesis Methodologies. In 2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD). San Diego, California, USA, 1–9. https://ieeexplore.ieee.org/document/10068908 ISSN: 1558-2434

[29] En-Yu Yang, Tianyu Jia, David Brooks, and Gu-Yeon Wei. 2021. FlexACC: A Programmable Accelerator with Application-Specific ISA for Flexible Deep Neural Network Inference. In 2021 IEEE 32nd International Conference on Application-specific Systems, Architectures and Processors (ASAP). 266–273. doi:10.1109/ASAP52443.2021.00046 ISSN: 2160-052X

[30] Hanchen Ye, HyeGang Jun, Hyunmin Jeong, Stephen Neuendorffer, and Deming Chen. 2022. ScaleHLS: a scalable high-level synthesis framework with multi-level transformations and optimizations: invited. In Proceedings of the 59th ACM/IEEE Design Automation Conference (DAC ’22). Association for Computing Machinery, New York, NY, USA, 1355–1358. doi:10.1145/34895... (work page doi:10.1145/3489517.3530631)

[31] Jiaqi Yin, Zhan Song, Chen Chen, Yaohui Cai, Zhiru Zhang, and Cunxi Yu. 2025. e-boost: Boosted E-Graph Extraction with Adaptive Heuristics and Exact Solving. In 2025 IEEE/ACM International Conference On Computer Aided Design (ICCAD). Munich, Germany

[32] Abd-El-Aziz Zayed and Christophe Dubach. 2025. DialEgg: Dialect-Agnostic MLIR Optimizer using Equality Saturation with Egglog. In Proceedings of the 23rd ACM/IEEE International Symposium on Code Generation and Optimization (CGO ’25). Association for Computing Machinery, New York, NY, USA, 271–283. doi:10.1145/3696443.3708957

[33] Yihong Zhang, Yisu Remy Wang, Oliver Flatt, David Cao, Philip Zucker, Eli Rosenthal, Zachary Tatlock, and Max Willsey. 2023. Better Together: Unifying Datalog and Equality Saturation. Ar...

[34] Jerry Zhao, Ben Korpan, Abraham Gonzalez, and Krste Asanovic. 2020. SonicBOOM: The 3rd Generation Berkeley Out-of-Order Machine. In The Fourth Workshop on RISC-V for Computer Architecture Research (CARRV). 8

[1] [1]

AMD. 2025. AMD Vitis™HLS. https://www.amd.com/en/products/software/ adaptive-socs-and-fpgas/vitis/vitis-hls.html

work page 2025

[2] [2]

Giorgos Armeniakos, Alexis Maras, Sotirios Xydis, and Dimitrios Soudris. 2025. Mixed-precision Neural Networks on RISC-V Cores: ISA extensions for Multi- Pumped Soft SIMD Operations. InProceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design. Association for Computing Machinery, New York, NY, USA, 1–9. https://doi.org/10.1145/367...

work page doi:10.1145/3676536.3676840 2025

[3] [3]

2016.The Rocket Chip Generator

Krste Asanović, Rimas Avižienis, and Jonathan Bachrach. 2016.The Rocket Chip Generator. Technical Report UCB/EECS-2016-17. Berkeley, CA. http: //www.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-17.html

work page 2016

[4] [4]

Yaohui Cai, Kaixin Yang, Chenhui Deng, Cunxi Yu, and Zhiru Zhang. 2025. SmoothE: Differentiable E-Graph Extraction. InProceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1 (ASPLOS ’25). Association for Computing Machinery, New York, NY, USA, 1020–1034. doi:10.1145/3669940.3707262

work page doi:10.1145/3669940.3707262 2025

[5] [5]

Pham, and Peter Y

Hao Cheng, Georgios Fotiadis, Johann Großschädl, Daniel Page, Thinh H. Pham, and Peter Y. A. Ryan. 2024. RISC-V Instruction Set Extensions for Multi-Precision Integer Arithmetic: A Case Study on Post-Quantum Key Exchange Using CSIDH-

work page 2024

[6] [6]

Association for Computing Machinery, New York, NY, USA, 1–6

InProceedings of the 61st ACM/IEEE Design Automation Conference (DAC ’24). Association for Computing Machinery, New York, NY, USA, 1–6. doi:10.1145/ 3649329.3657347

work page arXiv

[7] [7]

Codasip. 2025. CodAL RISC-V architecture description language - Codasip. https://codasip.com/products/codasip-studio/codal/

work page 2025

[8] [8]

Schuyler Eldridge, Prithayan Barua, Aliaksei Chapyzhenka, Adam Izraelevitz, Jack Koenig, Chris Lattner, Andrew Lenharth, George Leontiev, Fabian Schuiki, Ram Sunder, Andrew Young, and Richard Xia. 2021. MLIR as Hardware Compiler Infrastructure. InWorkshop on Open-Source EDA Technology (WOSET)

work page 2021

[9] [9]

Gerasimos Gerogiannis, Stijn Eyerman, Evangelos Georganas, Wim Heirman, and Josep Torrellas. 2025. DECA: A Near-Core LLM Decompression Accelerator Grounded on a 3D Roofline Model. InProceedings of the 58th IEEE/ACM Interna- tional Symposium on Microarchitecture (MICRO ’25). Association for Computing Machinery, New York, NY, USA, 184–200. doi:10.1145/37258...

work page doi:10.1145/3725843.3756073 2025

[10] [10]

Hoe and Arvind

James C. Hoe and Arvind. 2000. Synthesis of operation-centric hardware descrip- tions. InProceedings of the 2000 IEEE/ACM international conference on Computer- aided design (ICCAD ’00). IEEE Press, San Jose, California, 511–519

work page 2000

[11] [11]

Lana Josipović. 2025. dynamatic: DHLS (Dynamic High-Level Synthesis) compiler based on MLIR. https://github.com/EPFL-LAP/dynamatic

work page 2025

[12] [12]

Chris Lattner, Mehdi Amini, Uday Bondhugula, Albert Cohen, Andy Davis, Jacques Pienaar, River Riddle, Tatiana Shpeisman, Nicolas Vasilache, and Olek- sandr Zinenko. 2021. MLIR: Scaling Compiler Infrastructure for Domain Specific Computation. In2021 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). 2–14. doi:10.1109/CGO51591.2021.9370308

work page doi:10.1109/cgo51591.2021.9370308 2021

[13] [13]

Huimin Li, Nele Mentens, and Stjepan Picek. 2022. A scalable SIMD RISC-V based processor with customized vector extensions for CRYSTALS-kyber. InProceedings of the 59th ACM/IEEE Design Automation Conference (DAC ’22). Association for Computing Machinery, New York, NY, USA, 733–738. doi:10.1145/3489517. 3530552

work page doi:10.1145/3489517 2022

[14] [14]

Jules Merckx, Alexandre Lopoukhine, Samuel Coward, Jianyi Cheng, Bjorn De Sutter, and Tobias Grosser. 2025. eqsat: An Equality Saturation Dialect for Non- destructive Rewriting. doi:10.48550/arXiv.2505.09363 arXiv:2505.09363 [cs]

work page doi:10.48550/arxiv.2505.09363 2025

[15] [15]

Julian Oppermann, Brindusa Mihaela Damian-Kosterhon, Florian Meisel, Tammo Mürmann, Eyck Jentzsch, and Andreas Koch. 2024. Longnail: High-Level Syn- thesis of Portable Custom Instruction Set Extensions for RISC-V Processors from Descriptions in the Open-Source CoreDSL Language. InProceedings of the 29th ACM International Conference on Architectural Suppor...

work page doi:10.1145/3620666.3651375 2024

[16] [16]

Tianwei Pan, Tianao Dai, Jianlei Yang, Hongbin Jing, Yang Su, Zeyu Hao, Xi- aotao Jia, Chunming Hu, and Weisheng Zhao. 2025. Finesse: An Agile Design Framework for Pairing-based Cryptography via Software/Hardware Co-Design. InProceedings of the 52nd Annual International Symposium on Computer Architec- ture. ACM, Tokyo Japan, 65–77. doi:10.1145/3695053.3731033

work page doi:10.1145/3695053.3731033 2025

[17] [17]

Radu Bogdan Rusu and Steve Cousins. 2011. 3D is here: Point Cloud Library (PCL). InIEEE International Conference on Robotics and Automation (ICRA). IEEE, Shanghai, China

work page 2011

[18] [18]

Paul Scheffler, Luca Colagrande, and Luca Benini. 2024. SARIS: Accelerating Stencil Computations on Energy-Efficient RISC-V Compute Clusters with Indi- rect Stream Registers. InProceedings of the 61st ACM/IEEE Design Automation Conference (DAC ’24). Association for Computing Machinery, New York, NY, USA, 1–6. doi:10.1145/3649329.3658494

work page doi:10.1145/3649329.3658494 2024

[19] [19]

Synopsys, Inc. 2025. ASIP Designer. https://www.synopsys.com/dw/ipdir.php? ds=asip-designer

work page 2025

[20]

Blaise Tine, Varun Saxena, Santosh Srivatsan, Joshua R. Simpson, Fadi Alzammar, Liam Cooper, and Hyesoon Kim. 2023. Skybox: Open-Source Graphic Rendering on Programmable RISC-V GPUs. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (ASPLOS 2023). Association for Compu... doi:10.1145/3582016.3582024

[21]

Blaise Tine, Krishna Praveen Yalamarthy, Fares Elsabbagh, and Kim Hyesoon. 2021. Vortex: Extending the RISC-V ISA for GPGPU and 3D-Graphics. In MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO ’21). Association for Computing Machinery, New York, NY, USA, 754–766. doi:10.1145/3466752.3480128

[23]

Johan Van Praet, Dirk Lanneer, Werner Geurts, and Gert Goossens. 2008. Chapter 4 - nML: A Structural Processor Modeling Language for Retargetable Compilation and ASIP Design. In Processor Description Languages. Systems on Silicon, Vol. 1. Morgan Kaufmann, Burlington, 65–93. doi:10.1016/B978-012374287-2.50007-0

[24]

Veripool. 2025. Veripool. https://www.veripool.org/verilator/

[25]

Andrew Waterman, Yunsup Lee, David A. Patterson, and Krste Asanović. 2014. The RISC-V Instruction Set Manual, Volume I: User-Level ISA, Version 2.0. Technical Report UCB/EECS-2014-54. EECS Department, University of California, Berkeley, Berkeley, CA. http://www2.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-54.html

[26]

Max Willsey, Chandrakana Nandi, Yisu Remy Wang, Oliver Flatt, Zachary Tatlock, and Pavel Panchekha. 2021. egg: Fast and extensible equality saturation. Proceedings of the ACM on Programming Languages 5, POPL (Jan. 2021), 23:1–23:29. doi:10.1145/3434304

[27]

Youwei Xiao, Yuyang Zou, Yansong Xu, Yuhao Luo, Yitian Sun, Chenyun Yin, Ruifan Xu, Renze Chen, and Yun Liang. 2025. APS: Open-Source Hardware-Software Co-Design Framework for Agile Processor Specialization. In 2025 IEEE/ACM International Conference On Computer Aided Design (ICCAD). Munich, Germany.

[28]

Ruifan Xu, Youwei Xiao, Jin Luo, and Yun Liang. 2022. HECTOR: A Multi-level Intermediate Representation for Hardware Synthesis Methodologies. In 2022 IEEE/ACM International Conference On Computer Aided Design (ICCAD). San Diego, California, USA, 1–9. https://ieeexplore.ieee.org/document/10068908 ISSN: 1558-2434

[29]

En-Yu Yang, Tianyu Jia, David Brooks, and Gu-Yeon Wei. 2021. FlexACC: A Programmable Accelerator with Application-Specific ISA for Flexible Deep Neural Network Inference. In 2021 IEEE 32nd International Conference on Application-specific Systems, Architectures and Processors (ASAP). 266–273. doi:10.1109/ASAP52443.2021.00046 ISSN: 2160-052X

[30]

Hanchen Ye, HyeGang Jun, Hyunmin Jeong, Stephen Neuendorffer, and Deming Chen. 2022. ScaleHLS: a scalable high-level synthesis framework with multi-level transformations and optimizations: invited. In Proceedings of the 59th ACM/IEEE Design Automation Conference (DAC ’22). Association for Computing Machinery, New York, NY, USA, 1355–1358. doi:10.1145/3489517.3530631

[31]

Jiaqi Yin, Zhan Song, Chen Chen, Yaohui Cai, Zhiru Zhang, and Cunxi Yu. 2025. e-boost: Boosted E-Graph Extraction with Adaptive Heuristics and Exact Solving. In 2025 IEEE/ACM International Conference On Computer Aided Design (ICCAD). Munich, Germany.

[32]

Abd-El-Aziz Zayed and Christophe Dubach. 2025. DialEgg: Dialect-Agnostic MLIR Optimizer using Equality Saturation with Egglog. In Proceedings of the 23rd ACM/IEEE International Symposium on Code Generation and Optimization (CGO ’25). Association for Computing Machinery, New York, NY, USA, 271–283. doi:10.1145/3696443.3708957

[33]

Yihong Zhang, Yisu Remy Wang, Oliver Flatt, David Cao, Philip Zucker, Eli Rosenthal, Zachary Tatlock, and Max Willsey. 2023. Better Together: Unifying Datalog and Equality Saturation. Ar...

[34]

Jerry Zhao, Ben Korpan, Abraham Gonzalez, and Krste Asanovic. 2020. SonicBOOM: The 3rd Generation Berkeley Out-of-Order Machine. In The Fourth Workshop on RISC-V for Computer Architecture Research (CARRV).