In: ACM/IEEE Design Automation Conference
13 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- LLMForge: Multi-Backend Hardware-Aware Neural Architecture Search with Infinite-Head Attention for Edge Language Models
  LLMForge is a NAS framework with Infinite-Head Attention, a Forge-Former surrogate, and Forge-DSE engine that discovers hardware-specific architectures for edge language models, yielding variants with improved accuracy, energy, or latency on different substrates.
- Visual Diffusion Models are Geometric Solvers
  Standard visual diffusion models operating in pixel space can approximate solutions to the inscribed square, Steiner tree, and simple polygon problems.
- DAE4HLS: Exposing Memory-Level Parallelism for High-Level Synthesis using Explicit Decoupling
  DAE4HLS enables explicit decoupling of access and execute in HLS to unlock memory-level parallelism, delivering 10-79x speedups for complex workloads on commercial and dynamic HLS tools.
- Quokka#: Quantum Computing with #SAT
  Quokka# is a Python library that converts quantum circuit analysis tasks into #SAT problems, offering multiple encodings, approximate equivalence checking, and depth-optimal synthesis.
- ObfAx: Obfuscation and IP Piracy Detection in Approximate Circuits
  The ObfAx framework detects IP piracy in approximate circuits by comparing statistical error profiles of protected designs against suspicious ones, even when attackers apply approximate obfuscation.
- Scalable Verification of Neural Control Barrier Functions Using Linear Bound Propagation
  A scalable verification framework for neural control barrier functions uses linear bound propagation on network gradients combined with McCormick relaxations to certify safety conditions for control-affine systems.
- Toward Secure Multitenant Quantum Computing: Circuit Affinity, Crosstalk Patterns, and Grouping Strategies
  Crosstalk patterns between quantum circuits on IBM processors are predictable by circuit type and hardware architecture, with high intra-revision consistency and topological decoupling between lattice types.
- MATCHA: Efficient Deployment of Deep Neural Networks on Multi-Accelerator Heterogeneous Edge SoCs
  MATCHA optimizes DNN deployment on heterogeneous multi-accelerator edge SoCs via constraint programming for memory and scheduling plus pattern matching for parallel execution, cutting latency up to 35% versus the MATCH compiler on MLPerf Tiny.
- PIM-CACHE: High-Efficiency Content-Aware Copy for Processing-In-Memory
  PIM-CACHE reduces mandatory coarse-grained transfers in UPMEM-style PIM by dynamically staging only non-redundant data via content-aware copy that exploits workload similarity.
- Managing Classical Processing Requirements for Quantum Error Correction
  A two-level decoder scheduling framework reduces classical processing requirements for quantum error correction by 10-40% on fault-tolerant benchmarks by managing bursty workloads as shared resources.
- Emulation-based System-on-Chip Security Verification: Challenges and Opportunities
  Emulation is positioned as a high-throughput pre-silicon method for exposing SoC security issues under realistic hardware/software workloads, with organized workflows, challenges, and future directions.
- A Review of Multiscale Thermal Modeling in Heterogeneous 3D ICs
  The paper reviews multiscale thermal modeling techniques for 3D ICs, unifying scales from device to system while stressing thermal boundary resistance and validation needs.
- A complete discussion on fully reconfigurable, digital, scalable, graph and sparsity-aware near-memory accelerator for graph neural networks