archive

Every paper Pith has read.

493 papers in cs.AR · page 6

cs.DC 2026-04-18 reviewed

Hierarchical sparsity speeds LLM attention 4.57 times
HieraSparse: Hierarchical Semi-Structured Sparse KV Attention

Haoxuan Wang +1
cs.AR 2026-04-17 reviewed

Genetic search finds shift-add CNNs for 33% faster TinyML on FPGA
Co-Design of CNN Accelerators for TinyML using Approximate Matrix Decomposition

José Juan Hernández Morales +6
cs.NI 2026-04-17 reviewed

Real traces show congestion from HPC collectives
Characterization of Real Communication Patterns and Congestion Dynamics in HPC Interconnection Networks

Miguel Sánchez de La Rosa +11
cs.AR 2026-04-17 reviewed

MemExplorer auto-designs memory for agentic NPUs
MemExplorer: Navigating the Heterogeneous Memory Design Space for Agentic Inference NPUs

Haoran Wu +17
cs.AR 2026-04-17 reviewed

MLIR unifies equivalence checking from algorithms to netlists
EquivFusion: Unifying Hardware Equivalence Checking from Algorithms to Netlists via MLIR

Jiaying Zhu +6
cs.AR 2026-04-17 reviewed

SRAM CIM accelerator hits 26.1 TOPS/W for attention
CIMple: Standard-cell SRAM-based CIM with LUT-based split softmax for attention acceleration

Bas Ahn +4
cs.CR 2026-04-17 reviewed

SRAM PUF with Hamming codes keeps IoT auth errors below 1%
Secure Authentication in Wireless IoT: Hamming Code Assisted SRAM PUF as Device Fingerprint

Florian Lehn +2
cs.AR 2026-04-17 reviewed

Specialized agents close hardware coverage with 4-13x fewer tokens
Understanding Inference-Time Token Allocation and Coverage Limits in Agentic Hardware Verification

Vihaan Patel +2
cs.AR 2026-04-17 reviewed

Annealing step stabilizes LLM-generated RTL designs
HYPERHEURIST: A Simulated Annealing-Based Control Framework for LLM-Driven Code Generation in Optimized Hardware Design

Shiva Ahir +2
cs.AR 2026-04-17 reviewed

Overmind hits 8.1 TOPS/W on neuro-symbolic workloads
Overmind NSA: A Unified Neuro-Symbolic Computing Architecture with Approximate Nonlinear Activations and Preemptive Memory Bypass

Weilun Wang +2
cs.AR 2026-04-17 reviewed

LLM agent closes hardware coverage gaps automatically
Spec2Cov: An Agentic Framework for Code Coverage Closure of Digital Hardware Designs

Sean Lowe +5
cs.AR 2026-04-17 reviewed

LLM agent reaches 100% hardware coverage on simple designs
Spec2Cov: An Agentic Framework for Code Coverage Closure of Digital Hardware Designs

Sean Lowe +5
cs.AR 2026-04-16 reviewed

Symmetric grids lift photonic AI use by 6x
Towards Topology-Aware Very Large-Scale Photonic AI Accelerators

Belal Jahannia +2
cs.AR 2026-04-16 reviewed

Rack storage tames millisecond GPU power swings
EasyRider: Mitigating Power Transients in Datacenter-Scale Training Workloads

Dillon Jensen +6
cs.AR 2026-04-16 reviewed

Microcontroller fixes timing for real-time photoacoustic imaging
Democratization of Real-time Multi-Spectral Photoacoustic Imaging: Open-Sourced System Architecture for OPOTEK Phocus & Verasonics Vantage Combination

Ryo Murakami +2
cs.AR 2026-04-16 reviewed

SCENIC hits 200G SmartNIC speed with programmable stream units
SCENIC: Stream Computation-Enhanced SmartNIC

Benjamin Ramhorst +6
cs.AR 2026-04-16 reviewed

LLM agents evolve the ABC synthesis tool to higher QoR
Autonomous Evolution of EDA Tools: Multi-Agent Self-Evolved ABC

Cunxi Yu +1
cs.AI 2026-04-16 reviewed

Agentic AI improves RTL timing by 21 percent on real designs
Dr. RTL: Autonomous Agentic RTL Optimization through Tool-Grounded Self-Improvement

Wenji Fang +7
cs.AR 2026-04-16 reviewed

CRONet runs fully on-chip on AIE-ML for 2.49x latency gain
Accelerating CRONet on AMD Versal AIE-ML Engines

Kaustubh Mhatre +6
physics.optics 2026-04-16 reviewed

Unary encoding boosts parallelism in photonic tensor cores
Scaling Photonic Tensor Cores with Unary and Homodyne Designs

Oluwaseun Alo +1
cs.AR 2026-04-16 reviewed

Multi-agent testbenches match SOTA Verilog generation with less data
Exploring LLM-based Verilog Code Generation with Data-Efficient Fine-Tuning and Testbench Automation

Mu-Chi Chen +8
cs.LG 2026-04-16 reviewed

MoE serving gains 6.6x speedup via elastic self-speculation on 3D stacks
ELMoE-3D: Leveraging Intrinsic Elasticity of MoE for Hybrid-Bonding-Enabled Self-Speculative Decoding in On-Premises Serving

Yuseon Choi +7
cs.PF 2026-04-16 reviewed

L4 GPU delivers up to 4.4x inference throughput over T4
DEEP-GAP: Deep-learning Evaluation of Execution Parallelism in GPU Architectural Performance

Kathiravan Palaniappan
cs.AR 2026-04-16 reviewed

Knowledge graph guides LLMs to build correct RISC-V hardware
VeriGraphi: A Multi-Agent Framework of Hierarchical RTL Generation for Large Hardware Designs

Sazzadul Islam +2
cs.AR 2026-04-15 reviewed

Chiplet tasks cut LLM decode latency on multi-die GPUs
Fleet: Hierarchical Task-based Abstraction for Megakernels on Multi-Die GPUs

Sangeeta Chowdhary +9
cs.AR 2026-04-15 reviewed

Embeddings detect line-level CWEs in Verilog at 89% precision
VeriCWEty: Embedding enabled Line-Level CWE Detection in Verilog

Prithwish Basu Roy +6
cs.AR 2026-04-15 reviewed

ASIC emulates oscillators to solve max-cut and coloring at 97-100% accuracy
An ASIC Emulated Oscillator Ising/Potts Machine Solving Combinatorial Optimization Problems

Yilmaz Ege Gonul +1
cs.AR 2026-04-15 reviewed

Memory stack runs full matrix math inside the chip
GEM3D CIM General Purpose Matrix Computation Using 3D Integrated SRAM eDRAM Hybrid Compute In Memory on Memory Architecture

Subhradip Chakraborty +2
cs.AR 2026-04-15 reviewed

LSTM accelerator spots gait issues 4x faster on tiny ASIC
Cross-Layer Co-Optimized LSTM Accelerator for Real-Time Gait Analysis

Mohammad Hasan Ahmadilivani +4
cs.AR 2026-04-15 reviewed

Pipeline lifts bit-level accelerator code to tensor ISA specs
ATLAAS: Automatic Tensor-Level Abstraction of Accelerator Semantics

Ruijie Gao +3
cs.LG 2026-04-14 reviewed

Full biosignal model tuning runs under 50mW on edge chips
BioTrain: Sub-MB, Sub-50mW On-Device Fine-Tuning for Edge-AI on Biosignals

Run Wang +7
cs.AR 2026-04-14 reviewed

Hardware unit reorganizes data on the fly for ideal CPU cache locality
Tensor Memory Engine: On-the-fly Data Reorganization for Ideal Locality

Denis Hoornaert +5
cs.LG 2026-04-14 reviewed

TCL tunes tensor programs 16x faster across CPU and GPU
TCL: Enabling Fast and Efficient Cross-Hardware Tensor Program Optimization via Continual Learning

Chaoyao Shen +7
cs.AR 2026-04-14 reviewed

EPAC RISC-V chip with three tiles taped out in 22nm
EPAC: The Last Dance

Filippo Mantovani +38
cs.AR 2026-04-14 reviewed

CODO compiler speeds FPGA dataflow designs up to 33x on DNNs
CODO: An Automated Compiler for Comprehensive Dataflow Optimization

Weichuang Zhang +8
cs.AR 2026-04-14 reviewed

Passive optical elements classify images by embedded phase patterns
Photonic AI: A Hybrid Diffractive Holographic Neural System for Passive Optical Real-Time Image Classification

Prakul Sunil Hiremath
cs.AR 2026-04-14 reviewed

Hadamard patterns cut RRAM read noise impact in neural nets
HARP: Hadamard-Domain Write-and-Verify for Noise-Robust RRAM Programming

Ilhuan Choi +5
cs.AR 2026-04-14 reviewed

Compiler cuts NPU transformer energy use by up to 41%
Forge-UGC: FX optimization and register-graph engine for universal graph compiler

Satyam Kumar +1
cs.AI 2026-04-13 reviewed

Reference-based replication creates AI agents in constant time
Aethon: A Reference-Based Replication Primitive for Constant-Time Instantiation of Stateful AI Agents

Swanand Rao +3
cs.LG 2026-04-13 reviewed

Imitation learning yields thermal-safe LFM schedules on 3D many-cores
Active Imitation Learning for Thermal- and Kernel-Aware LFM Inference on 3D S-NUCA Many-Cores

Yixian Shen +5
cs.AR 2026-04-13 reviewed

Decoupled matrix units deliver up to 2.31x AI speedups on CPUs
CUTEv2: Unified and Configurable Matrix Extension for Diverse CPU Architectures with Minimal Design Overhead

Jinpeng Ye +13
cs.CV 2026-04-13 reviewed

Neural model sequences shape operations for better mask correction
MorphOPC: Advancing Mask Optimization with Multi-scale Hierarchical Morphological Learning

Yuting Hu +6
cs.AR 2026-04-13 reviewed

CIM design runs 1B-4B models at 336 tokens/s with 49x energy gain
EdgeCIM: A Hardware-Software Co-Design for CIM-Based Acceleration of Small Language Models

Jinane Bazzi +4
cs.AR 2026-04-13 reviewed

New dataset trains ML models on 61k chip layout windows for capacitance
CapBench: A Multi-PDK Dataset for Machine-Learning-Based Post-Layout Capacitance Extraction

Hector R. Rodriguez +2
cs.AR 2026-04-13 reviewed

High-bandwidth storage enables interactive 13B model inference on mobiles
Technology solutions targeting the performance of gen-AI inference in resource constrained platforms

Joyjit Kundu +3
cs.AR 2026-04-13 reviewed

Specialized LLM matches syntax but raises SVA semantic accuracy by 23 points
Automated SVA Generation with LLMs

Lik Tung Fu +6
quant-ph 2026-04-13 reviewed

Pulse sequence moves Rydberg excitation for remote CZ gates
Compiler Framework for Directional Transport in Zoned Neutral Atom Systems with AOD Assistance: A Hybrid Remote CZ Approach

Lingyi Kong +6
cs.AR 2026-04-13 reviewed

Heterogeneous PIM chiplet speeds graph DP 42x over GPU
GEN-Graph: Heterogeneous PIM Accelerator for General Computational Patterns in Graph-based Dynamic Programming

Yanru Chen +5
cs.AR 2026-04-12 reviewed

Optimal AI accelerator shifts with batch size and model scale
The xPU-athalon: Quantifying the Competition of AI Acceleration

Alicia Golden +3
physics.optics 2026-04-12 reviewed

Photonics scales AI past transistor density limits
Harnessing Photonics for Machine Intelligence

Hanqing Zhu +6