archive

Every paper Pith has read.

493 papers in cs.AR · page 1

cs.AR 2026-05-22 reviewed

DORA keeps DNN accelerator efficiency steady across 6× workload variation
DORA: Dataflow-Instruction Orchestration Architecture for DNN Acceleration

Xingzhen Chen +7
cs.NE 2026-05-22 reviewed

UniSpike bundles spikes to cut neuromorphic traffic 1.93 times
UniSpike: Accelerating Spiking Neural Networks on Neuromorphic Systems via Eliminating Address Redundancy

Qinghui Xing +8
cs.AR 2026-05-22 reviewed

Overlays beat custom designs for frequent model switches in self-driving
To Overlay or to Customize? Revisiting Architectural Choices in Heterogeneous Systems

Xingzhen Chen +3
cs.AR 2026-05-22 reviewed

Explicit decoupling gives HLS 10-79x speedups on complex memory patterns
DAE4HLS: Exposing Memory-Level Parallelism for High-Level Synthesis using Explicit Decoupling

David Metz +1
cs.AR 2026-05-22 reviewed

3D NAND fuses MoE selection and compute for 114x faster inference
NASiC: 3D NAND-based CAM-Selected Multibit CIM Architecture for Efficient On-Device Mixture-of-Experts LLM Inference

Weikai Xu +9
cs.AR 2026-05-22 reviewed

Stage-wise precision cuts masked diffusion compute by up to 16x
MASQ: Accelerating Masked Diffusion via Stage-Wise Multi-Precision Quantization

Seeyeon Kim +3
cs.AR 2026-05-21 reviewed

ACALSim reaches 14x speedup over SST on large GPU simulations
ACALSim: A Scalable Parallel Simulation Framework for High-Performance System Design Space Exploration

Wei-Fen Lin +7
cs.CV 2026-05-21 reviewed

Prior outputs double token cuts in video diffusion for 4.5x speedup
ORBIS: Output-Guided Token Reduction with Distribution-Aware Matching for Video Diffusion Acceleration

Hangyeol Lee +1
cs.AR 2026-05-21 reviewed

Co-design speeds vector search up to 8.4 times over CPU
NasZip: Software and Hardware Co-Design to Accelerate Approximate Nearest Neighbor Search with DIMM-Based Near-Data Processing

Cheng Zou +8
cs.AR 2026-05-21 reviewed

Memory technologies reviewed for room and cryogenic use
Emerging memory technologies at room/cryogenic temperature

Siddhartha Raman Sundara Raman
cs.AR 2026-05-21 reviewed

Component-level GPU control yields 10% energy savings
CompPow: A Case for Component-level GPU Power Management

Shaizeen Aga +1
cs.AR 2026-05-20 reviewed

Dynamic control-flow speeds up reconfigurable processors
Supporting Dynamic Control-Flow Execution for Runtime Reconfigurable Processors

Hassan Nassar +3
cs.DC 2026-05-20 reviewed

Roadside perception services turn on only when vehicles approach
Cloud-Native Operation of Roadside Infrastructure Enabling Demand-Driven Collective Perception via V2X

Lukas Zanger +5
quant-ph 2026-05-20 reviewed

Telesistors provide noise-protected Clifford gates for quantum computing
Towards transistor-based quantum computing

Y.-D. Liu +3
cs.AR 2026-05-20 reviewed

ELSA gives spiking networks 3.4x faster inference than top accelerators
ELSA: An ELastic SNN Inference Architecture for Efficient Neuromorphic Computing

Kang You +8
cs.NE 2026-05-20 reviewed

ReRAM macro reaches 419 TOPS/W for edge neural inference
E-ReCON: An Energy- and Resource-Efficient Precision-Configurable Sparse nvCIM Macro for Conventional and Spiking Neural Edge Inference

Ankit Kumar Tenwar +2
cs.CR 2026-05-19 reviewed

Multi-rank PIM beats CPUs on AES and SHA-256
Taking Cryptography Out of the Data Path via Near-Memory Processing in DRAM

Nicola Barcarolo +5
cs.AR 2026-05-19 reviewed

Hardware latch enables 452 nA quiescent drain in sensors
A Hardware-Based Multi-Stage Dynamic Power Management Architecture for Autonomous Low-Light Operation

Charalampos S. Kouzinopoulos +8
cs.AR 2026-05-19 reviewed

Digital near-memory design accelerates GNNs up to 230x
A complete discussion on fully reconfigurable, digital, scalable, graph and sparsity-aware near-memory accelerator for graph neural networks

Siddhartha Raman Sundara Raman +2
cs.AR 2026-05-19 reviewed

Only two of five LLMs finish valid SoC co-design
HSCO-Bench: An Agent-Driven End-to-End Hardware-Software Co-design Benchmark for Systems-on-Chip

Pei-Huan Tsai +4
cs.AR 2026-05-18 reviewed

Software scheduling predicts optical thermal drift early
Predictive Software Scheduling as an Early-Warning Hint Layer for Optical Engine Thermal Drift in Heterogeneous SoIC Packaging

Chi Fei Chung
cs.AR 2026-05-18 reviewed

Input flips extend multiplier life under NBTI aging
Building Reliable Arithmetic Multipliers Under NBTI Aging and Process Variations

Masoud Heidary +1
cs.DC 2026-05-18 reviewed

Hybrid cluster cuts HTTP response time by over 40%
iHAC: A Hybrid Cluster Architecture for Enhanced Performance and Resilience

Siddique Abubakr Muntaka +11
cs.NI 2026-05-18 reviewed

Hybrid radio matches dedicated performance with far less setup
Enabling Agile Ambient IoT Networking via a Parameterized Hybrid Radio

Jiazhen Lei +10
cs.AR 2026-05-18 reviewed

JSON IR and compiler checks lift LLM circuit correctness
CPPL: A Circuit Prompt Programming Language

Shuo Yin +8
cs.AR 2026-05-18 reviewed

ROA bricks stabilize SHIL signals for Ising machines under variations
ROA-Based Subharmonic Injection Locking for Oscillator-Based Ising Machines

Nicholas Sica +1
cs.AR 2026-05-17 reviewed

Direct AIE links enable 0.93 μs DNN inference on ACAP
μ-ORCA: Optimizing Acceleration for Microsecond-Scale Deep Neural Network Inference on ACAP

Shixin Ji +5
cs.AR 2026-05-17 reviewed

Compressed KV cache yields full accuracy at 4x throughput
VeriCache: Turning Lossy KV Cache into Lossless LLM Inference

Jiayi Yao +9
cs.AR 2026-05-16 reviewed

Workload traces cut early PDN metal area by up to 33%
Workload-Aware Early-Stage Power Delivery Network Optimization via Architectural Power Traces

Oran Hayes +6
cs.AR 2026-05-16 reviewed

Near-cache accelerator speeds sparse ILP 15x with 152x less energy
A comprehensive study on ILP acceleration accounting for sparsity, area, energy, data movement using near-memory architecture

Siddhartha Raman Sundara Raman +2
cs.AR 2026-05-15 reviewed

Traversal stack guides precise prefetching for faster ray tracing
TTP: A Hardware-Efficient Design for Precise Prefetching in Ray Tracing

Yavuz Selim Tozlu +2
cs.AR 2026-05-15 reviewed

6T SRAM sorting cuts latency by 3.4x versus memristor methods
ADS-IMC: Accelerating Data Sorting with In-Memory Computation

Narendra Singh Dhakad +1
cs.AR 2026-05-15 reviewed

SRAM engine halves routing for binary neural nets
SRAM Based Digital Custom Compute Engine for Improved Area Efficiency of AI Hardware

Narendra Singh Dhakad +1
cs.LO 2026-05-15 reviewed

Certificate-aware PDR solves six more instances with smaller proofs
Certificate-Aware Property-Directed Reachability

Arman Ferdowsi +1
cs.AR 2026-05-15 reviewed

Instruction correlation prefetcher beats prior art by 14% with 2 KB storage
ICP: Exploiting Instruction Correlation for Prefetching Irregular Memory Accesses

Mengming Li +9
cs.AR 2026-05-15 reviewed

Intra-thread duplication catches 39% more defective servers
ITHICA: Intra-Thread Instruction Checking Approach for Defect-Induced Silent Data Corruptions

Ioanna Vavelidou +5
quant-ph 2026-05-14 reviewed

Cache reorganization lifts GPU speedups for 28-qubit simulations on laptops
Accelerating State-Vector Quantum Simulation on Integrated GPUs via Cache Locality Optimization: A Cross-Architecture Evaluation

Gabriel Fernandes Thomaz +4
cs.AR 2026-05-14 reviewed

Agentic AI automates full accelerator design from scientific applications
A3D: Agentic AI flow for autonomous Accelerator Design

Abinand Nallathambi +4
cs.ET 2026-05-13 reviewed

Time-domain near-memory MAC reaches 7.62 TOPS/W
Time Domain Near Memory Computing Engine

Sarthak Antal +1
cs.CV 2026-05-13 reviewed

ViTs reach 84% accuracy by replacing layer norm with evolved scalars
Evolving Layer-Specific Scalar Functions for Hardware-Aware Transformer Adaptation

Kieran Carrigg +3
cs.AR 2026-05-13 reviewed

End-to-end DVS-memristor system is the missing piece for low-power vision
Memristor Technologies for Dynamic Vision Sensors: A Critical Assessment and Research Roadmap

Mohamad Yazan Sadoun +3
cs.AR 2026-05-13 reviewed

AI agents drop 37-58% on hardware vs software tasks
Is Agentic AI Ready for Real-World Hardware Engineering? A Deep Dive with Phoenix-bench

Qingyun Zou +4
cs.AR 2026-05-13 reviewed

FPGA accelerator skips sparse beams for 2x faster MIMO localization
Efficient Implementation of an Adaptive Transformer Accelerator for Massive MIMO Outdoor Localization

Ilayda Yaman +3
cs.AR 2026-05-13 reviewed

7B model surpasses 671B baselines on SVA generation
Reward-Weighted On-Policy Distillation with an Open Property-Equivalence Verifier for NL-to-SVA Generation

Qingyun Zou +4
cs.AR 2026-05-13 reviewed

FPGA lock agents boost OLTP throughput 51x over CPUs
FPGA-Accelerated Lock Management and Transaction Processing: Architecture, Optimization, and Design Space Exploration

Shien Zhu +1
cs.AR 2026-05-13 reviewed

PoisonCap gives CHERI strict use-after-free at zero overhead
PoisonCap: Efficient Hierarchical Temporal Safety for CHERI

Yuecheng Wang +7
cs.AR 2026-05-13 reviewed

GenAI workflow maps RISC-V supply chains for risk analysis
GenAI-Driven Approach to RISC-V Supply Chain Exploration

Nenad Petrovic +3
cs.LG 2026-05-12 reviewed

Block-scale search cuts quantization error 27% in BFP
Search Your Block Floating Point Scales!

Tanmaey Gupta +12
cs.AR 2026-05-12 reviewed

Joint TLB-cache tweaks boost instruction prefetching 8.7%
Enhancing Instruction Prefetching via Cache and TLB Management

Alexandre Valentin Jamet +4
cs.AR 2026-05-12 reviewed

FPGA SoC matches silicon SNN accuracy for neuromorphic edge tasks
Heterogeneous SoC Integrating an Open-Source Recurrent SNN Accelerator for Neuromorphic Edge Computing on FPGA

Michelangelo Barocci +3