archive

Every paper Pith has read.

493 papers in cs.AR · page 7

cs.AR 2026-04-12 reviewed

Spatial heat patterns dominate power-grid lifetime over averages
EMSpice 3: Full-chip Temperature-Aware Multiphysics Electromigration and IR-Drop Analysis

Haotian Lu +1
cs.AR 2026-04-12 reviewed

Octree islands cut PCN feature fetching by 55-94 percent
L-PCN: A Point Cloud Accelerator Exploiting Spatial Locality through Octree-based Islandization

Yiming Gao +7
cs.AR 2026-04-12 reviewed

Two-stage mining extracts accurate message flows from SoC traces
AutoFlows++: Hierarchical Message Flow Mining for System on Chip Designs

Bardia Nadimi +1
cs.AR 2026-04-12 reviewed

BFP NPU hits near-DMR reliability at 3.55% overhead
From Characterization to Microarchitecture: Designing an Elegant and Reliable BFP-Based NPU

Jie Zhang +6
cs.AR 2026-04-12 reviewed

Re-partitioned NPU catches and fixes faults in under a microsecond
Strix: Re-thinking NPU Reliability from a System Perspective

Jiapeng Guan +10
cs.AR 2026-04-12 reviewed

LLM training resists low GPU fault rates but fails in key paths
LLM-PRISM: Characterizing Silent Data Corruption from Permanent GPU Faults in LLM Training

Abhishek Tyagi +6
cs.AR 2026-04-11 reviewed

Chip renders 3D Gaussian Splatting at 129 FPS in full HD
A 129FPS Full HD Real-Time Accelerator for 3D Gaussian Splatting

Fang-Chi Chang +1
cs.PF 2026-04-11 reviewed

Wave-aware model picks near-optimal GPU kernel settings fast
WaveTune: Wave-aware Bilinear Modeling for Efficient GPU Kernel Auto-tuning

Kaixuan Zhang +8
cs.AR 2026-04-11 reviewed

Sparse measurements predict latency at every CPU-GPU frequency
Taming Asynchronous CPU-GPU Coupling for Frequency-aware Latency Estimation on Mobile Edge

Jiesong Chen +3
cs.DC 2026-04-11 reviewed

FlexVector speeds GCN inference 3.78x with flexible registers
FlexVector: A SpMM Vector Processor with Flexible VRF for GCNs on Varying-Sparsity Graphs

Bohan Li +5
cs.AR 2026-04-11 reviewed

Open framework speeds SystemC-FPGA co-emulation up to 2500x
Late Breaking Results: CHESSY: Coupled Hybrid Emulation with SystemC-FPGA Synchronization

Lorenzo Ruotolo +9
cs.AR 2026-04-11 reviewed

Microcontroller runs full SNN simulation at 20 mW
Full Feature Spiking Neural Network Simulation on Micro-Controllers for Neuromorphic Applications at the Edge

L. Niedermeier +1
cs.AR 2026-04-11 reviewed

DNN-resilient voltage scaling cuts aging degradation up to 46%
Aging Aware Adaptive Voltage Scaling for Reliable and Efficient AI Accelerators

Tong Xie +5
cs.AR 2026-04-10 reviewed

Photonic accelerator speeds transformers 7.6x with lower energy
Sustainable Transformer Neural Network Acceleration with Stochastic Photonic Computing

S. Afifi +3
cs.AR 2026-04-10 reviewed

0.5V encoder maps voltages to spikes within 5.6 percent linearity
A 0.5-V Linear Neuromorphic Voltage-to-Spike Encoder Using a Bulk-Driven Transconductor

Meysam Akbari +2
cs.DC 2026-04-10 reviewed

MATCHA cuts DNN inference latency up to 35% on heterogeneous edge SoCs
MATCHA: Efficient Deployment of Deep Neural Networks on Multi-Accelerator Heterogeneous Edge SoCs

Enrico Russo +8
cs.AR 2026-04-10 reviewed

Diffusion models cut energy 36% by tolerating controlled faults
DRIFT: Harnessing Inherent Fault Tolerance for Efficient and Reliable Diffusion Model Inference

Jinqi Wen +3
cs.AR 2026-04-10 reviewed

Key signals cut RTL assertion needs by two thirds
From Indiscriminate to Targeted: Efficient RTL Verification via Functionally Key Signal-Driven LLM Assertion Generation

Yonghao Wang +10
cs.AR 2026-04-09 reviewed

Neuromorphic chips hit new memory wall from on-chip storage
Memory Wall is not gone: A Critical Outlook on Memory Architecture in Digital Neuromorphic Computing

Amirreza Yousefzadeh +2
cs.PL 2026-04-09 reviewed

Profile labels cut memory dependence checks 79% on small cores
PG-MDP: Profile-Guided Memory Dependence Prediction for Area-Constrained Cores

Luke Panayi +7
cs.DC 2026-04-09 reviewed

Energy-efficient GPUs deliver better value under budget limits
Wattlytics: A Web Platform for Co-Optimizing Performance, Energy, and TCO in HPC Clusters

Ayesha Afzal +2
cs.AR 2026-04-09 reviewed

ATLAS models 3D-DRAM LLM accelerators to within 8.57% of silicon
A Full-Stack Performance Evaluation Infrastructure for 3D-DRAM-based LLM Accelerators

Cong Li +13
cs.AR 2026-04-09 reviewed

Mamba-3 raises edge latency up to 48% to favor cloud GPUs
The Hyperscale Lottery: How State-Space Models Have Sacrificed Edge Efficiency

Robin Geens +3
cs.PL 2026-04-09 reviewed

Faster 32-bit constant division on 64-bit CPUs
Optimization of 32-bit Unsigned Division by Constants on 64-bit Targets

Shigeo Mitsunari +1
cs.DC 2026-04-09 reviewed

Integrated panels give orbital AI 100 kW per ton
Reduced-Mass Orbital AI Inference via Integrated Solar, Compute, and Radiator Panels

Stephen Gaalema +2
cs.AR 2026-04-08 reviewed

TrilinearCIM runs Transformer attention in NVM without reprogramming
Trilinear Compute-in-Memory Architecture for Energy-Efficient Transformer Acceleration

Md Zesun Ahmed Mia +3
cs.AR 2026-04-08 reviewed

RL agent designs ASIC chips for AI that adapt across 7 process nodes
From LLM to Silicon: RL-Driven ASIC Architecture Exploration for On-Device AI Inference

Ravindra Ganti +1
cs.AR 2026-04-08 reviewed

FILCO reconfigures DNN accelerators on the fly for 1.3x-5x gains
FILCO: Flexible Composing Architecture with Real-Time Reconfigurability for DNN Acceleration

Xingzhen Chen +7
cs.AR 2026-04-08 reviewed

Symbolic analysis estimates energy for loop nests independent of size
Symbolic Polyhedral-Based Energy Analysis for Nested Loop Programs

Avinash Mahesh Nirmala +3
cs.CV 2026-04-08 reviewed

Onboard EO processing delivers sub-3m burnt-area maps
Assessing the Added Value of Onboard Earth Observation Processing with the IRIDE HEO Service Segment

Parampuneet Kaur Thind +5
cs.AR 2026-04-08 reviewed

GQA models cut peak memory 2.72x versus MHA on embedded hardware
TRAPTI: Time-Resolved Analysis for SRAM Banking and Power Gating Optimization in Embedded Transformer Inference

Jan Klhufek +4
cs.AR 2026-04-08 reviewed

New chip runs annealing and reservoir tasks at 25-54x efficiency
CBM-Dual: A 65-nm Fully Connected Chaotic Boltzmann Machine Processor for Dual Function Simulated Annealing and Reservoir Computing

Kanta Yoshioka +5
cs.AR 2026-04-08 reviewed

SHIELD cuts eDRAM refresh energy 35% for edge LLM inference
SHIELD: A Segmented Hierarchical Memory Architecture for Energy-Efficient LLM Inference on Edge NPUs

Jintao Zhang +1
cs.AR 2026-04-08 reviewed

SwarmIO emulates 40M IOPS SSDs for GPUs with 300x speedup
SwarmIO: Towards 100 Million IOPS SSD Emulation for Next-generation GPU-centric Storage Systems

Hyeseong Kim +2
cs.AR 2026-04-08 reviewed

One DC simulation calibrates LLM equations for analog sizing
A Self-Calibrating Framework for Analog Circuit Sizing Using LLM-Derived Analytical Equations

Antonio J. Bujana +1
cs.AR 2026-04-08 reviewed

Coverage feedback raises assertion coverage 9-15 percent
CoverAssert: Iterative LLM Assertion Generation Driven by Functional Coverage via Syntax-Semantic Representations

Yonghao Wang +10
eess.SP 2026-04-07 reviewed

Dominant interferer nulling cuts CG iterations in massive MU-MIMO
Interference Suppression for Massive MU-MIMO Long-Term Beamforming with Matrix Inversion Approximation

Amirreza Kiani +3
cs.DC 2026-04-07 reviewed

Power reconstruction shows 79% energy cut from mixed precision on Frontier
Fine-Grained Power and Energy Attribution on AMD GPU/APU-Based Exascale Nodes

Adam McDaniel +10
cs.AR 2026-04-07 reviewed

PHAROS finds more deadline-meeting accelerator designs
PHAROS: Pipelined Heterogeneous Accelerators for Real-time Safety-critical Systems With Deadline Compliance

Shixin Ji +8
cs.AR 2026-04-06 reviewed

Prime power moduli simplify RNS integer division hardware
Direct Integer Division in RNS and its Hardware Solutions

Eric B. Olsen
cs.AR 2026-04-06 reviewed

KV cache choice depends on memory limits and request load
Comparative Characterization of KV Cache Management Strategies for LLM Inference

Oteo Mamo +3
cs.CR 2026-04-06 reviewed

GPU boosts encrypted LLM nonlinear layers by up to 17 times
GPU Acceleration of TFHE-Based High-Precision Nonlinear Layers for Encrypted LLM Inference

Guoci Chen +7
cs.AR 2026-04-06 reviewed

DRAM PIM techniques create bursty power demands that stress delivery networks
A comparative study on power delivery aspects of compute-in/near-memory approaches using DRAM

Siddhartha Raman Sundara Raman +2
cs.AR 2026-04-06 reviewed

Tool explores 250 trillion 3D AI accelerator designs 100000 times faster
DeepStack: Scalable and Accurate Design Space Exploration for Distributed 3D-Stacked AI Accelerators

Zhiwen Mo +13
cs.AR 2026-04-06 reviewed

Neuromorphic hardware could break CMOS energy limits for AI
Neuromorphic Computing for Low-Power Artificial Intelligence

Keshava Katti +2
cs.CR 2026-04-06 reviewed

GPIR lifts GPU PIR speed by up to 297 times
GPIR: Enabling Practical Private Information Retrieval with GPUs

Hyesung Ji +5
cs.AR 2026-04-06 reviewed

CGRA sharing with migration cuts workload time by 70%
Mestra: Exploring Migration on Virtualized CGRAs

Agamemnon Kyriazis +4
cs.AR 2026-04-06 reviewed

Packed LUTs deliver 1.82x speedup for DNN inference on DRAM-PIM
LOCALUT: Harnessing Capacity-Computation Tradeoffs for LUT-Based Inference in DRAM-PIM

Junguk Hong +8
cs.AR 2026-04-06 reviewed

Bit partitioning lets one PE run FP8 or dual FP4 with 60% less area
DHFP-PE: Dual-Precision Hybrid Floating Point Processing Element for AI Acceleration

Shubham Kumar +3
cs.CR 2026-04-05 reviewed

Hardware cuts real-time interrupt latency by 50x
Enabling Deterministic User-Level Interrupts in Real-Time Processors via Hardware Extension

Hongbin Yang +2