archive

Every paper Pith has read.

493 papers in cs.AR · page 2

cs.AR 2026-05-12 reviewed

Analog recurrence works at sub-microwatt power via bistable units
Hardware-Software Co-Design of Scalable, Energy-Efficient Analog Recurrent Computations

Arthur Fyon +5
quant-ph 2026-05-12 reviewed

Calibration feedback control cuts optimization gaps in local and tight-loop regimes
Runtime Calibration as State-Trajectory Feedback Control in Quantum-Classical Workflows

Xiaolong Deng
cs.LG 2026-05-12 reviewed

Cumulative updates fix gradient flow in low-power RNNs
Improving the Performance and Learning Stability of Parallelizable RNNs Designed for Ultra-Low Power Applications

Julien Brandoit +3
cs.AR 2026-05-11 reviewed

Dynamic scheduler lifts MoE inference 1.3-1.6x on PIM hardware
Sieve: Dynamic Expert-Aware PIM Acceleration for Evolving Mixture-of-Experts Models

Jungwoo Kim +7
cs.AR 2026-05-11 reviewed

Triton gains direct warp-group control for modern GPU hardware
TLX: Hardware-Native, Evolvable MIMW GPU Compiler for Large-scale Production Environments

Yue Guan +12
cs.AR 2026-05-11 reviewed

TLX adds MIMW warp-group control to Triton for modern GPUs
TLX: Hardware-Native, Evolvable MIMW GPU Compiler for Large-scale Production Environments

Yue Guan +12
cs.CR 2026-05-11 reviewed

LLMs generate hardware code but introduce security risks
LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges

Johann Knechtel +2
cs.CR 2026-05-11 reviewed

LLMs automate chip design but create security risks
LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges

Johann Knechtel +2
cs.CR 2026-05-11 reviewed

LLMs generate RTL code but create new hardware risks
LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges

Johann Knechtel +2
cs.AR 2026-05-11 reviewed

Hybrid chip runs GNN at 2.94M events/sec for physics triggers
Reconfigurable Computing Challenge: Real-Time Graph Neural Networks for Online Event Selection in Big Science

Marc Neu +5
cs.AR 2026-05-11 reviewed

Error profiles detect stolen approximate circuit IP despite mimicry
ObfAx: Obfuscation and IP Piracy Detection in Approximate Circuits

Lukas Sekanina +1
cs.AR 2026-05-11 reviewed

Piezoelectric sensors turn desk vibrations into six-gesture commands
Towards an End-To-End System for Real-Time Gesture Recognition from Surface Vibrations

Florian Hettstedt +5
cs.AI 2026-05-11 reviewed

Hardware assertion sets reduced by 76 percent
Arcane: An Assertion Reduction Framework through Semantic Clustering and MCTS-Guided Rule Exploring

Hongqin Lyu +4
cs.AR 2026-05-11 reviewed

LLM agents size RF amplifiers via resource allocation
RFAmpDesigner: A Self-Evolving Multi-Agent LLM Framework for Automated Radio Frequency Amplifier Design

Hang Lu +11
cs.AR 2026-05-10 reviewed

KV-cache movement regularization cuts static-graph LLM latency spikes
KV-RM: Regularizing KV-Cache Movement for Static-Graph LLM Serving

Zhiqing Zhong +5
cs.AR 2026-05-10 reviewed

Wafer integration of three 2D devices decides next computing decade
Emerging 2D Materials for Beyond von Neumann Computing: A Perspective

Yaser Banad
cs.CL 2026-05-10 reviewed

LLM accuracy depends only on evicted tokens
Not All Thoughts Need HBM: Semantics-Aware Memory Hierarchy for LLM Reasoning

Aojie Yuan +2
cs.AR 2026-05-10 reviewed

ReRAM-on-logic chip reaches 14-136 tokens per second on LLMs
31.1 A 14.08-to-135.69Token/s ReRAM-on-Logic Stacked Outlier-Free Large-Language-Model Accelerator with Block-Clustered Weight-Compression and Adaptive Parallel-Speculative-Decoding

Pingcheng Dong +15
quant-ph 2026-05-10 reviewed

Memoized heuristics scale ion-trap qubit mapping
Scaling Qubit Mapping and Routing With Position Graph Abstraction and Memoization

Brent Russon +3
cs.AR 2026-05-09 reviewed

Complex GAN metric separates gate-failure effects in circuits
Fault tolerance estimation in digital circuits with visualised generative networks

Sascha Biel +4
cs.LG 2026-05-09 reviewed

MPS decoding latency spikes up to 21x in narrow ranges
Non-Monotonic Latency in Apple MPS Decoding: KV Cache Interactions and Execution Regimes

Willy Fitra Hendria
cs.LG 2026-05-09 reviewed

Apple MPS shows 21x latency spikes in narrow decoding ranges
Non-Monotonic Latency in Apple MPS Decoding: KV Cache Interactions and Execution Regimes

Willy Fitra Hendria
cs.AR 2026-05-09 reviewed

New cache bypass method meets deadlines while boosting heterogeneous system speed
HyDRA: Deadline and Reuse-Aware Cacheability for Hardware Accelerators

Ayushi Agarwal +2
cs.AR 2026-05-09 reviewed

HyDRA balances accelerator deadlines with cache reuse via clustering
HyDRA: Deadline and Reuse-Aware Cacheability for Hardware Accelerators

Ayushi Agarwal +2
eess.SP 2026-05-09 reviewed

Low-complexity denoiser matches heavy mmWave MIMO methods
Low-Complexity Beamspace Channel Denoiser for mmWave Massive MIMO with Low-Resolution ADCs

Hanyoung Park +2
cs.AR 2026-05-09 reviewed

Reconfigurable multiplier cuts power 44-68% in RISC-V core
A Reconfigurable Multiplier Architecture for Error-Resilient Applications in RISC-V Core

Pragun Jaswal +2
cs.AR 2026-05-09 reviewed

DDR5 single sub-channel matches cache lines but loses 40-60% bandwidth
Single 32-bit Sub-Channel DDR5 DIMMs: Architecture, Performance Bounds, and Standardisation

Chih-Hua Ke
cs.AR 2026-05-09 reviewed

Edge processor hits 109 TFLOPS/W on DeepSeek
DSPE: An Energy-Efficient Edge Processor for DeepSeek Inference with MerkleTree-based Incremental Pruning, Multi-Stage Boothing Lookup and Dynamic Adaptive Posit Processing

Yuhan Zhang +36
cs.AR 2026-05-09 reviewed

Coprime test vectors localize faulty rows in systolic arrays after one pass
FLARE: One-Shot PE-Level Fault Localization in Systolic Arrays via Algebraic Test Vectors

Logashree Venkatasubramanian +2
cs.AR 2026-05-08 reviewed

Static checker decides barrier sufficiency for accelerator races
AccelSync: Verifying Synchronization Coverage in Accelerator Pipeline Programs

Hangcheng An +2
cs.AR 2026-05-08 reviewed

Model runs 1024-core chip sims 115x faster at under 7% error
Accelerating Precise End-to-End Simulation: Latency-Sensitive Many-core System Modeling

Yinrong Li +7
cs.ET 2026-05-08 reviewed

Plasma simulations need three post-Moore tech tiers
Post-Moore Technologies for Plasma Simulation: A Community Roadmap

Luca Pennati +23
cs.LG 2026-05-08 reviewed

GNNs for EDA succeed when matched to each task's native algebra
Graph Computation Meets Circuit Algebra: A Task-Aligned Analysis of Graph Neural Networks for Electronic Design Automation

Hyunmog Kim
cs.AR 2026-05-08 reviewed

Bit-hardening methods surpass ECC for reliable DNNs with no memory cost
Effective and Memory-Efficient Alternatives to ECC for Reliable Large-Scale DNNs

Mohammad Hasan Ahmadilivani +5
cs.AR 2026-05-08 reviewed

TREA accelerator reduces edge detection latency up to 9x
TREA: Low-precision Time-Multiplexed, Resource-Efficient Edge Accelerator for Object Detection and Classification

Vijay Pratap Sharma +4
cs.AR 2026-05-08 reviewed

Reconfigurable FPU gives up to 8x throughput for low-precision dot products
TransDot: An Area-efficient Reconfigurable Floating-Point Unit for Trans-Precision Dot-Product Accumulation for FPGA AI Engines

Jiayi Wang +4
cs.AR 2026-05-07 reviewed

Open schema and datasets released for ML benchmarks in chip design
EDA-Schema-V2: A Multimodal Schema, Open Datasets, and Benchmarks for Machine Learning in Digital Physical Design

Pratik Shrestha +2
cs.AR 2026-05-07 reviewed

Agents reach just 20% success on multi-PPA in new benchmark
Bridging the Last Mile of Circuit Design: PostEDA-Bench, a Hierarchical Benchmark for PPA Convergence and DRC Fixing

Pengju Liu +4
cs.AR 2026-05-07 reviewed

Agents solve only 37% of practical chip design rule problems
Bridging the Last Mile of Circuit Design: PostEDA-Bench, a Hierarchical Benchmark for PPA Convergence and DRC Fixing

Pengju Liu +4
cs.AR 2026-05-07 reviewed

CORDIC iteration depth trims 33 percent of inference cycles
CARMEN: CORDIC-Accelerated Resource-Efficient Multi-Precision Inference Engine for Deep Learning

Sonu Kumar +3
cs.AR 2026-05-07 reviewed

Posit engine cuts ADAS power by 72 percent with near full accuracy
EULER-ADAS: Energy-Efficient & SIMD-Unified Logarithmic-Posit Engine for Precision-Reconfigurable Approximate ADAS Acceleration

Mukul Lokhande +4
physics.chem-ph 2026-05-07 reviewed

FPGA YOLOv3-Tiny system detects in 0.211 seconds
Development of embedded target detection system based on FPGA and YOLOv3-Tiny

Zihan Jiang +7
cs.CV 2026-05-07 reviewed

Self-supervised pretraining yields tiny wildfire spotters for satellites
On-Orbit Real-Time Wildfire Detection Under On-Board Constraints

Matthias Rötzer +8
cs.AR 2026-05-07 reviewed

Pipeline speeds power-of-two DNNs on edge FPGAs by up to 3.6x
PoTAcc: A Pipeline for End-to-End Acceleration of Power-of-Two Quantized DNNs

Rappy Saha +4
cs.AR 2026-05-07 reviewed

FPGA MAC unifies mixed-precision ops for 1.2x LLM speedup
XtraMAC: An Efficient MAC Architecture for Mixed-Precision LLM Inference on FPGA

Feng Yu +4
cs.AR 2026-05-07 reviewed

Photonic solver beats digital annealers on dense spin-glasses
A virtually connected probabilistic computer as a solver for higher-order, densely connected, or reconfigurable combinatorial optimisation problems

Amy J. Searle +5
cs.AR 2026-05-07 reviewed

LLMs automate FPGA accelerator design space exploration
LLM-Driven Design Space Exploration of FPGA-based Accelerators

Vinamra Sharma +3
cs.AR 2026-05-07 reviewed

Hardware hub lets MoE send data before knowing GPU addresses
MoE-Hub: Taming Software Complexity for Seamless MoE Overlap with Hardware-Accelerated Communication on Multi-GPU Systems

Zhuoshan Zhou +12
cs.AR 2026-05-07 reviewed

Heterogeneous HBM-PIM stack lifts LLM throughput 1.62x
TokenStack: A Heterogeneous HBM-PIM Architecture and Runtime for Efficient LLM Inference

Zhuoran Li +5
cs.AR 2026-05-07 reviewed

New in-switch method delivers 1.38x faster LLM tensor parallel training
Towards Compute-Aware In-Switch Computing for LLMs Tensor-Parallelism on Multi-GPU Systems

Chen Zhang +12