archive

Every paper Pith has read.

493 papers in cs.AR · page 10

cs.AR 2025-07-01 reviewed

Specialized LLMs raise HLS debugging success by 32 percent
ChatHLS: Towards Systematic Design Automation and Optimization for High-Level Synthesis

Runkai Li +7
cs.AR 2025-06-19 reviewed

Sparse NN linearizes RF amps on FPGA at 241 mW with -59 dBc ACPR
SparseDPD: A Sparse Neural Network-based Digital Predistortion FPGA Accelerator for RF Power Amplifier Linearization

Manno Versluis +2
cond-mat.stat-mech 2025-06-19 reviewed

Microcanonical annealing cuts random-number use in parallel spin-glass sims
Microcanonical simulated annealing: Massively parallel Monte Carlo simulations with sporadic random-number generation

M. Bernaschi +9
cs.AR 2025-06-18 reviewed

RISC-V calibration lifts CIM compute SNR by 25-45 percent
Acore-CIM: build accurate and reliable mixed-signal CIM cores with RISC-V controlled self-calibration

Omar Numan +8
cs.AR 2025-06-13 reviewed

System predicts lane changes 3-4 seconds ahead in real-world tests
Real-World Deployment of a Lane Change Prediction Architecture Based on Knowledge Graph Embeddings and Bayesian Inference

M. Manzour +4
cs.AR 2025-06-03 reviewed

MLA cuts bandwidth use in attention and stabilizes hardware performance
Hardware-Centric Analysis of DeepSeek's Multi-Head Latent Attention

Robin Geens +1
q-bio.GN 2025-05-31 reviewed

PIM co-design cuts energy and time for genomics workloads
Processing-in-memory for genomics workloads

William Andrew Simon +14
cs.DC 2025-05-29 reviewed

GreenCache trims LLM carbon 15% by trading storage against compute
Cache Your Prompt When It's Green: Carbon-Aware Caching for Large Language Model Serving

Yuyang Tian +3
cs.AR 2025-05-22 reviewed

60k code pairs train models for 88% accurate CUDA to HIP translation
CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark

Ahmed Heakl +7
cs.AR 2025-05-19 reviewed

Seamless switching boosts CPU LLM serving speed by 2x
Sandwich: Joint Configuration Search and Hot-Switching for Efficient CPU LLM Serving

Juntao Zhao +2
quant-ph 2025-04-29 reviewed

Co-optimized Iceberg gadgets raise QAOA success from 44% to 65%
Iceberg Beyond the Tip: Co-Compilation of a Quantum Error Detection Code and a Quantum Algorithm

Yuwei Jin +7
cs.AR 2025-04-28 reviewed

LLM automates UVM testbench creation for RTL designs
From Concept to Practice: an Automated LLM-aided UVM Machine for RTL Verification

Junhao Ye +10
cs.AR 2025-04-24 reviewed

Fusion-aware design speeds SSM accelerators 1.78x at fixed area
Fine-Grained Fusion: The Missing Piece in Area-Efficient State Space Model Acceleration

Robin Geens +2
cs.AR 2025-04-14 reviewed

Simulator explores LLM configs without 40K cloud costs
MIST: A Co-Design Framework for Heterogeneous, Multi-Stage LLM Inference

Abhimanyu Rajeshkumar Bambhaniya +10
cs.ET 2025-04-08 reviewed

Memristor arrays solve XOR-CNF SAT problems 10 times faster
Accelerating Hybrid XOR-CNF Boolean Satisfiability Problems Natively with In-Memory Computing

Haesol Im +16
cs.AR 2025-03-27 reviewed

71.2 μW accelerator runs real-time speech recognition
A 71.2-μW Speech Recognition Accelerator with Recurrent Spiking Neural Network

Chih-Chyau Yang +1
cs.AR 2025-03-26 reviewed

Edge criteria halve MACs for 8K super-resolution at 30 FPS
ESSR: An 8K@30FPS Super-Resolution Accelerator With Edge Selective Network

Chih-Chia Hsu +1
cs.AR 2025-03-17 reviewed

Benchmark shows 51 percent area cut for 3D chip designs
Open3DBench: Open-Source Benchmark for 3D-IC Backend Implementation and PPA Evaluation

Yunqi Shi +8
math.OC 2025-03-12 reviewed

Hardware co-design checks all feasible QAP moves in one step
Hardware-Compatible Single-Shot Feasible-Space Heuristics for Solving the Quadratic Assignment Problem

Haesol Im +12
cs.OS 2025-03-05 reviewed

90% of Linux radiation failures route through one eMMC path
Where Linux Breaks Under Radiation: A Cross-Architecture Kernel-Level Characterization of Proton-Induced Failures in COTS SoCs

Saad Memon +7
cs.CV 2025-03-05 reviewed

Quantization method raises 4-bit SAM mAP 15.2% on COCO
AHCQ-SAM: Toward Accurate and Hardware-Compatible Post-Training Segment Anything Model Quantization

Wenlun Zhang +5
cs.AR 2025-02-23 reviewed

Taxonomy maps 25 years of FPGA neuromorphic architectures
A Quarter of a Century of Neuromorphic Architectures on FPGAs -- an Overview

Wiktor J. Szczerek +1
cs.AR 2025-01-30 reviewed

Posits shrink wearable hardware 38% and cut power 42%
Increasing the Energy-Efficiency of Wearables Using Low-Precision Posit Arithmetic with PHEE

David Mallasén +4
cs.DC 2025-01-27 reviewed

Framework enables any-cycle preemption for FPGA tasks in clouds
EPOCH: Enabling Preemption Operation for Context Saving in Heterogeneous FPGA Systems

Arsalan Ali Malik +2
cs.AR 2025-01-23 reviewed

Taylor softmax cuts FPGA resources 14% at 0.2% accuracy cost
A Quantitative Evaluation of Approximate Softmax Functions for Deep Neural Networks

Anthony Leiva-Valverde +4
cs.AR 2025-01-15 reviewed

Octopus sparse links save 3-5.4% server costs in CXL pods
Octopus: Enhancing CXL Memory Pods via Sparse Topology

Yuhong Zhong +6
cs.CR 2025-01-13 reviewed

Compiler aligns HE workloads with TPU matrix engines
Leveraging ASIC AI Chips for Homomorphic Encryption

Jianming Tong +11
cs.LG 2025-01-07 reviewed

Hybrid federated method boosts hotspot detection accuracy
Federated Knowledge Distillation for Multi-Model Architectures Lithography Hotspot Detection

Yuqi Li +8
cs.LG 2024-11-10 reviewed

Filter turns AI-generated PCIe traces into usable simulation data
The Phantom of PCIe: Constraining Generative Artificial Intelligences for Practical Peripherals Trace Synthesizing

Zhibai Huang +11
cs.LG 2024-10-19 reviewed

Async pipeline training on analog hardware matches digital SGD rate
On the Convergence Theory of Pipeline Gradient-based Analog In-memory Training

Zhaoxian Wu +5
cs.CV 2024-09-25 reviewed

SSD MobileNet V1 minimizes latency and energy but not accuracy on edge devices
A Comprehensive Evaluation of Deep Learning Object Detection Models on Heterogeneous Edge Devices

Daghash K. Alqahtani +3
cs.AR 2024-06-28 reviewed

FPGA idle-waiting extends DL accelerator life 12x vs powering off
Idle is the New Sleep: Configuration-Aware Alternative to Powering Off FPGA-Based DL Accelerators During Inactivity

Chao Qian +3
quant-ph 2024-06-26 reviewed

Two-level scheduler cuts quantum decoder hardware by 10-40%
Managing Classical Processing Requirements for Quantum Error Correction

Satvik Maurya +3
cs.ET 2024-06-20 reviewed

Weight shuffling restores 83.5% accuracy in resistive crossbar DNNs
WAGONN: Weight Bit Agglomeration in Crossbar Arrays for Reduced Impact of Interconnect Resistance on DNN Inference Accuracy

Jeffry Victor +4
cs.AR 2024-05-21 reviewed

Accelerator switches dataflows per layer at 6% extra area
FEATHER: A Reconfigurable Accelerator with Data Reordering Support for Low-Cost On-Chip Dataflow Switching

Jianming Tong +3
cs.AR 2024-05-06 reviewed

SparrowSNN cuts ECG energy by 20-100x at full accuracy
SparrowSNN: A Hardware/software Co-design for Energy Efficient ECG Classification

Zhanglu Yan +3
cs.AR 2023-01-07 reviewed

Cache-coherent eFPGAs cut processor-accelerator latency by 82%
Duet: Creating Harmony between Processors and Embedded FPGAs

Ang Li +2
quant-ph 2022-11-14 reviewed

Hundreds of thousands of qubits needed for practical quantum advantage
Assessing requirements to scale to practical quantum advantage

Michael E. Beverland +9
cs.AR 2019-07-24 reviewed

QDI adder comparison in 32nm CMOS identifies low-power options
Performance Comparison of Quasi-Delay-Insensitive Asynchronous Adders

P Balasubramanian
cs.AR 2019-07-22 reviewed

Memristor-CMOS multiplier reconfigures for multiple bit widths
Reconfigurable multiplier architecture based on memristor-cmos with higher flexibility

Seungbum Baek
cs.AR 2019-07-19 reviewed

PPAC runs neural nets and crypto inside memory arrays
PPAC: A Versatile In-Memory Accelerator for Matrix-Vector-Product-Like Operations

Oscar Castañeda +3
cs.AR 2019-07-17 reviewed

RL scheduler adapts multicore memory access for 20% CPI gain
CADS: Core-Aware Dynamic Scheduler for Multicore Memory Controllers

Eduardo Olmedo Sanchez +1
cs.AR 2019-07-16 reviewed

History yields conditions for coprocessor long-term success
Coprocessors: failures and successes

Daniel Etiemble
cs.AR 2019-07-10 reviewed

RM-CAM plus TMR repairs NRAM defects with fewer resources at high error rates
A Range Matching CAM for Hierarchical Defect Tolerance Technique in NRAM Structures

Hossein Pourmeidani +1
cs.AR 2019-07-04 reviewed

RTL FPGA accelerator matches Caffe-CPU for CNN inference
FusionAccel: A General Re-configurable Deep Learning Inference Accelerator on FPGA for Convolutional Neural Networks

Shi Shi
cs.AR 2019-07-04 reviewed

TicToc speeds hybrid memory 10% using 34KB SRAM
TicToc: Enabling Bandwidth-Efficient DRAM Caching for both Hits and Misses in Hybrid Memory Systems

Vinson Young +2
cs.AR 2019-07-04 reviewed

One line per region tracks reuse to speed DRAM caches 18%
To Update or Not To Update?: Bandwidth-Efficient Intelligent Replacement Policies for DRAM Caches

Vinson Young +1
cs.OS 2019-06-29 reviewed

Hardware scheduler delivers 12x speedup on accelerator systems
HTS: A Hardware Task Scheduler for Heterogeneous Systems

Kartik Hegde +2
eess.SP 2019-06-28 reviewed

FPGA speeds Tucker decomposition up to 30x on heart MRI
Tucker Tensor Decomposition on FPGA

Kaiqi Zhang +2
cs.AR 2019-06-27 reviewed

Bit-partitioned dot products share A/D converters via charge accumulation
Mixed-Signal Charge-Domain Acceleration of Deep Neural networks through Interleaved Bit-Partitioned Arithmetic

Soroush Ghodrati +7