archive

Every paper Pith has read. Search by title, abstract, or pith.

493 papers in cs.AR · page 1

  1. cs.AR 2026-05-22 reviewed
    DORA keeps DNN accelerator efficiency steady across 6× workload variation

    DORA: Dataflow-Instruction Orchestration Architecture for DNN Acceleration

    Xingzhen Chen +7

  2. cs.NE 2026-05-22 reviewed
    UniSpike bundles spikes to cut neuromorphic traffic 1.93 times

    UniSpike: Accelerating Spiking Neural Networks on Neuromorphic Systems via Eliminating Address Redundancy

    Qinghui Xing +8

  3. cs.AR 2026-05-22 reviewed
    Overlays beat custom designs for frequent model switches in self-driving

    To Overlay or to Customize? Revisiting Architectural Choices in Heterogeneous Systems

    Xingzhen Chen +3

  4. cs.AR 2026-05-22 reviewed
    Explicit decoupling gives HLS 10-79x speedups on complex memory patterns

    DAE4HLS: Exposing Memory-Level Parallelism for High-Level Synthesis using Explicit Decoupling

    David Metz +1

  5. cs.AR 2026-05-22 reviewed
    3D NAND fuses MoE selection and compute for 114x faster inference

    NASiC: 3D NAND-based CAM-Selected Multibit CIM Architecture for Efficient On-Device Mixture-of-Experts LLM Inference

    Weikai Xu +9

  6. cs.AR 2026-05-22 reviewed
    Stage-wise precision cuts masked diffusion compute by up to 16x

    MASQ: Accelerating Masked Diffusion via Stage-Wise Multi-Precision Quantization

    Seeyeon Kim +3

  7. cs.AR 2026-05-21 reviewed
    ACALSim reaches 14x speedup over SST on large GPU simulations

    ACALSim: A Scalable Parallel Simulation Framework for High-Performance System Design Space Exploration

    Wei-Fen Lin +7

  8. cs.CV 2026-05-21 reviewed
    Prior outputs double token cuts in video diffusion for 4.5x speedup

    ORBIS: Output-Guided Token Reduction with Distribution-Aware Matching for Video Diffusion Acceleration

    Hangyeol Lee +1

  9. cs.AR 2026-05-21 reviewed
    Co-design speeds vector search up to 8.4 times over CPU

    NasZip: Software and Hardware Co-Design to Accelerate Approximate Nearest Neighbor Search with DIMM-Based Near-Data Processing

    Cheng Zou +8

  10. cs.AR 2026-05-21 reviewed
    Memory technologies reviewed for room and cryogenic use

    Emerging memory technologies at room/cryogenic temperature

    Siddhartha Raman Sundara Raman

  11. cs.AR 2026-05-21 reviewed
    Component-level GPU control yields 10% energy savings

    CompPow: A Case for Component-level GPU Power Management

    Shaizeen Aga +1

  12. cs.AR 2026-05-20 reviewed
    Dynamic control-flow speeds up reconfigurable processors

    Supporting Dynamic Control-Flow Execution for Runtime Reconfigurable Processors

    Hassan Nassar +3

  13. cs.DC 2026-05-20 reviewed
    Roadside perception services turn on only when vehicles approach

    Cloud-Native Operation of Roadside Infrastructure Enabling Demand-Driven Collective Perception via V2X

    Lukas Zanger +5

  14. quant-ph 2026-05-20 reviewed
    Telesistors provide noise-protected Clifford gates for quantum computing

    Towards transistor-based quantum computing

    Y.-D. Liu +3

  15. cs.AR 2026-05-20 reviewed
    ELSA gives spiking networks 3.4x faster inference than top accelerators

    ELSA: An ELastic SNN Inference Architecture for Efficient Neuromorphic Computing

    Kang You +8

  16. cs.NE 2026-05-20 reviewed
    ReRAM macro reaches 419 TOPS/W for edge neural inference

    E-ReCON: An Energy- and Resource-Efficient Precision-Configurable Sparse nvCIM Macro for Conventional and Spiking Neural Edge Inference

    Ankit Kumar Tenwar +2

  17. cs.CR 2026-05-19 reviewed
    Multi-rank PIM beats CPUs on AES and SHA-256

    Taking Cryptography Out of the Data Path via Near-Memory Processing in DRAM

    Nicola Barcarolo +5

  18. cs.AR 2026-05-19 reviewed
    Hardware latch enables 452 nA quiescent drain in sensors

    A Hardware-Based Multi-Stage Dynamic Power Management Architecture for Autonomous Low-Light Operation

    Charalampos S. Kouzinopoulos +8

  19. cs.AR 2026-05-19 reviewed
    Digital near-memory design accelerates GNNs up to 230x

    A complete discussion on fully reconfigurable, digital, scalable, graph and sparsity-aware near-memory accelerator for graph neural networks

    Siddhartha Raman Sundara Raman +2

  20. cs.AR 2026-05-19 reviewed
    Only two of five LLMs finish valid SoC co-design

    HSCO-Bench: An Agent-Driven End-to-End Hardware-Software Co-design Benchmark for Systems-on-Chip

    Pei-Huan Tsai +4

  21. cs.AR 2026-05-18 reviewed
    Software scheduling predicts optical thermal drift early

    Predictive Software Scheduling as an Early-Warning Hint Layer for Optical Engine Thermal Drift in Heterogeneous SoIC Packaging

    Chi Fei Chung

  22. cs.AR 2026-05-18 reviewed
    Input flips extend multiplier life under NBTI aging

    Building Reliable Arithmetic Multipliers Under NBTI Aging and Process Variations

    Masoud Heidary +1

  23. cs.DC 2026-05-18 reviewed
    Hybrid cluster cuts HTTP response time by over 40%

    iHAC: A Hybrid Cluster Architecture for Enhanced Performance and Resilience

    Siddique Abubakr Muntaka +11

  24. cs.NI 2026-05-18 reviewed
    Hybrid radio matches dedicated performance with far less setup

    Enabling Agile Ambient IoT Networking via a Parameterized Hybrid Radio

    Jiazhen Lei +10

  25. cs.AR 2026-05-18 reviewed
    JSON IR and compiler checks lift LLM circuit correctness

    CPPL: A Circuit Prompt Programming Language

    Shuo Yin +8

  26. cs.AR 2026-05-18 reviewed
    ROA bricks stabilize SHIL signals for Ising machines under variations

    ROA-Based Subharmonic Injection Locking for Oscillator-Based Ising Machines

    Nicholas Sica +1

  27. cs.AR 2026-05-17 reviewed
    Direct AIE links enable 0.93 μs DNN inference on ACAP

    μ-ORCA: Optimizing Acceleration for Microsecond-Scale Deep Neural Network Inference on ACAP

    Shixin Ji +5

  28. cs.AR 2026-05-17 reviewed
    Compressed KV cache yields full accuracy at 4x throughput

    VeriCache: Turning Lossy KV Cache into Lossless LLM Inference

    Jiayi Yao +9

  29. cs.AR 2026-05-16 reviewed
    Workload traces cut early PDN metal area by up to 33%

    Workload-Aware Early-Stage Power Delivery Network Optimization via Architectural Power Traces

    Oran Hayes +6

  30. cs.AR 2026-05-16 reviewed
    Near-cache accelerator speeds sparse ILP 15x with 152x less energy

    A comprehensive study on ILP acceleration accounting for sparsity, area, energy, data movement using near-memory architecture

    Siddhartha Raman Sundara Raman +2

  31. cs.AR 2026-05-15 reviewed
    Traversal stack guides precise prefetching for faster ray tracing

    TTP: A Hardware-Efficient Design for Precise Prefetching in Ray Tracing

    Yavuz Selim Tozlu +2

  32. cs.AR 2026-05-15 reviewed
    6T SRAM sorting cuts latency by 3.4x versus memristor methods

    ADS-IMC: Accelerating Data Sorting with In-Memory Computation

    Narendra Singh Dhakad +1

  33. cs.AR 2026-05-15 reviewed
    SRAM engine halves routing for binary neural nets

    SRAM Based Digital Custom Compute Engine for Improved Area Efficiency of AI Hardware

    Narendra Singh Dhakad +1

  34. cs.LO 2026-05-15 reviewed
    Certificate-aware PDR solves six more instances with smaller proofs

    Certificate-Aware Property-Directed Reachability

    Arman Ferdowsi +1

  35. cs.AR 2026-05-15 reviewed
    Instruction correlation prefetcher beats prior art by 14% with 2 KB storage

    ICP: Exploiting Instruction Correlation for Prefetching Irregular Memory Accesses

    Mengming Li +9

  36. cs.AR 2026-05-15 reviewed
    Intra-thread duplication catches 39% more defective servers

    ITHICA: Intra-Thread Instruction Checking Approach for Defect-Induced Silent Data Corruptions

    Ioanna Vavelidou +5

  37. quant-ph 2026-05-14 reviewed
    Cache reorganization lifts GPU speedups for 28-qubit simulations on laptops

    Accelerating State-Vector Quantum Simulation on Integrated GPUs via Cache Locality Optimization: A Cross-Architecture Evaluation

    Gabriel Fernandes Thomaz +4

  38. cs.AR 2026-05-14 reviewed
    Agentic AI automates full accelerator design from scientific applications

    A3D: Agentic AI flow for autonomous Accelerator Design

    Abinand Nallathambi +4

  39. cs.ET 2026-05-13 reviewed
    Time-domain near-memory MAC reaches 7.62 TOPS/W

    Time Domain Near Memory Computing Engine

    Sarthak Antal +1

  40. cs.CV 2026-05-13 reviewed
    ViTs reach 84% accuracy by replacing layer norm with evolved scalars

    Evolving Layer-Specific Scalar Functions for Hardware-Aware Transformer Adaptation

    Kieran Carrigg +3

  41. cs.AR 2026-05-13 reviewed
    End-to-end DVS-memristor system is the missing piece for low-power vision

    Memristor Technologies for Dynamic Vision Sensors: A Critical Assessment and Research Roadmap

    Mohamad Yazan Sadoun +3

  42. cs.AR 2026-05-13 reviewed
    AI agents drop 37-58% on hardware vs software tasks

    Is Agentic AI Ready for Real-World Hardware Engineering? A Deep Dive with Phoenix-bench

    Qingyun Zou +4

  43. cs.AR 2026-05-13 reviewed
    FPGA accelerator skips sparse beams for 2x faster MIMO localization

    Efficient Implementation of an Adaptive Transformer Accelerator for Massive MIMO Outdoor Localization

    Ilayda Yaman +3

  44. cs.AR 2026-05-13 reviewed
    7B model surpasses 671B baselines on SVA generation

    Reward-Weighted On-Policy Distillation with an Open Property-Equivalence Verifier for NL-to-SVA Generation

    Qingyun Zou +4

  45. cs.AR 2026-05-13 reviewed
    FPGA lock agents boost OLTP throughput 51x over CPUs

    FPGA-Accelerated Lock Management and Transaction Processing: Architecture, Optimization, and Design Space Exploration

    Shien Zhu +1

  46. cs.AR 2026-05-13 reviewed
    PoisonCap gives CHERI strict use-after-free at zero overhead

    PoisonCap: Efficient Hierarchical Temporal Safety for CHERI

    Yuecheng Wang +7

  47. cs.AR 2026-05-13 reviewed
    GenAI workflow maps RISC-V supply chains for risk analysis

    GenAI-Driven Approach to RISC-V Supply Chain Exploration

    Nenad Petrovic +3

  48. cs.LG 2026-05-12 reviewed
    Block-scale search cuts quantization error 27% in BFP

    Search Your Block Floating Point Scales!

    Tanmaey Gupta +12

  49. cs.AR 2026-05-12 reviewed
    Joint TLB-cache tweaks boost instruction prefetching 8.7%

    Enhancing Instruction Prefetching via Cache and TLB Management

    Alexandre Valentin Jamet +4

  50. cs.AR 2026-05-12 reviewed
    FPGA SoC matches silicon SNN accuracy for neuromorphic edge tasks

    Heterogeneous SoC Integrating an Open-Source Recurrent SNN Accelerator for Neuromorphic Edge Computing on FPGA

    Michelangelo Barocci +3