pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

493 papers in cs.AR · page 3

  1. cs.AR 2026-05-07 reviewed
    DySHARP speeds MoE models 1.79x with dynamic in-switch computing

    Accelerating MoE with Dynamic In-Switch Computing on Multi-GPUs

    Qijun Zhang +12

  2. cs.AR 2026-05-06 reviewed
    Reconfigurable arrays nearly double GPU energy efficiency

    DICE: Enabling Efficient General-Purpose SIMT Execution with Statically Scheduled Coarse-Grained Reconfigurable Arrays

    Jiayi Wang +3

  3. cs.AR 2026-05-06 reviewed
    Two policies cut mean IPC loss 13.6 times

    Beyond Static Policies: Exploring Dynamic Policy Selection for Single-Thread Performance Optimization

    Yanxin Zhang +5

  4. cs.LG 2026-05-06 reviewed
    Joint training cuts AI multiplier power by up to 27 percent

    TRAM: Training Approximate Multiplier Structures for Low-Power AI Accelerators

    Chang Meng +5

  5. cs.AR 2026-05-06 reviewed
    Flow automatically converts flip-flops to two-phase latches

    An Open-Source Flow for Single-Phase, Edge-Triggered to Two-Phase, Non-Overlapping Clocking Conversion

    Paolo Pedroso +2

  6. cs.AR 2026-05-06 reviewed
    Multicore design achieves 3.1x speedup with four cores

    REPTILES: Repeated Tiles of Sargantana, a RISC-V multicore based on OpenPiton

    Noelia Oliete-Escuín +28

  7. cs.AR 2026-05-06 reviewed
    Agent builds TurboQuant accelerator in 80 hours

    Design Conductor 2.0: An agent builds a TurboQuant inference accelerator in 80 hours

    The Verkor Team: Ravi Krishna +2

  8. cs.AR 2026-05-06 reviewed
    Commercial 3D NAND chips run over a billion bitwise ops error-free

    MCFlash: Bulk Bitwise Processing in 3D NAND with Dynamic Sensing and Multi-level Encoding

    Habib Ur Rahman +3

  9. cs.AR 2026-05-06 reviewed
    Data corruption dominates transient faults in RISC-V vectors

    Not All Faults Are Equal: Transient-Fault Sensitivity Characterization of an Open-Source RISC-V Vector Cluster

    Maoyuan Cai +3

  10. cs.LG 2026-05-06 reviewed
    Approximate multipliers allow full ResNet MoE recovery after retraining

    AxMoE: Characterizing the Impact of Approximate Multipliers on Mixture-of-Experts DNN Architectures

    Omkar B Shende +2

  11. cs.AR 2026-05-06 reviewed
    LLM framework builds UVM testbenches in 4.5 hours at 95.65% coverage

    UVMarvel: an Automated LLM-aided UVM Machine for Subsystem-level RTL Verification

    Junhao Ye +9

  12. cs.AR 2026-05-06 reviewed
    SDM circuit switching cuts NoC power by 38 percent

    Ultra Low-Power SDM-based Circuit-Switching for Networks-on-Chip

    Meysam Zaeemi +1

  13. cs.AR 2026-05-06 reviewed
    RangeGuard corrects 64+ bit flips using 16-bit parity in DNNs

    RangeGuard: Efficient, Bounded Approximate Error Correction for Reliable DNNs

    Hanum Ko +3

  14. cs.AR 2026-05-05 reviewed
    GPU silent errors rarely produce NaN or infinity values

    The Anatomy of Silent Data Corruption: GPU Error Pattern Study and Modeling Guidance

    Chung-Hsuan Tung +7

  15. cs.DC 2026-05-05 reviewed
    Microbenchmark models predict GPU performance with 1% error

    Microbenchmark-Driven Analytical Performance Modeling Across Modern GPU Architectures

    Aaron Jarmusch +1

  16. cs.AR 2026-05-05 reviewed
    ISA-level model defines safe behaviors for programmable caches

    täkōFormal: Enabling Robust Software for Programmable Memory Hierarchies (Extended Version)

    Pranav Srinivasan +2

  17. cs.CR 2026-05-05 reviewed
    LIPPEN is a hardware-software co-design that encrypts the full 64-bit pointer in place

    LIPPEN: A Lightweight In-Place Pointer Encryption Architecture for Pointer Integrity

    Erfan Iravani +6

  18. cs.AR 2026-05-05 reviewed
    SPEC CPU2026 increases instruction volume and cache pressure

    SPEC CPU2026: Characterization, Representativeness, and Cross-Suite Comparison

    RuiHao Li +3

  19. cs.AR 2026-05-05 reviewed
    4-5 workloads preserve 96-99% of SPEC CPU2026 behavior

    SPEC CPU2026: Characterization, Representativeness, and Cross-Suite Comparison

    RuiHao Li +3

  20. cs.AR 2026-05-05 reviewed
    FPGA BNN YOLO detector matches ONNX at 0.999964 correlation

    Design and Implementation of BNN-Based Object Detection on FPGA

    Xuyu Zhao +7

  21. cs.AR 2026-05-05 reviewed
    FPGA runs BNN object detector matching software at 0.999964 correlation

    Design and Implementation of BNN-Based Object Detection on FPGA

    Xuyu Zhao +7

  22. cs.AR 2026-05-04 reviewed
    Narrow final layer cuts LGN FPGA use by 28%

    Resource Utilization of Differentiable Logic Gate Networks Deployed on FPGAs

    Stephen Wormald +3

  23. quant-ph 2026-05-04 reviewed
    Automated predecoders cut quantum decoder use by up to 4000 times

    Mitigating Classical Resource Costs in Quantum Error Correction via Generalized qLDPC Predecoding

    Alexander Knapen +8

  24. eess.SP 2026-05-04 reviewed
    Beamspace low-rank preconditioner cuts CG iterations by two to three

    Low-rank Preconditioning in Beamspace Domain For Massive MU-MIMO Long-Term Beamforming

    Amirreza Kiani +3

  25. cs.AR 2026-05-04 reviewed
    MRDIMMs raise server memory bandwidth 41% with 30% energy savings

    Performance and Energy Benefits of MRDIMMs

    Pau Díaz +9

  26. cs.AR 2026-05-04 reviewed
    Single encoding unifies device

    Cerberus: Cross-Layer ECC Co-Design for Robust and Efficient Memory Protection

    Junhwan Kim +4

  27. cs.AR 2026-05-04 reviewed
    Single encoding reused across DRAM ECC layers

    Cerberus: Cross-Layer ECC Co-Design for Robust and Efficient Memory Protection

    Junhwan Kim +4

  28. cs.NI 2026-05-04 reviewed
    One NIC data path runs TCP and RoCE at line rate

    A Protocol-Independent Transport Architecture

    Kimiya Mohammadtaheri +10

  29. cs.AR 2026-05-03 reviewed
    3D stacking cuts NCL circuit area by 44%

    Monolithic 3D Integration for Null Convention Logic (NCL)-Based Asynchronous Circuits

    Xiameng Zhang +3

  30. cs.LG 2026-05-03 reviewed
    The paper surveys neural architecture search methods through the lens of efficiency

    HERCULES: Hardware-Efficient, Robust, Continual Learning Neural Architecture Search

    Matteo Gambella +2

  31. cs.AR 2026-05-03 reviewed
    The paper introduces ViM-Q, a co-design of quantization techniques and custom FPGA…

    ViM-Q: Scalable Algorithm-Hardware Co-Design for Vision Mamba Model Inference on FPGA

    Shengzhe Lyu +4

  32. cs.IT 2026-05-03 reviewed
    SwiftChannel pairs a compressed deep learning model for reconstructing 5G channel…

    SwiftChannel: Algorithm-Hardware Co-Design for Deep Learning-Based 5G Channel Estimation

    Shengzhe Lyu +7

  33. cs.AR 2026-05-03 reviewed
    RISC-V pipeline at 8 stages triples frequency and lifts throughput 71 percent

    RV-IM100: Quantifying ISA Extension, Datapath Width, and Pipeline Depth Trade-offs in RISC-V Microarchitectures

    Hyunwoo Kang

  34. cs.AR 2026-05-03 reviewed
    IR-level register tweaks cut delay

    PipeRTL: Timing-Aware Pipeline Optimization at IR-Level for RTL Generation

    Shuo Yin +8

  35. cs.PF 2026-05-02 reviewed
    SPEC CPU 2026 standardizes mixed-workload CPU benchmarking

    SPEC CPU: The Next Generation

    Mahesh Madhav +33

  36. cs.AR 2026-05-02 reviewed
    FPGA accelerator speeds SVD for PCA 22x over GPU

    MANOJAVAM: A Scalable, Unified FPGA Accelerator for Matrix Multiplication and Singular Value Decomposition in Principal Component Analysis

    Srivaths Ramasubramanian +7

  37. cs.AR 2026-05-02 reviewed
    Gem5 call stacks reveal what stats miss in simulated CPUs

    Understanding Simulated Architecture via gem5 Call-Stack Profiling

    Johan Söderström +2

  38. cs.AR 2026-05-02 reviewed
    AMSnet-q converts schematic images of analog and mixed-signal circuits into a fully…

    AMSnet-q: Unsupervised Circuit Identification and Performance Labeling for AMS Circuits

    Ze Zhang +8

  39. eess.IV 2026-05-02 reviewed
    Blackwell NVENC UHQ gains quality at 400% latency cost

    Evolution of NVENC Efficiency: A Longitudinal Analysis of HQ and UHQ Tuning Efficiency, Latency and Energy Trade-offs

    Kasidis Arunruangsirilert +1

  40. cs.AR 2026-05-01 reviewed
    Simulator models FlashAttention-3 pipelines to 5.7% error

    Sim-FA: A GPGPU Simulator Framework for Fine-Grained FlashAttention Pipeline Analysis

    Zhongchun Zhou +4

  41. cs.DC 2026-05-01 reviewed
    Fixed-core approach yields 211x higher efficiency for edge GEMM

    Tempus: A Temporally Scalable Resource-Invariant GEMM Streaming Framework for Versal AI Edge

    M. Grailoo +1

  42. cs.PF 2026-05-01 reviewed
    Apple Silicon runs 80B LLMs at 23x Nvidia energy efficiency

    Silicon Showdown: Performance, Efficiency, and Ecosystem Barriers in Consumer-Grade LLM Inference

    Abdurrahman Javat +1

  43. cs.AR 2026-05-01 reviewed
    Prototype chip runs 3B ternary LLM at 72 tokens per second

    VitaLLM: A Versatile and Tiny Accelerator for Mixed-Precision LLM Inference on Edge Devices

    Zi-Wei Lin +1

  44. cs.AR 2026-05-01 reviewed
    Subthreshold SRAM CIM hits 1181 TOPS/W for spiking networks

    A PVT-Resilient Subthreshold SRAM-Based In-Memory Computing Accelerator with In-Situ Regulation for Energy-Efficient Spiking Neural Networks

    Shih-Hang Kao +9

  45. cs.AR 2026-04-30 reviewed
    DPU-GPU split cuts CNN latency up to 3.37 times versus GPU alone

    DPU or GPU for Accelerating Neural Networks Inference -- Why not both? Split CNN Inference

    Ali Emre Oztas +3

  46. cs.CY 2026-04-30 reviewed
    AI trust can be measured via pillars and agentic interfaces

    I hope we don't do to trust what advertising has done to love

    Jade Alglave

  47. cs.CY 2026-04-30 reviewed
    AI trust needs pillars and vectors to stay meaningful

    I hope we don't do to trust what advertising has done to love

    Jade Alglave

  48. cs.AR 2026-04-30 reviewed
    Ring topology on FPGAs runs cortical circuit faster than real time

    NeuroRing: Scaling Spiking Neural Networks via Multi-FPGA Bidirectional Ring Topologies and Stream-Dataflow Architectures

    Muhammad Ihsan Al Hafiz +1

  49. cs.OS 2026-04-30 reviewed
    Affinity hints give 12% throughput boost on chiplet servers

    Affinity Tailor: Dynamic Locality-Aware Scheduling at Scale

    Jin Xin Ng +9

  50. cs.AR 2026-04-30 reviewed
    Memory chips run matrix math at 14.9 GFLOP/s

    AME-PIM: Can Memory be Your Next Tensor Accelerator?

    Emanuele Venieri +5