archive

Every paper Pith has read.

493 papers in cs.AR · page 7

  1. cs.AR 2026-04-12 reviewed
    Spatial heat patterns dominate power-grid lifetime over averages

    EMSpice 3: Full-chip Temperature-Aware Multiphysics Electromigration and IR-Drop Analysis

    Haotian Lu +1

  2. cs.AR 2026-04-12 reviewed
    Octree islands cut PCN feature fetching by 55-94 percent

    L-PCN: A Point Cloud Accelerator Exploiting Spatial Locality through Octree-based Islandization

    Yiming Gao +7

  3. cs.AR 2026-04-12 reviewed
    Two-stage mining extracts accurate message flows from SoC traces

    AutoFlows++: Hierarchical Message Flow Mining for System on Chip Designs

    Bardia Nadimi +1

  4. cs.AR 2026-04-12 reviewed
    BFP NPU hits near-DMR reliability at 3.55% overhead

    From Characterization to Microarchitecture: Designing an Elegant and Reliable BFP-Based NPU

    Jie Zhang +6

  5. cs.AR 2026-04-12 reviewed
    Re-partitioned NPU catches and fixes faults in under a microsecond

    Strix: Re-thinking NPU Reliability from a System Perspective

    Jiapeng Guan +10

  6. cs.AR 2026-04-12 reviewed
    LLM training resists low GPU fault rates but fails in key paths

    LLM-PRISM: Characterizing Silent Data Corruption from Permanent GPU Faults in LLM Training

    Abhishek Tyagi +6

  7. cs.AR 2026-04-11 reviewed
    Chip renders 3D Gaussian Splatting at 129 FPS in full HD

    A 129FPS Full HD Real-Time Accelerator for 3D Gaussian Splatting

    Fang-Chi Chang +1

  8. cs.PF 2026-04-11 reviewed
    Wave-aware model picks near-optimal GPU kernel settings fast

    WaveTune: Wave-aware Bilinear Modeling for Efficient GPU Kernel Auto-tuning

    Kaixuan Zhang +8

  9. cs.AR 2026-04-11 reviewed
    Sparse measurements predict latency at every CPU-GPU frequency

    Taming Asynchronous CPU-GPU Coupling for Frequency-aware Latency Estimation on Mobile Edge

    Jiesong Chen +3

  10. cs.DC 2026-04-11 reviewed
    FlexVector speeds GCN inference 3.78x with flexible registers

    FlexVector: A SpMM Vector Processor with Flexible VRF for GCNs on Varying-Sparsity Graphs

    Bohan Li +5

  11. cs.AR 2026-04-11 reviewed
    Open framework speeds SystemC-FPGA co-emulation up to 2500x

    Late Breaking Results: CHESSY: Coupled Hybrid Emulation with SystemC-FPGA Synchronization

    Lorenzo Ruotolo +9

  12. cs.AR 2026-04-11 reviewed
    Microcontroller runs full SNN simulation at 20 mW

    Full Feature Spiking Neural Network Simulation on Micro-Controllers for Neuromorphic Applications at the Edge

    L. Niedermeier +1

  13. cs.AR 2026-04-11 reviewed
    DNN-resilient voltage scaling cuts aging degradation up to 46%

    Aging Aware Adaptive Voltage Scaling for Reliable and Efficient AI Accelerators

    Tong Xie +5

  14. cs.AR 2026-04-10 reviewed
    Photonic accelerator speeds transformers 7.6x with lower energy

    Sustainable Transformer Neural Network Acceleration with Stochastic Photonic Computing

    S. Afifi +3

  15. cs.AR 2026-04-10 reviewed
    0.5V encoder maps voltages to spikes within 5.6 percent linearity

    A 0.5-V Linear Neuromorphic Voltage-to-Spike Encoder Using a Bulk-Driven Transconductor

    Meysam Akbari +2

  16. cs.DC 2026-04-10 reviewed
    MATCHA cuts DNN inference latency up to 35% on heterogeneous edge SoCs

    MATCHA: Efficient Deployment of Deep Neural Networks on Multi-Accelerator Heterogeneous Edge SoCs

    Enrico Russo +8

  17. cs.AR 2026-04-10 reviewed
    Diffusion models cut energy 36% by tolerating controlled faults

    DRIFT: Harnessing Inherent Fault Tolerance for Efficient and Reliable Diffusion Model Inference

    Jinqi Wen +3

  18. cs.AR 2026-04-10 reviewed
    Key signals cut RTL assertion needs by two thirds

    From Indiscriminate to Targeted: Efficient RTL Verification via Functionally Key Signal-Driven LLM Assertion Generation

    Yonghao Wang +10

  19. cs.AR 2026-04-09 reviewed
    Neuromorphic chips hit new memory wall from on-chip storage

    Memory Wall is not gone: A Critical Outlook on Memory Architecture in Digital Neuromorphic Computing

    Amirreza Yousefzadeh +2

  20. cs.PL 2026-04-09 reviewed
    Profile labels cut memory dependence checks 79% on small cores

    PG-MDP: Profile-Guided Memory Dependence Prediction for Area-Constrained Cores

    Luke Panayi +7

  21. cs.DC 2026-04-09 reviewed
    Energy-efficient GPUs deliver better value under budget limits

    Wattlytics: A Web Platform for Co-Optimizing Performance, Energy, and TCO in HPC Clusters

    Ayesha Afzal +2

  22. cs.AR 2026-04-09 reviewed
    ATLAS models 3D-DRAM LLM accelerators to 8.57% of silicon accuracy

    A Full-Stack Performance Evaluation Infrastructure for 3D-DRAM-based LLM Accelerators

    Cong Li +13

  23. cs.AR 2026-04-09 reviewed
    Mamba-3 raises edge latency up to 48% to favor cloud GPUs

    The Hyperscale Lottery: How State-Space Models Have Sacrificed Edge Efficiency

    Robin Geens +3

  24. cs.PL 2026-04-09 reviewed
    Faster 32-bit constant division on 64-bit CPUs

    Optimization of 32-bit Unsigned Division by Constants on 64-bit Targets

    Shigeo Mitsunari +1

  25. cs.DC 2026-04-09 reviewed
    Integrated panels give orbital AI 100 kW per ton

    Reduced-Mass Orbital AI Inference via Integrated Solar, Compute, and Radiator Panels

    Stephen Gaalema +2

  26. cs.AR 2026-04-08 reviewed
    TrilinearCIM runs Transformer attention in NVM without reprogramming

    Trilinear Compute-in-Memory Architecture for Energy-Efficient Transformer Acceleration

    Md Zesun Ahmed Mia +3

  27. cs.AR 2026-04-08 reviewed
    RL agent designs ASIC chips for AI that adapt across 7 process nodes

    From LLM to Silicon: RL-Driven ASIC Architecture Exploration for On-Device AI Inference

    Ravindra Ganti +1

  28. cs.AR 2026-04-08 reviewed
    FILCO reconfigures DNN accelerators on the fly for 1.3x-5x gains

    FILCO: Flexible Composing Architecture with Real-Time Reconfigurability for DNN Acceleration

    Xingzhen Chen +7

  29. cs.AR 2026-04-08 reviewed
    Symbolic analysis estimates energy for loop nests independent of size

    Symbolic Polyhedral-Based Energy Analysis for Nested Loop Programs

    Avinash Mahesh Nirmala +3

  30. cs.CV 2026-04-08 reviewed
    Onboard EO processing delivers sub-3m burnt-area maps

    Assessing the Added Value of Onboard Earth Observation Processing with the IRIDE HEO Service Segment

    Parampuneet Kaur Thind +5

  31. cs.AR 2026-04-08 reviewed
    GQA models cut peak memory 2.72x versus MHA on embedded hardware

    TRAPTI: Time-Resolved Analysis for SRAM Banking and Power Gating Optimization in Embedded Transformer Inference

    Jan Klhufek +4

  32. cs.AR 2026-04-08 reviewed
    New chip runs annealing and reservoir tasks at 25-54x efficiency

    CBM-Dual: A 65-nm Fully Connected Chaotic Boltzmann Machine Processor for Dual Function Simulated Annealing and Reservoir Computing

    Kanta Yoshioka +5

  33. cs.AR 2026-04-08 reviewed
    SHIELD cuts eDRAM refresh energy 35% for edge LLM inference

    SHIELD: A Segmented Hierarchical Memory Architecture for Energy-Efficient LLM Inference on Edge NPUs

    Jintao Zhang +1

  34. cs.AR 2026-04-08 reviewed
    SwarmIO emulates 40M IOPS SSDs for GPUs with 300x speedup

    SwarmIO: Towards 100 Million IOPS SSD Emulation for Next-generation GPU-centric Storage Systems

    Hyeseong Kim +2

  35. cs.AR 2026-04-08 reviewed
    One DC simulation calibrates LLM equations for analog sizing

    A Self-Calibrating Framework for Analog Circuit Sizing Using LLM-Derived Analytical Equations

    Antonio J. Bujana +1

  36. cs.AR 2026-04-08 reviewed
    Coverage feedback raises assertion coverage 9-15 percent

    CoverAssert: Iterative LLM Assertion Generation Driven by Functional Coverage via Syntax-Semantic Representations

    Yonghao Wang +10

  37. eess.SP 2026-04-07 reviewed
    Dominant interferer nulling cuts CG iterations in massive MU-MIMO

    Interference Suppression for Massive MU-MIMO Long-Term Beamforming with Matrix Inversion Approximation

    Amirreza Kiani +3

  38. cs.DC 2026-04-07 reviewed
    Power reconstruction shows 79% energy cut from mixed precision on Frontier

    Fine-Grained Power and Energy Attribution on AMD GPU/APU-Based Exascale Nodes

    Adam McDaniel +10

  39. cs.AR 2026-04-07 reviewed
    PHAROS finds more deadline-meeting accelerator designs

    PHAROS: Pipelined Heterogeneous Accelerators for Real-time Safety-critical Systems With Deadline Compliance

    Shixin Ji +8

  40. cs.AR 2026-04-06 reviewed
    Prime power moduli simplify RNS integer division hardware

    Direct Integer Division in RNS and its Hardware Solutions

    Eric B. Olsen

  41. cs.AR 2026-04-06 reviewed
    KV cache choice depends on memory limits and request load

    Comparative Characterization of KV Cache Management Strategies for LLM Inference

    Oteo Mamo +3

  42. cs.CR 2026-04-06 reviewed
    GPU boosts encrypted LLM nonlinear layers by up to 17 times

    GPU Acceleration of TFHE-Based High-Precision Nonlinear Layers for Encrypted LLM Inference

    Guoci Chen +7

  43. cs.AR 2026-04-06 reviewed
    DRAM PIM techniques create bursty power demands that stress delivery networks

    A comparative study on power delivery aspects of compute-in/near-memory approaches using DRAM

    Siddhartha Raman Sundara Raman +2

  44. cs.AR 2026-04-06 reviewed
    Tool explores 250 trillion 3D AI accelerator designs 100000 times faster

    DeepStack: Scalable and Accurate Design Space Exploration for Distributed 3D-Stacked AI Accelerators

    Zhiwen Mo +13

  45. cs.AR 2026-04-06 reviewed
    Neuromorphic hardware could break CMOS energy limits for AI

    Neuromorphic Computing for Low-Power Artificial Intelligence

    Keshava Katti +2

  46. cs.CR 2026-04-06 reviewed
    GPIR lifts GPU PIR speed by up to 297 times

    GPIR: Enabling Practical Private Information Retrieval with GPUs

    Hyesung Ji +5

  47. cs.AR 2026-04-06 reviewed
    CGRA sharing with migration cuts workload time by 70%

    Mestra: Exploring Migration on Virtualized CGRAs

    Agamemnon Kyriazis +4

  48. cs.AR 2026-04-06 reviewed
    Packed LUTs deliver 1.82x speedup for DNN inference on DRAM-PIM

    LOCALUT: Harnessing Capacity-Computation Tradeoffs for LUT-Based Inference in DRAM-PIM

    Junguk Hong +8

  49. cs.AR 2026-04-06 reviewed
    Bit partitioning lets one PE run FP8 or dual FP4 with 60% less area

    DHFP-PE: Dual-Precision Hybrid Floating Point Processing Element for AI Acceleration

    Shubham Kumar +3

  50. cs.CR 2026-04-05 reviewed
    Hardware cuts real-time interrupt latency by 50x

    Enabling Deterministic User-Level Interrupts in Real-Time Processors via Hardware Extension

    Hongbin Yang +2