archive

Every paper Pith has read. Search by title, abstract, or pith.

493 papers in cs.AR · page 9

  1. cs.CR 2026-01-30 reviewed
    NTT design detects Trojan control and timing faults in PQC

    Trojan-Resilient NTT: Protecting Against Control Flow and Timing Faults on Reconfigurable Platforms

    Rourab Paul +2

  2. cs.AR 2026-01-28 reviewed
    First NPU designed for diffusion language model inference

    NPU Design for Diffusion Language Model Inference

    Binglei Lou +11

  3. cs.AR 2026-01-22 reviewed
    Hypergraphs cut spike traffic in neuromorphic SNN mappings

    A Case for Hypergraphs to Model and Map SNNs on Neuromorphic Hardware

    Marco Ronzani +1

  4. cs.PF 2026-01-21 reviewed
    Hybrid model cuts GPU kernel prediction error by 6.7x

    PipeWeave: Synergizing Analytical and Learning Models for Unified GPU Performance Prediction

    Kaixuan Zhang +10

  5. cs.LG 2026-01-20 reviewed
    NSF urged to fund AI for faster chip design cycles

    Report for NSF Workshop on AI for Electronic Design Automation

    Deming Chen +9

  6. eess.SP 2026-01-13 reviewed
    Compact RISC-V core fits biomedical control in 708 LUTs

    Bio-RV: Low-Power Resource-Efficient RISC-V Processor for Biomedical Applications

    Vijay Pratap Sharma +4

  7. cs.AR 2026-01-05 reviewed
    Timing windows detect microcontroller ageing via frequency shifts

    Ageing Monitoring for Commercial Microcontrollers Based on Timing Windows

    Leandro Lanzieri +4

  8. cs.DC 2025-12-18 reviewed
    Tool spots bit-flip faults in LLMs for fast fixes

    BitFlipScope: Scalable Fault Localization and Recovery for Bit-Flip Corruptions in LLMs

    Muhammad Zeeshan Karamat +2

  9. cs.AR 2025-12-10 reviewed
    Dynamic buckets lift LLM cache use 19% on LPDDR chips

    ODMA: On-Demand Memory Allocation Strategy for LLM Serving on LPDDR-Class Accelerators

    Guoqiang Zou +4

  10. cs.LG 2025-12-09 reviewed
    Pipelined NN training sets delays by layer depth and reconstructs old weights with moving averages

    LayerPipe2: Multistage Pipelining and Weight Recompute via Improved Exponential Moving Average for Training Neural Networks

    Nanda K. Unnikrishnan +1

  11. cs.AR 2025-12-08 reviewed
    FPGA accelerator speeds graph classification 6.85× with 3.4% accuracy gain

    Efficient and Accurate Graph Classification with Hyperdimensional Computing on FPGA

    Jebacyril Arockiaraj +2

  12. cs.AR 2025-12-08 reviewed
    Generative transformer cuts circuit delay 30% and gates 50%

    GTAC: A Generative Transformer for Approximate Circuits

    Jingxin Wang +6

  13. cs.MS 2025-12-07 reviewed
    Models emulate NVIDIA Tensor Core behavior in low precision

    Accurate Models of NVIDIA Tensor Cores

    Faizan A. Khattak +1

  14. cs.AR 2025-11-27 reviewed
    Co-design framework accelerates domains up to 15x with low overhead

    Aquas: Enhancing Domain Specialization through Holistic Hardware-Software Co-Optimization based on MLIR

    Yuyang Zou +8

  15. cs.DC 2025-11-25 reviewed
    Voxel traits let Spira skip kernel-map overhead for 3x faster point-cloud convolution

    Spira: Exploiting Voxel Data Structural Properties for Efficient Sparse Convolution in Point Cloud Networks

    Dionysios Adamopoulos +3

  16. cs.LG 2025-11-25 reviewed
    Round-trip LLM translation catches hallucinations in hardware design

    Mitigating hallucinations and omissions in LLMs for invertible problems: An application to hardware logic design automation

    Andrew S. Cassidy +6

  17. cs.AR 2025-11-21 reviewed
    AmpereOne adds memory tagging with zero capacity overhead

    Optimized Memory Tagging on AmpereOne Processors

    Shivnandan Kaushik +16

  18. cs.AR 2025-11-21 reviewed
    Digital in-memory design reaches 3.59 TOPS/W for AI matrix math

    DISCA: A Digital In-memory Stochastic Computing Architecture Using A Compressed Bent-Pyramid Format

    Shady Agwa +3

  19. cs.AR 2025-11-19 reviewed
    Fused unit runs mixed-precision dot products in four cycles

    Ten-Four: An Open-Source Fused Dot Product Unit for Mixed-Precision GPGPU Tensor Cores

    Nikhil Rout +1

  20. cs.AR 2025-11-19 reviewed
    Joint data-compute tuning speeds ML kernels on PIM up to 13x

    DCC: Data-Centric Compilation of Machine Learning Kernels for Processing-In-Memory Architectures

    Peiming Yang +6

  21. cs.AR 2025-11-14 reviewed
    8x faster linearity testing for 16-bit SAR ADCs

    Advanced Strategies for Uncertainty-Guided Live Measurement Sequencing in Fast, Robust SAR ADC Linearity Testing

    Thorben Schey +3

  22. cs.AR 2025-11-14 reviewed
    Adaptive EKF sequencing cuts SAR ADC linearity test time

    Uncertainty-Guided Live Measurement Sequencing for Fast SAR ADC Linearity Testing

    Thorben Schey +3

  23. cs.AR 2025-11-14 reviewed
    Closed-loop tests yield first bit-accurate models for ten GPU matrix units

    Bit-Accurate Modeling of GPU Matrix Multiply-Accumulate Units: Demystifying Numerical Discrepancy and Accuracy

    Peichen Xie +4

  24. cs.DC 2025-11-13 reviewed
    Thermal imbalance creates stragglers that slow multi-GPU nodes

    Lit Silicon: A Case Where Thermal Imbalance Couples Concurrent Execution in Multiple GPUs

    Marco Kurzynski +2

  25. cs.AR 2025-11-10 reviewed
    Hybrid formats give 4.9× faster edge LLM inference on PIM

    P3-LLM: An Integrated NPU-PIM Accelerator for Edge LLM Inference Using Hybrid Numerical Formats

    Yuzong Chen +6

  26. cs.DC 2025-11-10 reviewed
    DMA offloads close 4.5x gap for latency-bound ML collectives

    DMA-Latte: Expanding the Reach of DMA Offloads to Latency-bound ML Communication

    Suchita Pati +5

  27. cs.AR 2025-11-06 reviewed
    Five-minute rule shrinks to seconds for AI systems

    Five-Minute Rule 40 Years Later: A First-Principles Revisit for Modern Memory Hierarchy

    Tong Zhang +9

  28. cs.AI 2025-11-05 reviewed
    SnapStream cuts KV cache memory by 4x for 128k LLM inference

    SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators

    Jonathan Li +21

  29. quant-ph 2025-10-27 reviewed
    Two-ion traps beat larger designs for surface-code trapped-ion computers

    Architecting Scalable Trapped Ion Quantum Computers using Surface Codes

    Scott Jones +1

  30. cs.AR 2025-10-24 reviewed
    SilentZNS slashes ZNS SSD write amplification by 92%

    Eliminating the Hidden Cost of Zone Management in ZNS SSDs

    Teona Bagashvili +3

  31. cs.SE 2025-10-24 reviewed
    Search tunes allocators to cut heap use by 4 percent

    GreenMalloc: Allocator Optimisation for Industrial Workloads

    Aidan Dakhama +3

  32. cs.AR 2025-10-17 reviewed
    Fixed configs make Ramulator 2.0 match real memory performance

    Cleaning up the Mess: Re-Evaluating the Real-System Modeling Accuracy of Ramulator 2.0

    F. Nisa Bostanci +6

  33. cs.AR 2025-10-16 reviewed
    Dynamic pruning cuts vision transformer ops by 61 percent

    Low Power Vision Transformer Accelerator with Hardware-Aware Pruning and Optimized Dataflow

    Ching-Lin Hsiung +1

  34. cs.AR 2025-10-16 reviewed
    Two-stage adaptation hits 93% compression on CIM chips

    Computing-In-Memory Aware Model Adaption For Edge Devices

    Ming-Han Lin +1

  35. cs.AR 2025-10-09 reviewed
    Switch cache lifts HDFS metadata throughput up to 181%

    Fletch: File-System Metadata Caching in Programmable Switches

    Qingxiu Liu +6

  36. cs.DC 2025-10-08 reviewed
    Framework measures real makespans from abstract graphs on CPU-GPU-FPGA hardware

    Evaluating Rapid Makespan Predictions for Heterogeneous Systems with Programmable Logic

    Martin Wilhelm +3

  37. cs.DC 2025-10-07 reviewed
    Profiling uncovers patterns that speed up large MoE inference 6.6x

    Patterns behind Chaos: Forecasting Data Movement for Efficient Large-Scale MoE LLM Inference

    Zhongkai Yu +8

  38. cs.AR 2025-10-03 reviewed
    Extended precision cuts large Max-Cut solve times

    A Hardware Accelerator for the Goemans-Williamson Algorithm

    D. A. Herrera-Martí +2

  39. cs.AR 2025-09-22 reviewed
    Chiplet RISC-V SoC achieves 40% efficiency gain for edge AI

    Chiplet-Based RISC-V SoC with Modular AI Acceleration

    Suhas Suresh Bharadwaj +1

  40. cs.AR 2025-09-11 reviewed
    Flattened arrays and quantization break LLM memory walls

    Combating the Memory Walls: Optimization Pathways for Long-Context Agentic LLM Inference

    Haoran Wu +17

  41. cs.AR 2025-09-10 reviewed
    FASE runs multi-thread benchmarks on FPGA before SoC integration

    FASE: FPGA-Assisted Syscall Emulation for Rapid End-to-End Processor Performance Validation

    Chengzhen Meng +5

  42. cs.AR 2025-09-09 reviewed
    Lifetime variation enables 14.5x carbon reduction in disposable smart items

    Lifetime-Aware Design for Item-Level Intelligence at the Extreme Edge

    Shvetank Prakash +15

  43. cs.AR 2025-09-09 reviewed
    Diffusion model optimizes all VLSI macros at once

    DiffPlace: A Conditional Diffusion Framework for Simultaneous VLSI Placement Beyond Sequential Paradigms

    Kien Le Trung +1

  44. quant-ph 2025-08-26 reviewed
    Resource estimates find feasible setups for distributed quantum computers

    Architecting Distributed Quantum Computers: Design Insights from Resource Estimation

    Dmitry Filippov +2

  45. hep-ex 2025-08-21 reviewed
    Linear GNN tags jets under 60 ns on FPGAs

    JEDI-linear: Fast and Efficient Graph Neural Networks for Jet Tagging on FPGAs

    Zhiqiang Que +10

  46. cs.AR 2025-08-12 reviewed
    Memory reads turn into stochastic multiplies for matrix work

    OISMA: On-the-fly In-memory Stochastic Multiplication Architecture for Matrix-Multiplication Workloads

    Shady Agwa +3

  47. cs.DC 2025-08-02 reviewed
    Expert-sharded KV storage cuts memory use in MoE inference

    PiKV: KV Cache Management System for Mixture of Experts

    Dong Liu +3

  48. cs.AR 2025-07-21 reviewed
    Retrieval lifts LLM success on RTL test fixes 7.72 times

    VeriRAG: A Retrieval-Augmented Framework for Automated RTL Testability Repair

    Haomin Qi +5

  49. cs.AI 2025-07-07 reviewed
    RL trains LLMs to output efficient Verilog designs

    ChipSeek: Optimizing Verilog Generation via EDA-Integrated Reinforcement Learning

    Zhirong Chen +10

  50. cs.AR 2025-07-06 reviewed
    Distributed arithmetic cuts FPGA neural net resources by a third

    da4ml: Distributed Arithmetic for Real-time Neural Networks on FPGAs

    Chang Sun +4