pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

493 papers in cs.AR · page 6

  1. cs.DC 2026-04-18 reviewed
    Hierarchical sparsity speeds LLM attention 4.57 times

    HieraSparse: Hierarchical Semi-Structured Sparse KV Attention

    Haoxuan Wang +1

  2. cs.AR 2026-04-17 reviewed
    Genetic search finds shift-add CNNs for 33% faster TinyML on FPGA

    Co-Design of CNN Accelerators for TinyML using Approximate Matrix Decomposition

    José Juan Hernández Morales +6

  3. cs.NI 2026-04-17 reviewed
    Real traces show congestion from HPC collectives

    Characterization of Real Communication Patterns and Congestion Dynamics in HPC Interconnection Networks

    Miguel Sánchez de La Rosa +11

  4. cs.AR 2026-04-17 reviewed
    MemExplorer auto-designs memory for agentic NPUs

    MemExplorer: Navigating the Heterogeneous Memory Design Space for Agentic Inference NPUs

    Haoran Wu +17

  5. cs.AR 2026-04-17 reviewed
    MLIR unifies equivalence checking from algorithms to netlists

    EquivFusion: Unifying Hardware Equivalence Checking from Algorithms to Netlists via MLIR

    Jiaying Zhu +6

  6. cs.AR 2026-04-17 reviewed
    SRAM CIM accelerator hits 26.1 TOPS/W for attention

    CIMple: Standard-cell SRAM-based CIM with LUT-based split softmax for attention acceleration

    Bas Ahn +4

  7. cs.CR 2026-04-17 reviewed
    SRAM PUF with Hamming codes keeps IoT auth errors below 1%

    Secure Authentication in Wireless IoT: Hamming Code Assisted SRAM PUF as Device Fingerprint

    Florian Lehn +2

  8. cs.AR 2026-04-17 reviewed
    Specialized agents close hardware coverage with 4-13x fewer tokens

    Understanding Inference-Time Token Allocation and Coverage Limits in Agentic Hardware Verification

    Vihaan Patel +2

  9. cs.AR 2026-04-17 reviewed
    Annealing step stabilizes LLM-generated RTL designs

    HYPERHEURIST: A Simulated Annealing-Based Control Framework for LLM-Driven Code Generation in Optimized Hardware Design

    Shiva Ahir +2

  10. cs.AR 2026-04-17 reviewed
    Overmind hits 8.1 TOPS/W on neuro-symbolic workloads

    Overmind NSA: A Unified Neuro-Symbolic Computing Architecture with Approximate Nonlinear Activations and Preemptive Memory Bypass

    Weilun Wang +2

  11. cs.AR 2026-04-17 reviewed
    LLM agent closes hardware coverage gaps automatically

    Spec2Cov: An Agentic Framework for Code Coverage Closure of Digital Hardware Designs

    Sean Lowe +5

  12. cs.AR 2026-04-17 reviewed
    LLM agent reaches 100% hardware coverage on simple designs

    Spec2Cov: An Agentic Framework for Code Coverage Closure of Digital Hardware Designs

    Sean Lowe +5

  13. cs.AR 2026-04-16 reviewed
    Symmetric grids lift photonic AI use by 6x

    Towards Topology-Aware Very Large-Scale Photonic AI Accelerators

    Belal Jahannia +2

  14. cs.AR 2026-04-16 reviewed
    Rack storage tames millisecond GPU power swings

    EasyRider: Mitigating Power Transients in Datacenter-Scale Training Workloads

    Dillon Jensen +6

  15. cs.AR 2026-04-16 reviewed
    Microcontroller fixes timing for real-time photoacoustic imaging

    Democratization of Real-time Multi-Spectral Photoacoustic Imaging: Open-Sourced System Architecture for OPOTEK Phocus & Verasonics Vantage Combination

    Ryo Murakami +2

  16. cs.AR 2026-04-16 reviewed
    SCENIC hits 200G SmartNIC speed with programmable stream units

    SCENIC: Stream Computation-Enhanced SmartNIC

    Benjamin Ramhorst +6

  17. cs.AR 2026-04-16 reviewed
    LLM agents evolve the ABC synthesis tool to higher QoR

    Autonomous Evolution of EDA Tools: Multi-Agent Self-Evolved ABC

    Cunxi Yu +1

  18. cs.AI 2026-04-16 reviewed
    Agentic AI improves RTL timing by 21 percent on real designs

    Dr. RTL: Autonomous Agentic RTL Optimization through Tool-Grounded Self-Improvement

    Wenji Fang +7

  19. cs.AR 2026-04-16 reviewed
    CRONet runs fully on-chip on AIE-ML for 2.49x latency gain

    Accelerating CRONet on AMD Versal AIE-ML Engines

    Kaustubh Mhatre +6

  20. physics.optics 2026-04-16 reviewed
    Unary encoding boosts parallelism in photonic tensor cores

    Scaling Photonic Tensor Cores with Unary and Homodyne Designs

    Oluwaseun Alo +1

  21. cs.AR 2026-04-16 reviewed
    Multi-agent testbenches match SOTA Verilog generation with less data

    Exploring LLM-based Verilog Code Generation with Data-Efficient Fine-Tuning and Testbench Automation

    Mu-Chi Chen +8

  22. cs.LG 2026-04-16 reviewed
    MoE serving gains 6.6x speedup via elastic self-speculation on 3D stacks

    ELMoE-3D: Leveraging Intrinsic Elasticity of MoE for Hybrid-Bonding-Enabled Self-Speculative Decoding in On-Premises Serving

    Yuseon Choi +7

  23. cs.PF 2026-04-16 reviewed
    L4 GPU delivers up to 4.4x inference throughput over T4

    DEEP-GAP: Deep-learning Evaluation of Execution Parallelism in GPU Architectural Performance

    Kathiravan Palaniappan

  24. cs.AR 2026-04-16 reviewed
    Knowledge graph guides LLMs to build correct RISC-V hardware

    VeriGraphi: A Multi-Agent Framework of Hierarchical RTL Generation for Large Hardware Designs

    Sazzadul Islam +2

  25. cs.AR 2026-04-15 reviewed
    Chiplet tasks cut LLM decode latency on multi-die GPUs

    Fleet: Hierarchical Task-based Abstraction for Megakernels on Multi-Die GPUs

    Sangeeta Chowdhary +9

  26. cs.AR 2026-04-15 reviewed
    Embeddings detect line-level CWEs in Verilog at 89% precision

    VeriCWEty: Embedding enabled Line-Level CWE Detection in Verilog

    Prithwish Basu Roy +6

  27. cs.AR 2026-04-15 reviewed
    ASIC emulates oscillators to solve max-cut and coloring at 97-100% accuracy

    An ASIC Emulated Oscillator Ising/Potts Machine Solving Combinatorial Optimization Problems

    Yilmaz Ege Gonul +1

  28. cs.AR 2026-04-15 reviewed
    Memory stack runs full matrix math inside the chip

    GEM3D CIM General Purpose Matrix Computation Using 3D Integrated SRAM eDRAM Hybrid Compute In Memory on Memory Architecture

    Subhradip Chakraborty +2

  29. cs.AR 2026-04-15 reviewed
    LSTM accelerator spots gait issues 4x faster on tiny ASIC

    Cross-Layer Co-Optimized LSTM Accelerator for Real-Time Gait Analysis

    Mohammad Hasan Ahmadilivani +4

  30. cs.AR 2026-04-15 reviewed
    Pipeline lifts bit-level accelerator code to tensor ISA specs

    ATLAAS: Automatic Tensor-Level Abstraction of Accelerator Semantics

    Ruijie Gao +3

  31. cs.LG 2026-04-14 reviewed
    Full biosignal model tuning runs under 50mW on edge chips

    BioTrain: Sub-MB, Sub-50mW On-Device Fine-Tuning for Edge-AI on Biosignals

    Run Wang +7

  32. cs.AR 2026-04-14 reviewed
    Hardware unit reorganizes data on the fly for ideal CPU cache locality

    Tensor Memory Engine: On-the-fly Data Reorganization for Ideal Locality

    Denis Hoornaert +5

  33. cs.LG 2026-04-14 reviewed
    TCL tunes tensor programs 16x faster across CPU and GPU

    TCL: Enabling Fast and Efficient Cross-Hardware Tensor Program Optimization via Continual Learning

    Chaoyao Shen +7

  34. cs.AR 2026-04-14 reviewed
    EPAC RISC-V chip with three tiles taped out in 22nm

    EPAC: The Last Dance

    Filippo Mantovani +38

  35. cs.AR 2026-04-14 reviewed
    CODO compiler speeds FPGA dataflow designs up to 33x on DNNs

    CODO: An Automated Compiler for Comprehensive Dataflow Optimization

    Weichuang Zhang +8

  36. cs.AR 2026-04-14 reviewed
    Passive optical elements classify images by embedded phase patterns

    Photonic AI: A Hybrid Diffractive Holographic Neural System for Passive Optical Real-Time Image Classification

    Prakul Sunil Hiremath

  37. cs.AR 2026-04-14 reviewed
    Hadamard patterns cut RRAM read noise impact in neural nets

    HARP: Hadamard-Domain Write-and-Verify for Noise-Robust RRAM Programming

    Ilhuan Choi +5

  38. cs.AR 2026-04-14 reviewed
    Compiler cuts NPU transformer energy use by up to 41%

    Forge-UGC: FX optimization and register-graph engine for universal graph compiler

    Satyam Kumar +1

  39. cs.AI 2026-04-13 reviewed
    Reference-based replication creates AI agents in constant time

    Aethon: A Reference-Based Replication Primitive for Constant-Time Instantiation of Stateful AI Agents

    Swanand Rao +3

  40. cs.LG 2026-04-13 reviewed
    Imitation learning yields thermal-safe LFM schedules on 3D many-cores

    Active Imitation Learning for Thermal- and Kernel-Aware LFM Inference on 3D S-NUCA Many-Cores

    Yixian Shen +5

  41. cs.AR 2026-04-13 reviewed
    Decoupled matrix units deliver up to 2.31x AI speedups on CPUs

    CUTEv2: Unified and Configurable Matrix Extension for Diverse CPU Architectures with Minimal Design Overhead

    Jinpeng Ye +13

  42. cs.CV 2026-04-13 reviewed
    Neural model sequences shape operations for better mask correction

    MorphOPC: Advancing Mask Optimization with Multi-scale Hierarchical Morphological Learning

    Yuting Hu +6

  43. cs.AR 2026-04-13 reviewed
    CIM design runs 1B-4B models at 336 tokens/s with 49x energy gain

    EdgeCIM: A Hardware-Software Co-Design for CIM-Based Acceleration of Small Language Models

    Jinane Bazzi +4

  44. cs.AR 2026-04-13 reviewed
    New dataset trains ML models on 61k chip layout windows for capacitance

    CapBench: A Multi-PDK Dataset for Machine-Learning-Based Post-Layout Capacitance Extraction

    Hector R. Rodriguez +2

  45. cs.AR 2026-04-13 reviewed
    High-bandwidth storage enables interactive 13B model inference on mobiles

    Technology solutions targeting the performance of gen-AI inference in resource constrained platforms

    Joyjit Kundu +3

  46. cs.AR 2026-04-13 reviewed
    Specialized LLM matches syntax but raises SVA semantic accuracy by 23 points

    Automated SVA Generation with LLMs

    Lik Tung Fu +6

  47. quant-ph 2026-04-13 reviewed
    Pulse sequence moves Rydberg excitation for remote CZ gates

    Compiler Framework for Directional Transport in Zoned Neutral Atom Systems with AOD Assistance: A Hybrid Remote CZ Approach

    Lingyi Kong +6

  48. cs.AR 2026-04-13 reviewed
    Heterogeneous PIM chiplet speeds graph DP 42x over GPU

    GEN-Graph: Heterogeneous PIM Accelerator for General Computational Patterns in Graph-based Dynamic Programming

    Yanru Chen +5

  49. cs.AR 2026-04-12 reviewed
    Optimal AI accelerator shifts with batch size and model scale

    The xPU-athalon: Quantifying the Competition of AI Acceleration

    Alicia Golden +3

  50. physics.optics 2026-04-12 reviewed
    Photonics scales AI past transistor density limits

    Harnessing Photonics for Machine Intelligence

    Hanqing Zhu +6