pith.

archive

Every paper Pith has read.

493 papers in cs.AR · page 2

  1. cs.AR 2026-05-12 reviewed
    Analog recurrence works at sub-microwatt power via bistable units

    Hardware-Software Co-Design of Scalable, Energy-Efficient Analog Recurrent Computations

    Arthur Fyon +5

  2. quant-ph 2026-05-12 reviewed
    Calibration feedback control cuts optimization gaps in local and tight-loop regimes

    Runtime Calibration as State-Trajectory Feedback Control in Quantum-Classical Workflows

    Xiaolong Deng

  3. cs.LG 2026-05-12 reviewed
    Cumulative updates fix gradient flow in low-power RNNs

    Improving the Performance and Learning Stability of Parallelizable RNNs Designed for Ultra-Low Power Applications

    Julien Brandoit +3

  4. cs.AR 2026-05-11 reviewed
    Dynamic scheduler lifts MoE inference 1.3-1.6x on PIM hardware

    Sieve: Dynamic Expert-Aware PIM Acceleration for Evolving Mixture-of-Experts Models

    Jungwoo Kim +7

  5. cs.AR 2026-05-11 reviewed
    Triton gains direct warp-group control for modern GPU hardware

    TLX: Hardware-Native, Evolvable MIMW GPU Compiler for Large-scale Production Environments

    Yue Guan +12

  6. cs.AR 2026-05-11 reviewed
    TLX adds MIMW warp-group control to Triton for modern GPUs

    TLX: Hardware-Native, Evolvable MIMW GPU Compiler for Large-scale Production Environments

    Yue Guan +12

  7. cs.CR 2026-05-11 reviewed
    LLMs generate hardware code but introduce security risks

    LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges

    Johann Knechtel +2

  8. cs.CR 2026-05-11 reviewed
    LLMs automate chip design but create security risks

    LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges

    Johann Knechtel +2

  9. cs.CR 2026-05-11 reviewed
    LLMs generate RTL code but create new hardware risks

    LLMs for Secure Hardware Design and Related Problems: Opportunities and Challenges

    Johann Knechtel +2

  10. cs.AR 2026-05-11 reviewed
    Hybrid chip runs GNN at 2.94M events/sec for physics triggers

    Reconfigurable Computing Challenge: Real-Time Graph Neural Networks for Online Event Selection in Big Science

    Marc Neu +5

  11. cs.AR 2026-05-11 reviewed
    Error profiles detect stolen approximate circuit IP despite mimicry

    ObfAx: Obfuscation and IP Piracy Detection in Approximate Circuits

    Lukas Sekanina +1

  12. cs.AR 2026-05-11 reviewed
    Piezoelectric sensors turn desk vibrations into six-gesture commands

    Towards an End-To-End System for Real-Time Gesture Recognition from Surface Vibrations

    Florian Hettstedt +5

  13. cs.AI 2026-05-11 reviewed
    Semantic clustering and MCTS cut hardware assertion sets by 76 percent

    Arcane: An Assertion Reduction Framework through Semantic Clustering and MCTS-Guided Rule Exploring

    Hongqin Lyu +4

  14. cs.AR 2026-05-11 reviewed
    LLM agents size RF amplifiers via resource allocation

    RFAmpDesigner: A Self-Evolving Multi-Agent LLM Framework for Automated Radio Frequency Amplifier Design

    Hang Lu +11

  15. cs.AR 2026-05-10 reviewed
    KV-cache movement regularization cuts static-graph LLM latency spikes

    KV-RM: Regularizing KV-Cache Movement for Static-Graph LLM Serving

    Zhiqing Zhong +5

  16. cs.AR 2026-05-10 reviewed
    Wafer integration of three 2D devices decides next computing decade

    Emerging 2D Materials for Beyond von Neumann Computing: A Perspective

    Yaser Banad

  17. cs.CL 2026-05-10 reviewed
    LLM accuracy depends only on evicted tokens

    Not All Thoughts Need HBM: Semantics-Aware Memory Hierarchy for LLM Reasoning

    Aojie Yuan +2

  18. cs.AR 2026-05-10 reviewed
    ReRAM-on-logic chip reaches 14-136 tokens per second on LLMs

    31.1 A 14.08-to-135.69Token/s ReRAM-on-Logic Stacked Outlier-Free Large-Language-Model Accelerator with Block-Clustered Weight-Compression and Adaptive Parallel-Speculative-Decoding

    Pingcheng Dong +15

  19. quant-ph 2026-05-10 reviewed
    Memoized heuristics scale ion-trap qubit mapping

    Scaling Qubit Mapping and Routing With Position Graph Abstraction and Memoization

    Brent Russon +3

  20. cs.AR 2026-05-09 reviewed
    Complex GAN metric separates gate-failure effects in circuits

    Fault tolerance estimation in digital circuits with visualised generative networks

    Sascha Biel +4

  21. cs.LG 2026-05-09 reviewed
    MPS decoding latency spikes up to 21x in narrow ranges

    Non-Monotonic Latency in Apple MPS Decoding: KV Cache Interactions and Execution Regimes

    Willy Fitra Hendria

  22. cs.LG 2026-05-09 reviewed
    Apple MPS shows 21x latency spikes in narrow decoding ranges

    Non-Monotonic Latency in Apple MPS Decoding: KV Cache Interactions and Execution Regimes

    Willy Fitra Hendria

  23. cs.AR 2026-05-09 reviewed
    New cache bypass method meets deadlines while boosting heterogeneous system speed

    HyDRA: Deadline and Reuse-Aware Cacheability for Hardware Accelerators

    Ayushi Agarwal +2

  24. cs.AR 2026-05-09 reviewed
    HyDRA balances accelerator deadlines with cache reuse via clustering

    HyDRA: Deadline and Reuse-Aware Cacheability for Hardware Accelerators

    Ayushi Agarwal +2

  25. eess.SP 2026-05-09 reviewed
    Low-complexity denoiser matches heavy mmWave MIMO methods

    Low-Complexity Beamspace Channel Denoiser for mmWave Massive MIMO with Low-Resolution ADCs

    Hanyoung Park +2

  26. cs.AR 2026-05-09 reviewed
    Reconfigurable multiplier cuts power 44-68% in RISC-V core

    A Reconfigurable Multiplier Architecture for Error-Resilient Applications in RISC-V Core

    Pragun Jaswal +2

  27. cs.AR 2026-05-09 reviewed
    DDR5 single sub-channel matches cache lines but loses 40-60% bandwidth

    Single 32-bit Sub-Channel DDR5 DIMMs: Architecture, Performance Bounds, and Standardisation

    Chih-Hua Ke

  28. cs.AR 2026-05-09 reviewed
    Edge processor hits 109 TFLOPS/W on DeepSeek

    DSPE: An Energy-Efficient Edge Processor for DeepSeek Inference with MerkleTree-based Incremental Pruning, Multi-Stage Boothing Lookup and Dynamic Adaptive Posit Processing

    Yuhan Zhang +36

  29. cs.AR 2026-05-09 reviewed
    Coprime test vectors localize faulty rows in systolic arrays after one pass

    FLARE: One-Shot PE-Level Fault Localization in Systolic Arrays via Algebraic Test Vectors

    Logashree Venkatasubramanian +2

  30. cs.AR 2026-05-08 reviewed
    Static checker decides barrier sufficiency for accelerator races

    AccelSync: Verifying Synchronization Coverage in Accelerator Pipeline Programs

    Hangcheng An +2

  31. cs.AR 2026-05-08 reviewed
    Model runs 1024-core chip sims 115x faster at under 7% error

    Accelerating Precise End-to-End Simulation: Latency-Sensitive Many-core System Modeling

    Yinrong Li +7

  32. cs.ET 2026-05-08 reviewed
    Plasma simulations need three post-Moore tech tiers

    Post-Moore Technologies for Plasma Simulation: A Community Roadmap

    Luca Pennati +23

  33. cs.LG 2026-05-08 reviewed
    GNNs for EDA succeed when matched to each task's native algebra

    Graph Computation Meets Circuit Algebra: A Task-Aligned Analysis of Graph Neural Networks for Electronic Design Automation

    Hyunmog Kim

  34. cs.AR 2026-05-08 reviewed
    Bit-hardening methods surpass ECC for reliable DNNs with no memory cost

    Effective and Memory-Efficient Alternatives to ECC for Reliable Large-Scale DNNs

    Mohammad Hasan Ahmadilivani +5

  35. cs.AR 2026-05-08 reviewed
    TREA accelerator reduces edge detection latency up to 9x

    TREA: Low-precision Time-Multiplexed, Resource-Efficient Edge Accelerator for Object Detection and Classification

    Vijay Pratap Sharma +4

  36. cs.AR 2026-05-08 reviewed
    Reconfigurable FPU gives up to 8x throughput for low-precision dot products

    TransDot: An Area-efficient Reconfigurable Floating-Point Unit for Trans-Precision Dot-Product Accumulation for FPGA AI Engines

    Jiayi Wang +4

  37. cs.AR 2026-05-07 reviewed
    Open schema and datasets released for ML benchmarks in chip design

    EDA-Schema-V2: A Multimodal Schema, Open Datasets, and Benchmarks for Machine Learning in Digital Physical Design

    Pratik Shrestha +2

  38. cs.AR 2026-05-07 reviewed
    Agents reach just 20% success on multi-PPA convergence tasks in new benchmark

    Bridging the Last Mile of Circuit Design: PostEDA-Bench, a Hierarchical Benchmark for PPA Convergence and DRC Fixing

    Pengju Liu +4

  39. cs.AR 2026-05-07 reviewed
    Agents solve only 37% of practical chip design rule problems

    Bridging the Last Mile of Circuit Design: PostEDA-Bench, a Hierarchical Benchmark for PPA Convergence and DRC Fixing

    Pengju Liu +4

  40. cs.AR 2026-05-07 reviewed
    CORDIC iteration depth trims 33 percent of inference cycles

    CARMEN: CORDIC-Accelerated Resource-Efficient Multi-Precision Inference Engine for Deep Learning

    Sonu Kumar +3

  41. cs.AR 2026-05-07 reviewed
    Posit engine cuts ADAS power by 72 percent with near full accuracy

    EULER-ADAS: Energy-Efficient & SIMD-Unified Logarithmic-Posit Engine for Precision-Reconfigurable Approximate ADAS Acceleration

    Mukul Lokhande +4

  42. physics.chem-ph 2026-05-07 reviewed
    FPGA YOLOv3-Tiny system detects in 0.211 seconds

    Development of embedded target detection system based on FPGA and YOLOv3-Tiny

    Zihan Jiang +7

  43. cs.CV 2026-05-07 reviewed
    Self-supervised pretraining yields tiny wildfire spotters for satellites

    On-Orbit Real-Time Wildfire Detection Under On-Board Constraints

    Matthias Rötzer +8

  44. cs.AR 2026-05-07 reviewed
    Pipeline speeds power-of-two DNNs on edge FPGAs by up to 3.6x

    PoTAcc: A Pipeline for End-to-End Acceleration of Power-of-Two Quantized DNNs

    Rappy Saha +4

  45. cs.AR 2026-05-07 reviewed
    FPGA MAC unifies mixed-precision ops for 1.2x LLM speedup

    XtraMAC: An Efficient MAC Architecture for Mixed-Precision LLM Inference on FPGA

    Feng Yu +4

  46. cs.AR 2026-05-07 reviewed
    Photonic solver beats digital annealers on dense spin-glasses

    A virtually connected probabilistic computer as a solver for higher-order, densely connected, or reconfigurable combinatorial optimisation problems

    Amy J. Searle +5

  47. cs.AR 2026-05-07 reviewed
    LLMs automate FPGA accelerator design space exploration

    LLM-Driven Design Space Exploration of FPGA-based Accelerators

    Vinamra Sharma +3

  48. cs.AR 2026-05-07 reviewed
    Hardware hub lets MoE send data before knowing GPU addresses

    MoE-Hub: Taming Software Complexity for Seamless MoE Overlap with Hardware-Accelerated Communication on Multi-GPU Systems

    Zhuoshan Zhou +12

  49. cs.AR 2026-05-07 reviewed
    Heterogeneous HBM-PIM stack lifts LLM throughput 1.62x

    TokenStack: A Heterogeneous HBM-PIM Architecture and Runtime for Efficient LLM Inference

    Zhuoran Li +5

  50. cs.AR 2026-05-07 reviewed
    New in-switch method delivers 1.38x faster LLM tensor parallel training

    Towards Compute-Aware In-Switch Computing for LLMs Tensor-Parallelism on Multi-GPU Systems

    Chen Zhang +12