archive

Every paper Pith has read. Search by title, abstract, or pith.

493 papers in cs.AR · page 8

  1. cs.AR 2026-04-04 reviewed
    Einsum fusion cuts Mamba traffic for 4.9x prefill speedup

    Mambalaya: Einsum-Based Fusion Optimizations on State-Space Models

    Toluwanimi O. Odemuyiwa +3

  2. cs.AR 2026-04-03 reviewed
    Matrix encoding speeds attention dataflow optimization by 64-343x

    Fast Cross-Operator Optimization of Attention Dataflow

    Haodong Chang +7

  3. cs.NE 2026-04-03 reviewed
    FPGA SNN accelerator scales inference near-linearly with sparsity

    YANA: Bridging the Neuromorphic Simulation-to-Hardware Gap

    Brian Pachideh +7

  4. cs.AR 2026-04-03 reviewed
    Error-driven training puts 32B model at top of industrial code benchmarks

    InCoder-32B-Thinking: Industrial Code World Model for Thinking

    Jian Yang +24

  5. cs.AR 2026-04-03 reviewed
    Graph coloring speeds SPICE up to 45x on 64 cores

    EEspice: A Modular Circuit Simulation Platform with Parallel Device Model Evaluation via Graph Coloring

    Xuanhao Bao +1

  6. cs.AR 2026-04-03 reviewed
    Multi-agent LLMs generate hardware assertions at 96% functional accuracy

    ChatSVA: Bridging SVA Generation for Hardware Verification via Task-Specific LLMs

    Lik Tung Fu +8

  7. cs.LG 2026-04-03 reviewed
    SRAM reads attention scores from quantized KV indices without dequantizing

    AXELRAM: Quantize Once, Never Dequantize

    Yasushi Nishida

  8. cs.LG 2026-04-02 reviewed
    Shared memory speeds NF4 dequantization 2x

    Fast NF4 Dequantization Kernels for Large Language Model Inference

    Xiangbo Qi +2

  9. cs.DC 2026-04-02 reviewed
    Cold TLB misses slow small GPU collectives up to 1.4x

    Analyzing Reverse Address Translation Overheads in Multi-GPU Scale-Up Pods

    Amel Fatima +2

  10. cs.AR 2026-04-02 reviewed
    TensorBoard plugin surfaces hidden fairness gaps during training

    InsightBoard: An Interactive Multi-Metric Visualization and Fairness Analysis Plugin for TensorBoard

    Ray Zeyao Chen +1

  11. cs.AR 2026-04-02 reviewed
    3DGS blending reformulated for Tensor Cores yields 1.42x speedup

    GEMM-GS: Accelerating 3D Gaussian Splatting on Tensor Cores with GEMM-Compatible Blending

    Haomin Li +6

  12. cs.AR 2026-03-31 reviewed
    Automated engines can design computer chips faster than human teams

    Computer Architecture's AlphaZero Moment: Automated Discovery in an Encircled World

    Karthikeyan Sankaralingam

  13. cs.AR 2026-03-31 reviewed
    Fixed Edge AI loses reliability or breaks budgets as conditions change

    Position Paper: From Edge AI to Adaptive Edge AI

    Fabrizio Pittorino +1

  14. cs.LG 2026-03-30 reviewed
    Circuit generator hits 99.9% validity with 8 simulations

    ARCS: Autoregressive Circuit Synthesis with Topology-Aware Graph Attention and Spec Conditioning

    Tushar Dhananjay Pathak

  15. cs.AR 2026-03-30 reviewed
    Switch-centric network speeds All-Reduce up to 8.7x in LLM inference

    A Switch-Centric In-Network Architecture for Accelerating LLM Inference in Shared-Memory Network

    Aojie Jiang +6

  16. cs.AR 2026-03-28 reviewed
    Lossless compressor speeds Ascend NPU inference up to 6.3 times

    ENEC: A Lossless AI Model Compression Method Enabling Fast Inference on Ascend NPUs

    Jinwu Yang +19

  17. cs.AR 2026-03-27 reviewed
    NoC with direct core access speeds ML collectives 5.3x

    A Lightweight High-Throughput Collective-Capable NoC for Large-Scale ML Accelerators

    Luca Colagrande +5

  18. cs.AR 2026-03-27 reviewed
    Local ChatOps tool hits 0.90 precision on single-hop questions

    RAGnaroX: A Secure, Local-Hosted ChatOps Assistant Using Small Language Models

    Benedikt Dornauer +1

  19. cs.AR 2026-03-26 reviewed
    Simulator verifies accelerator firmware 50x faster than FPGA

    FireBridge: Cycle-Accurate Hardware + Firmware Co-Verification for Modern Accelerators

    G Abarajithan +3

  20. cs.AR 2026-03-26 reviewed
    Review creates unified thermal model for 3D chip stacks

    A Review of Multiscale Thermal Modeling in Heterogeneous 3D ICs

    Baibhari Priya Barua +2

  21. cs.AI 2026-03-26 reviewed
    Ten general agents deliver 8× average HLS speedup

    Agent Factories for High Level Synthesis: How Far Can General-Purpose Coding Agents Go in Hardware Optimization?

    Abhishek Bhandwaldar +3

  22. eess.SP 2026-03-25 reviewed
    Exact formulas predict spurs from ADC mismatches

    Spectral Impact of Mismatches in Interleaved ADCs

    Jérémy Guichemerre +3

  23. quant-ph 2026-03-23 reviewed
    FPGA accelerator decodes quantum errors in under 1 microsecond

    Low Latency GNN Accelerator for Quantum Error Correction

    Alessio Cicero +4

  24. cs.CY 2026-03-21 reviewed
    AI data centers raise local land temperatures by 2°C

    The data heat island effect: quantifying the impact of AI data centers in a warming world

    Andrea Marinoni +8

  25. cs.DC 2026-03-21 reviewed
    Updated Amdahl sets specialization threshold at 1-1/R

    Modernizing Amdahl's Law: How AI Scaling Laws Shape Computer Architecture

    Chien-Ping Lu

  26. cs.AR 2026-03-20 reviewed
    COmPOSER automates mm-wave designs 100-300x faster

    COmPOSER: Circuit Optimization of mm-wave/RF circuits with Performance-Oriented Synthesis for Efficient Realizations

    Subhadip Ghosh +6

  27. cs.CR 2026-03-20 reviewed
    CPU replays exact NVIDIA GPU matrix multiplies without precision loss

    Hawkeye: Reproducing GPU-Level Non-Determinism

    Erez Badash +3

  28. cs.CR 2026-03-19 reviewed
    ML-KEM key exchange runs in 35.7 ms on M0+

    Benchmarking NIST-Standardised ML-KEM and ML-DSA on ARM Cortex-M0+: Performance, Memory, and Energy on the RP2040

    Rojin Chhetri

  29. cs.PL 2026-03-18 reviewed
    Hyperedges unify geometric algebra with compiler graphs

    The Program Hypergraph: Multi-Way Relational Structure for Geometric Algebra, Spatial Compute, and Physics-Aware Compilation

    Houston Haynes

  30. cs.NE 2026-03-18 reviewed
    Local hardware updates replace backpropagation for neural nets

    A Synthesizable RTL Implementation of Predictive Coding Networks

    Timothy Oh

  31. cs.PL 2026-03-17 reviewed
    Verilog vectorizer cuts Jasper elaboration time 28% and memory 51%

    Vectorization of Verilog Designs and its Effects on Verification and Synthesis

    Maria Fernanda Oliveira Guimarães +6

  32. cs.AR 2026-03-11 reviewed
    LLM RTL generation splits into three quality regimes under synthesis

    Synthesis-in-the-Loop Evaluation of LLMs for RTL Generation: Quality, Reliability, and Failure Modes

    Weimin Fu +7

  33. cs.AR 2026-03-10 reviewed
    Graph unifies netlist and layout to predict chip congestion early

    VeriHGN: Heterogeneous Graph-Based Congestion Prediction for Chip Layout Verification

    Runbang Hu +3

  34. cs.LG 2026-03-10 reviewed
    MSB proxy skips 88% of CNN multiplications with zero accuracy loss

    Hardware Efficient Approximate Convolution with Tunable Error Tolerance for CNNs

    Vishal Shashidhar +2

  35. cs.AR 2026-03-06 reviewed
    Reasoning tree raises SVA functional correctness by 31 percent

    FVRuleLearner: Operator-Level Reasoning Tree (OP-Tree)-Based Rules Learning for Formal Verification

    Lily Jiaxin Wan +5

  36. cs.AR 2026-03-03 reviewed
    Method localizes 51% of bugs at top rank in sequential hardware

    Pecker: Bug Localization Framework for Sequential Designs via Causal Chain Reconstruction

    Jiaping Tang +5

  37. physics.optics 2026-02-27 reviewed
    Single RTD acts as THz radar sensing 5-micrometer displacements

    Micrometer-scale displacement and thickness sensing using a single terahertz resonant-tunneling diode

    Li Yi +7

  38. cs.CR 2026-02-26 reviewed
    TEE architecture secures continuous attestation against platform control

    A TEE-Based Architecture for Confidential and Dependable Process Attestation in Authorship Verification

    David Condrey

  39. cs.AR 2026-02-24 reviewed
    Softcore loads custom instructions from memory with no frequency overhead

    LUTstructions: Self-loading FPGA-based Reconfigurable Instructions

    Philippos Papaphilippou

  40. cs.AR 2026-02-24 reviewed
    SAM2 extracts accurate SEM contours from only 60 images

    SegSEM: Enabling and Enhancing SAM2 for SEM Contour Extraction

    Da Chen +7

  41. cs.AR 2026-02-17 reviewed
    Hybrid memory design runs full kernels for 59x AES and 40x LLM speedups

    DARTH-PUM: A Hybrid Processing-Using-Memory Architecture

    Ryan Wong +2

  42. cs.AR 2026-02-16 reviewed
    Optimal accelerator mappings found in 17 seconds

    The Turbo-Charged Mapper: Fast and Optimal Mapping for Energy-efficient and Low-latency Accelerator Design

    Michael Gilbert +3

  43. cs.AR 2026-02-16 reviewed
    FFM finds optimal fused accelerator mappings over 10,000x faster

    Fast and Fusiest: An Optimal Fusion-Aware Mapper for Accelerator Design

    Tanner Andrulis +3

  44. cs.AR 2026-02-15 reviewed
    Near-memory GPU cuts energy use 6-13x while speeding AI tasks 6-16x

    ABI: A tightly integrated, unified, sparsity-aware, reconfigurable, compute near-register file/cache GPU architecture with light-weight softmax for deep learning, linear algebra, and Ising compute

    Siddhartha Raman Sundara Raman +1

  45. cs.CY 2026-02-14 reviewed
    Offline LLM runs tutoring on legacy hardware without a network connection

    Offline-First LLM Architecture for Adaptive Learning in Low-Connectivity Environments

    Joseph Walusimbi +3

  46. cs.AR 2026-02-10 reviewed
    Bipartite graphs and grammar rules generate valid analog topologies automatically

    AnalogToBi: Device-Level Analog Circuit Topology Generation via Bipartite Graph and Grammar Guided Decoding

    Seungmin Kim +3

  47. cs.AR 2026-02-05 reviewed
    D-Legion architecture reaches 135 TOPS for quantized LLM matrix math

    D-Legion: A Scalable Many-Core Architecture for Accelerating Matrix Multiplication in Quantized LLMs

    Ahmed J. Abdelmaksoud +3

  48. cs.AR 2026-02-05 reviewed
    On-the-fly predictor boosts FP8 CIM efficiency 2.8x

    Balancing FP8 Computation Accuracy and Efficiency on Digital CIM via Shift-Aware On-the-fly Aligned-Mantissa Bitwidth Prediction

    Liang Zhao +6

  49. cs.AR 2026-02-04 reviewed
    Verilog models show shared and model-specific prompt responses

    VeriInteresting: An Empirical Study of Model Prompt Interactions in Verilog Code Generation

    Luca Collini +4

  50. cs.AR 2026-02-02 reviewed
    KANs reach sub-microsecond online learning on FPGAs via spline locality

    Ultrafast On-chip Online Learning via Spline Locality in Kolmogorov-Arnold Networks

    Duc Hoang +2