pith.

archive

Every paper Pith has read. Search by title, abstract, or pith.

493 papers in cs.AR · page 10

  1. cs.AR 2025-07-01 reviewed
    Specialized LLMs raise HLS debugging success by 32 percent

    ChatHLS: Towards Systematic Design Automation and Optimization for High-Level Synthesis

    Runkai Li +7

  2. cs.AR 2025-06-19 reviewed
    Sparse NN linearizes RF amps on FPGA at 241 mW with -59 dBc ACPR

    SparseDPD: A Sparse Neural Network-based Digital Predistortion FPGA Accelerator for RF Power Amplifier Linearization

    Manno Versluis +2

  3. cond-mat.stat-mech 2025-06-19 reviewed
    Microcanonical annealing cuts random-number use in parallel spin-glass sims

    Microcanonical simulated annealing: Massively parallel Monte Carlo simulations with sporadic random-number generation

    M. Bernaschi +9

  4. cs.AR 2025-06-18 reviewed
    RISC-V calibration lifts CIM compute SNR by 25-45 percent

    Acore-CIM: build accurate and reliable mixed-signal CIM cores with RISC-V controlled self-calibration

    Omar Numan +8

  5. cs.AR 2025-06-13 reviewed
    System predicts lane changes 3-4 seconds ahead in real-world tests

    Real-World Deployment of a Lane Change Prediction Architecture Based on Knowledge Graph Embeddings and Bayesian Inference

    M. Manzour +4

  6. cs.AR 2025-06-03 reviewed
    MLA cuts bandwidth use in attention and stabilizes hardware performance

    Hardware-Centric Analysis of DeepSeek's Multi-Head Latent Attention

    Robin Geens +1

  7. q-bio.GN 2025-05-31 reviewed
    PIM co-design cuts energy and time for genomics workloads

    Processing-in-memory for genomics workloads

    William Andrew Simon +14

  8. cs.DC 2025-05-29 reviewed
    GreenCache trims LLM carbon 15% by trading storage against compute

    Cache Your Prompt When It's Green: Carbon-Aware Caching for Large Language Model Serving

    Yuyang Tian +3

  9. cs.AR 2025-05-22 reviewed
    60k code pairs train models for 88% accurate CUDA to HIP translation

    CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark

    Ahmed Heakl +7

  10. cs.AR 2025-05-19 reviewed
    Seamless switching boosts CPU LLM serving speed by 2x

    Sandwich: Joint Configuration Search and Hot-Switching for Efficient CPU LLM Serving

    Juntao Zhao +2

  11. quant-ph 2025-04-29 reviewed
    Co-optimized Iceberg gadgets raise QAOA success from 44% to 65%

    Iceberg Beyond the Tip: Co-Compilation of a Quantum Error Detection Code and a Quantum Algorithm

    Yuwei Jin +7

  12. cs.AR 2025-04-28 reviewed
    LLM automates UVM testbench creation for RTL designs

    From Concept to Practice: an Automated LLM-aided UVM Machine for RTL Verification

    Junhao Ye +10

  13. cs.AR 2025-04-24 reviewed
    Fusion-aware design speeds SSM accelerators 1.78x at fixed area

    Fine-Grained Fusion: The Missing Piece in Area-Efficient State Space Model Acceleration

    Robin Geens +2

  14. cs.AR 2025-04-14 reviewed
    Simulator explores LLM configs without 40K cloud costs

    MIST: A Co-Design Framework for Heterogeneous, Multi-Stage LLM Inference

    Abhimanyu Rajeshkumar Bambhaniya +10

  15. cs.ET 2025-04-08 reviewed
    Memristor arrays solve XOR-CNF SAT problems 10 times faster

    Accelerating Hybrid XOR-CNF Boolean Satisfiability Problems Natively with In-Memory Computing

    Haesol Im +16

  16. cs.AR 2025-03-27 reviewed
    71.2 μW accelerator runs real-time speech recognition

    A 71.2-μW Speech Recognition Accelerator with Recurrent Spiking Neural Network

    Chih-Chyau Yang +1

  17. cs.AR 2025-03-26 reviewed
    Edge criteria halve MACs for 8K super-resolution at 30 FPS

    ESSR: An 8K@30FPS Super-Resolution Accelerator With Edge Selective Network

    Chih-Chia Hsu +1

  18. cs.AR 2025-03-17 reviewed
    Benchmark shows 51 percent area cut for 3D chip designs

    Open3DBench: Open-Source Benchmark for 3D-IC Backend Implementation and PPA Evaluation

    Yunqi Shi +8

  19. math.OC 2025-03-12 reviewed
    Hardware co-design checks all feasible QAP moves in one step

    Hardware-Compatible Single-Shot Feasible-Space Heuristics for Solving the Quadratic Assignment Problem

    Haesol Im +12

  20. cs.OS 2025-03-05 reviewed
    90% of Linux radiation failures route through one eMMC path

    Where Linux Breaks Under Radiation: A Cross-Architecture Kernel-Level Characterization of Proton-Induced Failures in COTS SoCs

    Saad Memon +7

  21. cs.CV 2025-03-05 reviewed
    Quantization method raises 4-bit SAM mAP 15.2% on COCO

    AHCQ-SAM: Toward Accurate and Hardware-Compatible Post-Training Segment Anything Model Quantization

    Wenlun Zhang +5

  22. cs.AR 2025-02-23 reviewed
    Taxonomy maps 25 years of FPGA neuromorphic architectures

    A Quarter of a Century of Neuromorphic Architectures on FPGAs -- an Overview

    Wiktor J. Szczerek +1

  23. cs.AR 2025-01-30 reviewed
    Posits shrink wearable hardware 38% and cut power 42%

    Increasing the Energy-Efficiency of Wearables Using Low-Precision Posit Arithmetic with PHEE

    David Mallasén +4

  24. cs.DC 2025-01-27 reviewed
    Framework enables any-cycle preemption for FPGA tasks in clouds

    EPOCH: Enabling Preemption Operation for Context Saving in Heterogeneous FPGA Systems

    Arsalan Ali Malik +2

  25. cs.AR 2025-01-23 reviewed
    Taylor softmax cuts FPGA resources 14% at 0.2% accuracy cost

    A Quantitative Evaluation of Approximate Softmax Functions for Deep Neural Networks

    Anthony Leiva-Valverde +4

  26. cs.AR 2025-01-15 reviewed
    Octopus sparse links save 3-5.4% server costs in CXL pods

    Octopus: Enhancing CXL Memory Pods via Sparse Topology

    Yuhong Zhong +6

  27. cs.CR 2025-01-13 reviewed
    Compiler aligns HE workloads with TPU matrix engines

    Leveraging ASIC AI Chips for Homomorphic Encryption

    Jianming Tong +11

  28. cs.LG 2025-01-07 reviewed
    Hybrid federated method boosts hotspot detection accuracy

    Federated Knowledge Distillation for Multi-Model Architectures Lithography Hotspot Detection

    Yuqi Li +8

  29. cs.LG 2024-11-10 reviewed
    Filter turns AI-generated PCIe traces into usable simulation data

    The Phantom of PCIe: Constraining Generative Artificial Intelligences for Practical Peripherals Trace Synthesizing

    Zhibai Huang +11

  30. cs.LG 2024-10-19 reviewed
    Async pipeline training on analog hardware matches digital SGD rate

    On the Convergence Theory of Pipeline Gradient-based Analog In-memory Training

    Zhaoxian Wu +5

  31. cs.CV 2024-09-25 reviewed
    SSD MobileNet V1 minimizes latency and energy but not accuracy on edge devices

    A Comprehensive Evaluation of Deep Learning Object Detection Models on Heterogeneous Edge Devices

    Daghash K. Alqahtani +3

  32. cs.AR 2024-06-28 reviewed
    FPGA idle-waiting extends DL accelerator life 12x vs powering off

    Idle is the New Sleep: Configuration-Aware Alternative to Powering Off FPGA-Based DL Accelerators During Inactivity

    Chao Qian +3

  33. quant-ph 2024-06-26 reviewed
    Two-level scheduler cuts quantum decoder hardware by 10-40%

    Managing Classical Processing Requirements for Quantum Error Correction

    Satvik Maurya +3

  34. cs.ET 2024-06-20 reviewed
    Weight shuffling restores 83.5% accuracy in resistive crossbar DNNs

    WAGONN: Weight Bit Agglomeration in Crossbar Arrays for Reduced Impact of Interconnect Resistance on DNN Inference Accuracy

    Jeffry Victor +4

  35. cs.AR 2024-05-21 reviewed
    Accelerator switches dataflows per layer at 6% extra area

    FEATHER: A Reconfigurable Accelerator with Data Reordering Support for Low-Cost On-Chip Dataflow Switching

    Jianming Tong +3

  36. cs.AR 2024-05-06 reviewed
    SparrowSNN cuts ECG energy by 20-100x at full accuracy

    SparrowSNN: A Hardware/software Co-design for Energy Efficient ECG Classification

    Zhanglu Yan +3

  37. cs.AR 2023-01-07 reviewed
    Cache-coherent eFPGAs cut processor-accelerator latency by 82%

    Duet: Creating Harmony between Processors and Embedded FPGAs

    Ang Li +2

  38. quant-ph 2022-11-14 reviewed
    Hundreds of thousands of qubits needed for practical quantum advantage

    Assessing requirements to scale to practical quantum advantage

    Michael E. Beverland +9

  39. cs.AR 2019-07-24 reviewed
    QDI adder comparison in 32nm CMOS identifies low-power options

    Performance Comparison of Quasi-Delay-Insensitive Asynchronous Adders

    P Balasubramanian

  40. cs.AR 2019-07-22 reviewed
    Memristor-CMOS multiplier reconfigures for multiple bit widths

    Reconfigurable multiplier architecture based on memristor-cmos with higher flexibility

    Seungbum Baek

  41. cs.AR 2019-07-19 reviewed
    PPAC runs neural nets and crypto inside memory arrays

    PPAC: A Versatile In-Memory Accelerator for Matrix-Vector-Product-Like Operations

    Oscar Castañeda +3

  42. cs.AR 2019-07-17 reviewed
    RL scheduler adapts multicore memory access for 20% CPI gain

    CADS: Core-Aware Dynamic Scheduler for Multicore Memory Controllers

    Eduardo Olmedo Sanchez +1

  43. cs.AR 2019-07-16 reviewed
    History yields conditions for coprocessor long-term success

    Coprocessors: failures and successes

    Daniel Etiemble

  44. cs.AR 2019-07-10 reviewed
    RM-CAM plus TMR repairs NRAM defects with fewer resources at high error rates

    A Range Matching CAM for Hierarchical Defect Tolerance Technique in NRAM Structures

    Hossein Pourmeidani +1

  45. cs.AR 2019-07-04 reviewed
    RTL FPGA accelerator matches Caffe-CPU for CNN inference

    FusionAccel: A General Re-configurable Deep Learning Inference Accelerator on FPGA for Convolutional Neural Networks

    Shi Shi

  46. cs.AR 2019-07-04 reviewed
    TicToc speeds hybrid memory 10% using 34KB SRAM

    TicToc: Enabling Bandwidth-Efficient DRAM Caching for both Hits and Misses in Hybrid Memory Systems

    Vinson Young +2

  47. cs.AR 2019-07-04 reviewed
    One line per region tracks reuse to speed DRAM caches 18%

    To Update or Not To Update?: Bandwidth-Efficient Intelligent Replacement Policies for DRAM Caches

    Vinson Young +1

  48. cs.OS 2019-06-29 reviewed
    Hardware scheduler delivers 12x speedup on accelerator systems

    HTS: A Hardware Task Scheduler for Heterogeneous Systems

    Kartik Hegde +2

  49. eess.SP 2019-06-28 reviewed
    FPGA speeds Tucker decomposition up to 30x on heart MRI

    Tucker Tensor Decomposition on FPGA

    Kaiqi Zhang +2

  50. cs.AR 2019-06-27 reviewed
    Bit-partitioned dot products share A/D converters via charge accumulation

    Mixed-Signal Charge-Domain Acceleration of Deep Neural networks through Interleaved Bit-Partitioned Arithmetic

    Soroush Ghodrati +7