archive
Every paper Pith has read.
493 papers in cs.AR · page 5
-
Helpers from high-level features speed HLS verification up to 6x
AutoINV: Automated Invariant Generation Framework for Formal Verification on High-Level Synthesis Designs
-
LLM evolves router code to cut wirelength up to 8.72%
GR-Evolve: Design-Adaptive Global Routing via LLM-Driven Algorithm Evolution
-
Optical neural net design cuts energy-delay product by 64%
ROSA: Robust and Energy-Efficient Microring-Based Optical Neural Networks via Optical Shift-and-Add and Layer-Wise Hybrid Mapping
-
PyTorch SNNs run on FPGAs with exact software accuracy
Hardware-Software Co-Design for Event-Driven SNN Deployment on Low-Cost Neuromorphic FPGAs
-
SPAC reduces FPGA switch resources by 55% and latency by 38%
SPAC: Automating FPGA-based Network Switches with Protocol Adaptive Customization
-
Volatile memristors reach 95.89% MNIST accuracy in reservoir computing
On the Role of Preprocessing and Memristor Dynamics in Reservoir Computing for Image Classification
-
Restructured big-integer ops deliver 4x SIMD speedups in libraries
Leveraging SIMD for Accelerating Large-number Arithmetic
-
Online learning delays failure in radiation-exposed spiking nets
Shooting Neutrons at Neurons: Radiation Testing of a Spiking Neural Network on Flash-Based FPGAs
-
Tree-encoded fusion suppresses erasure errors in photonic MBQC
Suppressing the Erasure Error of Fusion Operation in Photonic Quantum Computing
-
Co-design accelerates multimodal foundation models
Focus Session: Hardware and Software Techniques for Accelerating Multimodal Foundation Models
-
HDC operations realized as coherent wave behaviors
A wave-geometric duality for hyperdimensional computing
-
Runtime dispatcher shares Versal AI Engine tiles among mixed-criticality tasks
Enabling Mixed criticality applications for the Versal AI-Engines
-
FPGA level-wise batch search speeds B+ tree lookups 4.9x
Efficient Batch Search Algorithm for B+ Tree Index Structures with Level-Wise Traversal on FPGAs
-
Calibration-free quantization compresses LLMs to 37% size while beating 4-bit methods
FASQ: Flexible Accelerated Subspace Quantization for Calibration-Free LLM Compression
-
FPGAs cut carbon in low-volume changing workloads
Evaluating Computing Platforms for Sustainability: A Comparative Analysis of FPGAs against ASICs, GPUs, and CPUs
-
Victim-row counting boosts RowHammer tolerance in DRAM
PVAC: A RowHammer Mitigation Architecture Exploiting Per-victim-row Counting
-
Series SRAM cells reduce cache leakage power
A Novel Low-Power Cache Architecture Based on 6-Transistor SRAM Cells
-
LLM pipeline completes analog IC design from image to layout
AnalogMaster: Large Language Model-based Automated Analog IC Design Framework from Image to Layout
-
AI GPU power estimated in seconds with 8% error
EnergAIzer: Fast and Accurate GPU Power Estimation Framework for AI Workloads
-
Dropout brings uncertainty estimates to complex-valued neural networks
Algorithm and Hardware Co-Design for Efficient Complex-Valued Uncertainty Estimation
-
Duon skips TLB shootdowns for hybrid memory page moves
Efficient Page Migration in Hybrid Memory Systems
-
Co-designed detection and cancellation cuts logical errors 2-11x
Co-Designing Error Mitigation and Error Detection for Logical Qubits
-
Multi-agent system generates correct Verilog at 97% success
ChipCraftBrain: Validation-First RTL Generation via Multi-Agent Orchestration
-
Workload profile guides surface-code layout to save 21% data tiles
Toward designing workload-aware Surface Code Architectures
-
Parameterized design hits 11.89 GOP/s/W for LSTM on embedded FPGAs
Energy Efficient LSTM Accelerators for Embedded FPGAs through Parameterised Architecture Design
-
Metric shows when specialized engines beat FPGA logic for edge models
Design Rules for Extreme-Edge Scientific Computing on AI Engines
-
Joint chiplet and optical design speeds LLM training
ChipLight: Cross-Layer Optimization of Chiplet Design with Optical Interconnects for LLM Training
-
Ternary memristive junctions store assertions for direct hardware reasoning
Ternary Memristive Logic: Hardware for Reasoning Realized via Domain Algebra
-
Apple M3 uses about 6 times less energy than AMD Ryzen on key tasks
A Comparative Analysis of ARM and x86-64 Laptop-Class Processors: Architecture, Assembly-Level Performance, and Energy Efficiency
-
Surrogate models select better 3D-IC partitions with fewer evaluations
A PPA-Driven 3D-IC Partitioning Selection Framework with Surrogate Models
-
LLM agents find lower-cost chiplet designs than simulated annealing
CHICO-Agent: An LLM Agent for the Cross-layer Optimization of 2.5D and 3D Chiplet-based Systems
-
Branch predictors can be tuned to cut mispredictions in graph apps
Optimizing Branch Predictor for Graph Applications
-
AutoPPA learns circuit rules by comparing code variants
AutoPPA: Automated Circuit PPA Optimization via Contrastive Code-based Rule Library Learning
-
Equal inductors turn bridged-T network into high-pass filter
Scattering-Matrix-Based Parametric Characterization of a Two-Port Bridged-T Network for Microstrip Filter Applications
-
Contrastive pairs raise Verilog LLM compile and correctness rates
VerilogCL: A Contrastive Learning Framework for Robust LLM-Based Verilog Generation
-
In-memory quantization breaks PIM capacity wall for LLMs
AQPIM: Breaking the PIM Capacity Wall for LLMs with In-Memory Activation Quantization
-
Processes and pipes made lightweight for far memory accelerators
Proxics: an efficient programming model for far memory accelerators
-
Dataflow chip outperforms GPUs on autonomous driving AI
M100: An Orchestrated Dataflow Architecture Powering General AI Computing
-
ZKP kernels reformulated to run 10x faster on TPUs
Enabling AI ASICs for Zero Knowledge Proof
-
AccelCIM charts complete dataflow options for SRAM compute-in-memory accelerators
AccelCIM: Systematic Dataflow Exploration for SRAM Compute-in-Memory Accelerator
-
Multi-tier KV cache cuts LLM inference costs by 47%
Predictive Multi-Tier Memory Management for KV Cache in Large-Scale GPU Inference
-
Offloading avatars privately scales VR to 2.37x more users
Privatar: Scalable Privacy-preserving Multi-user VR via Secure Offloading
-
ML automation targets RISC-V certification costs for cars
RISC-V Functional Safety for Autonomous Automotive Systems: An Analytical Framework and Research Roadmap for ML-Assisted Certification
-
Stochastic tree search repairs 96.8% of RTL bugs
Clover: A Neural-Symbolic Agentic Harness with Stochastic Tree-of-Thoughts for Verified RTL Repair
-
Bit flips in shared KV caches silently alter LLM outputs
Bit-Flip Vulnerability of Shared KV-Cache Blocks in LLM Serving Systems
-
Hyperparameter choices matter more than model choice for LLM RTL generation
Configuration Over Selection: Hyperparameter Sensitivity Exceeds Model Differences in Open-Source LLMs for RTL Generation
-
IR choice, not LLM, sets hardware design success rates
From Natural Language to Silicon: The Representation Bottleneck in LLM Hardware Design
-
Spike sparsity fails to lower latency or energy on Jetson GPU
When Spike Sparsity Does Not Translate to Deployed Cost: VS-WNO on Jetson Orin Nano
-
CPU-memory interface fixes close simulator-to-hardware gaps
Different Perspectives of Memory System Simulation
-
Multiplier-free square-root unit hits 7.63 mW and 4.6 ns on FPGA
E2AFS: Energy-Efficient Approximate Floating Point Square Rooter for Error Tolerant Computing