IP-CaT jointly optimizes TLB and cache management for L1I prefetching via a translation prefetch buffer and trimodal replacement policy, yielding 8.7% geomean speedup over EPI across 105 server workloads.
Hermes: Accelerating long-latency load requests via perceptron-based off-chip load prediction
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 2polarities
background 2representative citing papers
NasZip delivers up to 8.4x speedup over CPU baselines and 1.69x over prior NDP accelerators for ANNS by combining near-data processing with statistics-based PCA early exiting, dynamic-float encoding, and data-aware neighbor mapping.
Proxics introduces lightweight virtual processors and low-latency communication channels as portable OS abstractions for programming near-data processing accelerators, demonstrated on real hardware for memory-intensive workloads.
Profile-guided opcode labeling removes consistently independent loads from the MDP working set, cutting queries 79%, false dependencies 77%, and raising small-core IPC 1.47% on SPEC2017 intspeed.
QARMA applies transformer-augmented reinforcement learning to qubit allocation and reuse in modular quantum systems, reporting up to 86% average reduction in inter-core communications versus optimized Qiskit baselines.
EDGE extends Einsum notation with graph-specific operations to create a unified tensor-algebra framework for expressing and manipulating graph algorithms.
A two-level decoder scheduling framework reduces classical processing requirements for quantum error correction by 10-40% on fault-tolerant benchmarks by managing bursty workloads as shared resources.
citing papers explorer
-
Enhancing Instruction Prefetching via Cache and TLB Management
IP-CaT jointly optimizes TLB and cache management for L1I prefetching via a translation prefetch buffer and trimodal replacement policy, yielding 8.7% geomean speedup over EPI across 105 server workloads.
-
NasZip: Software and Hardware Co-Design to Accelerate Approximate Nearest Neighbor Search with DIMM-Based Near-Data Processing
NasZip delivers up to 8.4x speedup over CPU baselines and 1.69x over prior NDP accelerators for ANNS by combining near-data processing with statistics-based PCA early exiting, dynamic-float encoding, and data-aware neighbor mapping.
-
Proxics: an efficient programming model for far memory accelerators
Proxics introduces lightweight virtual processors and low-latency communication channels as portable OS abstractions for programming near-data processing accelerators, demonstrated on real hardware for memory-intensive workloads.
-
PG-MDP: Profile-Guided Memory Dependence Prediction for Area-Constrained Cores
Profile-guided opcode labeling removes consistently independent loads from the MDP working set, cutting queries 79%, false dependencies 77%, and raising small-core IPC 1.47% on SPEC2017 intspeed.
-
Learning-Optimized Qubit Mapping and Reuse to Minimize Inter-Core Communication in Modular Quantum Architectures
QARMA applies transformer-augmented reinforcement learning to qubit allocation and reuse in modular quantum systems, reporting up to 86% average reduction in inter-core communications versus optimized Qiskit baselines.
-
The EDGE Language: Extended General Einsums for Graph Algorithms
EDGE extends Einsum notation with graph-specific operations to create a unified tensor-algebra framework for expressing and manipulating graph algorithms.
-
Managing Classical Processing Requirements for Quantum Error Correction
A two-level decoder scheduling framework reduces classical processing requirements for quantum error correction by 10-40% on fault-tolerant benchmarks by managing bursty workloads as shared resources.