Humble, Alexander McCaskey, Dmitry I

· 2023 · DOI 10.1109/mm

7 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary: background 4

citation-polarity summary: background 4

representative citing papers

TuniQ: Autotuning Compilation Passes for Quantum Workloads at Scale for Effectiveness and Efficiency

quant-ph · 2026-05-12 · unverdicted · novelty 7.0

TuniQ uses RL with a dual-encoder, shaped rewards, and action masking to autotune quantum compilation passes, improving fidelity and speed over Qiskit while generalizing across backends and scaling to large circuits.

SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators

cs.AI · 2025-11-05 · unverdicted · novelty 7.0

SnapStream deploys sparse KV attention in a production inference system on dataflow accelerators, delivering 4x on-chip memory savings for DeepSeek-671B at 128k context with up to 1832 tokens/sec and minimal accuracy loss on LongBench-v2, AIME24, and LiveCodeBench.

ACALSim: A Scalable Parallel Simulation Framework for High-Performance System Design Space Exploration

cs.AR · 2026-05-21 · unverdicted · novelty 6.0

ACALSim is a new simulation framework with customizable threading, event-driven execution, and shared-memory model that reports over 14x speedup versus SST and enables simulation of large LLaMA models that SST cannot complete.

GPIR: Enabling Practical Private Information Retrieval with GPUs

cs.CR · 2026-04-06 · unverdicted · novelty 6.0

GPIR achieves up to 297 times higher throughput than prior GPU PIR systems by fusing operations in stages and using pipelined transposed layouts to cut DRAM traffic during batched lattice-based queries.

Managing Classical Processing Requirements for Quantum Error Correction

quant-ph · 2024-06-26 · unverdicted · novelty 5.0

A two-level decoder scheduling framework reduces classical processing requirements for quantum error correction by 10-40% on fault-tolerant benchmarks by managing bursty workloads as shared resources.

Comparing the Performance of Heterogeneous Conjugate Gradient and Cholesky Solvers on Various Hardware Using SYCL

cs.DC · 2026-05-13 · unverdicted · novelty 4.0

Heterogeneous SYCL-based CG and Cholesky solvers deliver up to 32% and 29% faster runtimes than GPU-only versions for large matrices across multiple GPU vendors.

Beyond Silicon: Materials, Mechanisms, and Methods for Physical Neural Computing

cs.NE · 2026-04-10

citing papers explorer

Showing 7 of 7 citing papers.

TuniQ: Autotuning Compilation Passes for Quantum Workloads at Scale for Effectiveness and Efficiency · quant-ph · 2026-05-12 · unverdicted · none · ref 29
SnapStream: Efficient Long Sequence Decoding on Dataflow Accelerators · cs.AI · 2025-11-05 · unverdicted · none · ref 16
ACALSim: A Scalable Parallel Simulation Framework for High-Performance System Design Space Exploration · cs.AR · 2026-05-21 · unverdicted · none · ref 8
GPIR: Enabling Practical Private Information Retrieval with GPUs · cs.CR · 2026-04-06 · unverdicted · none · ref 19
Managing Classical Processing Requirements for Quantum Error Correction · quant-ph · 2024-06-26 · unverdicted · none · ref 5
Comparing the Performance of Heterogeneous Conjugate Gradient and Cholesky Solvers on Various Hardware Using SYCL · cs.DC · 2026-05-13 · unverdicted · none · ref 7
Beyond Silicon: Materials, Mechanisms, and Methods for Physical Neural Computing · cs.NE · 2026-04-10 · unreviewed · ref 17
