7 Pith papers cite this work.
citing papers explorer
- Loaded Dice: Solving the Non-Selection Problem for Scalable Probabilistic RowHammer Defense
  PrISM uses a Sampled History Queue to correlate row samples across windows, solving the non-selection problem in probabilistic RowHammer mitigation and cutting slowdown from 10.7% to 1.5% at threshold 250 versus prior methods.
- A Switch-Centric In-Network Architecture for Accelerating LLM Inference in Shared-Memory Network
  SCIN uses an in-switch accelerator for direct memory access and 8-bit in-network quantization during All-Reduce, delivering up to 8.7x faster small-message reduction and 1.74x TTFT speedup on LLaMA-2 models.
- Qurts: Automatic Quantum Uncomputation by Affine Types with Lifetime
  Qurts extends Rust with lifetime-parameterized types to provide a uniform framework for automatic quantum uncomputation by allowing temporary affine usage of quantum values.
- TokenStack: A Heterogeneous HBM-PIM Architecture and Runtime for Efficient LLM Inference
  TokenStack's heterogeneous HBM-PIM design with base-die control and topology-aware KV placement delivers 1.62x higher geometric-mean token throughput and 1.70x SLO-compliant serving capacity than AttAcc while cutting per-token energy by 30-47%.
- WaveTune: Wave-aware Bilinear Modeling for Efficient GPU Kernel Auto-tuning
  WaveTune introduces a wave-aware bilinear latency predictor and wave-structured sparse sampling to enable fast runtime auto-tuning of GPU kernels, achieving up to 1.83x kernel speedup and 1.33x TTFT reduction with drastically lower overhead.
- A compact QUBO encoding of computational logic formulae demonstrated on cryptography constructions
  A compact QUBO encoding derived via ILP reduces logical variables by thousands in AES, MD5, SHA1 and SHA256, with over 8x reduction for AES-256.
- Wattlytics: A Web Platform for Co-Optimizing Performance, Energy, and TCO in HPC Clusters
  Wattlytics is a public web platform that integrates benchmark-driven GPU performance scaling, DVFS-aware power modeling, and TCO analysis to support informed HPC cluster design and procurement decisions.