Rogers, Evan Schneider, Jean-Luc Vay, and P

doi: 10 · 2023 · arXiv 1784.360708

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 1

citation-polarity summary: background 1

representative citing papers

Analyzing Reverse Address Translation Overheads in Multi-GPU Scale-Up Pods

cs.DC · 2026-04-02 · unverdicted · novelty 7.0

Simulation study shows cold TLB misses in reverse address translation dominate latency for small collectives in multi-GPU pods, causing up to 1.4x degradation, while larger ones see diminishing returns.

Effective Model Pruning: Measure The Redundancy of Model Components

cs.LG · 2025-09-30 · unverdicted · novelty 7.0

EMP maps importance scores to effective sample size N_eff and prunes the lowest N - N_eff components, with a derived lower bound on retained effective mass and upper bound on loss increase.

XMAGNET -- Stir before serving: a Lagrangian perspective on mixing-driven condensation in the intracluster medium

astro-ph.GA · 2026-05-01 · unverdicted · novelty 6.0

Lagrangian tracers show mixing with low-entropy seeds drives most condensation in cluster cores; magnetic fields cause earlier divergence, higher vorticity, lower Mach numbers, and slower cold-cloud motion via tension.

PICO: Performance Insights for Collective Operations

cs.DC · 2025-08-22 · unverdicted · novelty 6.0

PICO is a benchmarking framework for collective operations that decouples portable setup from platform execution, supplies reference MPI implementations, and shows default choices can be up to 5x slower with up to 44% end-to-end training time reductions in simulator replays.

HieraSparse: Hierarchical Semi-Structured Sparse KV Attention

cs.DC · 2026-04-18 · unverdicted · novelty 5.0

HieraSparse delivers a hierarchical semi-structured sparse KV attention system that achieves 1.2x KV compression and 4.57x decode attention speedup versus prior unstructured sparsity methods at equivalent sparsity, plus up to 1.85x prefill speedup and 1.37x/1.77x speedups with magnitude pruning and

Sustaining Exascale Performance: Lessons from HPL and HPL-MxP on Aurora

cs.DC · 2026-04-10 · unverdicted · novelty 4.0

Aurora reached 1.01 EF/s FP64 HPL and 11.64 EF/s HPL-MxP through locality-aware mapping, CPU-GPU pipelining, mixed-precision orchestration, and hybrid resilience on a large Intel GPU-based system.

citing papers explorer

Analyzing Reverse Address Translation Overheads in Multi-GPU Scale-Up Pods · cs.DC · 2026-04-02 · unverdicted · none · ref 11
Effective Model Pruning: Measure The Redundancy of Model Components · cs.LG · 2025-09-30 · unverdicted · none · ref 2
XMAGNET -- Stir before serving: a Lagrangian perspective on mixing-driven condensation in the intracluster medium · astro-ph.GA · 2026-05-01 · unverdicted · none · ref 159
PICO: Performance Insights for Collective Operations · cs.DC · 2025-08-22 · unverdicted · none · ref 4
HieraSparse: Hierarchical Semi-Structured Sparse KV Attention · cs.DC · 2026-04-18 · unverdicted · none · ref 66
Sustaining Exascale Performance: Lessons from HPL and HPL-MxP on Aurora · cs.DC · 2026-04-10 · unverdicted · none · ref 12
