arXiv preprint arXiv:1903.01611

· 1903 · arXiv 1903.01611

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Testing Equality of Conditional Distributions via Generative Models

stat.ME · 2026-06-05 · unverdicted · novelty 7.0

A generative-model-based test for equality of conditional distributions that uses cross-generation, an RKHS-indexed supremum statistic, and multiplier bootstrap, with claimed double robustness to generator errors.

Toy Combinatorial Interpretability Models Reveal Lottery Tickets in Early Feature Space

cs.LG · 2026-05-18 · unverdicted · novelty 7.0

In a combinatorial toy setting, winning lottery tickets preserve families of compatible feature locations in early feature space that balance proximity to final codes with low interference, rather than specific weight subnetworks.

FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

cs.LG · 2022-05-27 · accept · novelty 7.0

FlashAttention reduces GPU high-bandwidth memory accesses in self-attention via tiling, delivering exact attention with lower IO complexity, 2-3x wall-clock speedups on models like GPT-2, and the ability to train on sequences up to 64K long.

STARFISH: faST Accuracy Recovery in pruned networks From Internal State Healing

cs.LG · 2026-05-31 · unverdicted · novelty 5.0

STARFISH recovers accuracy in pruned neural networks by optimizing internal state alignment to the original model with a minimal unlabeled calibration set, outperforming prior recovery methods especially at high pruning ratios.

DBLP: Phase-Aware Bounded-Loss Transport for Burst-Resilient Distributed ML Training

cs.LG · 2026-05-03 · unverdicted · novelty 5.0

DBLP is a training-phase-aware bounded-loss transport protocol that reduces end-to-end distributed ML training time by 24.4% on average (up to 33.9%) and achieves up to 5.88x communication speedup during microbursts while maintaining comparable test accuracy.

Finding Sparse Subnetworks in One Training Cycle via Progressive Magnitude-Based Pruning

cs.CV · 2026-06-10 · unverdicted · novelty 4.0

Progressive magnitude-based pruning finds sparse subnetworks in one training cycle, reporting higher accuracy than LTH, SNIP, and GraSP at high sparsity on CIFAR-10 and MNIST.

citing papers explorer

Finding Sparse Subnetworks in One Training Cycle via Progressive Magnitude-Based Pruning

cs.CV · 2026-06-10 · unverdicted · none · ref 8

Progressive magnitude-based pruning finds sparse subnetworks in one training cycle, reporting higher accuracy than LTH, SNIP, and GraSP at high sparsity on CIFAR-10 and MNIST.
