A generative-model-based test for equality of conditional distributions that uses cross-generation, an RKHS-indexed supremum statistic, and multiplier bootstrap, with claimed double robustness to generator errors.
arXiv preprint arXiv:1903.01611
6 Pith papers cite this work.
citation-role summary
roles: background 2
citation-polarity summary
polarities: background 2
representative citing papers
In a combinatorial toy setting, winning lottery tickets preserve families of compatible feature locations in early feature space that balance proximity to final codes with low interference, rather than specific weight subnetworks.
FlashAttention reduces GPU high-bandwidth memory accesses in self-attention via tiling, delivering exact attention with lower IO complexity, 2-3x wall-clock speedups on models like GPT-2, and the ability to train on sequences up to 64K long.
STARFISH recovers accuracy in pruned neural networks by optimizing internal state alignment to the original model with a minimal unlabeled calibration set, outperforming prior recovery methods especially at high pruning ratios.
DBLP is a training-phase-aware bounded-loss transport protocol that reduces end-to-end distributed ML training time by 24.4% on average (up to 33.9%) and achieves up to 5.88x communication speedup during microbursts while maintaining comparable test accuracy.
Progressive magnitude-based pruning finds sparse subnetworks in one training cycle, reporting higher accuracy than LTH, SNIP, and GraSP at high sparsity on CIFAR-10 and MNIST.
citing papers explorer
Finding Sparse Subnetworks in One Training Cycle via Progressive Magnitude-Based Pruning