TAH-Quant introduces tile-wise adaptive Hadamard quantization for activations in pipeline parallelism, achieving 3-4 bit compression with up to 4.3x throughput speedup and O(1/sqrt(T)) convergence matching SGD.
Improving training time and gpu utilization in geo-distributed language model training
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2025 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
TAH-QUANT: Effective Activation Quantization in Pipeline Parallelism over Slow Network
TAH-Quant introduces tile-wise adaptive Hadamard quantization for activations in pipeline parallelism, achieving 3-4 bit compression with up to 4.3x throughput speedup and O(1/sqrt(T)) convergence matching SGD.