Stabilized sparse scaling algorithms for entropy regularized transport problems. SIAM Journal on Scientific Computing, 41(3):A1443–A1481, 2019.
2 Pith papers cite this work.
citing papers explorer
- FlashSinkhorn: IO-Aware Entropic Optimal Transport on GPU
  FlashSinkhorn delivers up to 32x forward and 161x end-to-end speedups for entropic OT on A100 GPUs via IO-aware Triton kernels that fuse log-domain updates and streaming transport application.
- Geometry-Aware Decentralized Sinkhorn for Wasserstein Barycenters
  A decentralized Sinkhorn algorithm approximates Wasserstein barycenters using local gossip protocols, event-triggered transmissions, and b-bit quantization, with proven convergence to a neighborhood of the centralized entropic solution under mild assumptions.