MoE-Hub enables seamless MoE communication overlap via hardware-accelerated destination-agnostic data transmission, delivering 1.40x-3.08x per-layer and 1.21x-1.98x end-to-end speedups over prior systems.
Verifying semantic equivalence of large models with equality saturation
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 3verdicts
UNVERDICTED 3roles
background 2polarities
background 2representative citing papers
TACO compresses tensor-parallel intermediate tensors with an adaptive FP8 scheme and fused kernels, yielding up to 1.87X throughput gains on GPT and Qwen models with near-lossless accuracy.
Spark Policy Toolkit supplies semantic contracts plus mapInPandas/mapInArrow inference and executor-side split search so policy learning remains correct and fast on Spark clusters up to tens of millions of rows.
citing papers explorer
-
MoE-Hub: Taming Software Complexity for Seamless MoE Overlap with Hardware-Accelerated Communication on Multi-GPU Systems
MoE-Hub enables seamless MoE communication overlap via hardware-accelerated destination-agnostic data transmission, delivering 1.40x-3.08x per-layer and 1.21x-1.98x end-to-end speedups over prior systems.
-
TACO: Efficient Communication Compression of Intermediate Tensors for Scalable Tensor-Parallel LLM Training
TACO compresses tensor-parallel intermediate tensors with an adaptive FP8 scheme and fused kernels, yielding up to 1.87X throughput gains on GPT and Qwen models with near-lossless accuracy.
-
Spark Policy Toolkit: Semantic Contracts and Scalable Execution for Policy Learning in Spark
Spark Policy Toolkit supplies semantic contracts plus mapInPandas/mapInArrow inference and executor-side split search so policy learning remains correct and fast on Spark clusters up to tens of millions of rows.