PyTorch/XLA eager mode (r2.4). https://docs.pytorch
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.DC (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
GPUOS: A GPU Operating System Primitive for Transparent Operation Fusion
GPUOS delivers up to 15.3x speedup over standard PyTorch by running a single persistent kernel that receives tasks from a host queue and injects JIT-compiled operators at runtime via NVRTC and device function pointers.