Title resolution pending

2025 · arXiv 2512.22219

5 Pith papers cite this work. Polarity classification is still indexing.


Title metadata for this work has not finished resolving. This hub is built from the citation graph, and the title resolver retries the DOI and OpenAlex lookups on its next pass.

citation-role summary

background 2

citation-polarity summary

background 1 · contest 1

representative citing papers

Fleet: Hierarchical Task-based Abstraction for Megakernels on Multi-Die GPUs

cs.AR · 2026-04-15 · unverdicted · novelty 7.0

Fleet adds a Chiplet-task level to GPU task models, enabling per-chiplet scheduling and cooperative cache reuse in persistent megakernels, yielding 1.3-1.5x lower LLM decode latency and up to 37% less HBM traffic on AMD MI350 hardware.

MoE-Hub: Taming Software Complexity for Seamless MoE Overlap with Hardware-Accelerated Communication on Multi-GPU Systems

cs.AR · 2026-05-07 · unverdicted · novelty 6.0

MoE-Hub enables seamless MoE communication overlap via hardware-accelerated destination-agnostic data transmission, delivering 1.40x-3.08x per-layer and 1.21x-1.98x end-to-end speedups over prior systems.

Eliminating Hidden Serialization in Multi-Node Megakernel Communication

cs.DC · 2026-05-01 · conditional · novelty 6.0

Perseus removes serialization bottlenecks in multi-node megakernel MoE communication via batched per-destination fences and hardware fence flags, delivering up to 10.3x speedup on proxy transports and matching or exceeding GPU-direct RDMA.

DAK: Direct-Access-Enabled GPU Memory Offloading with Optimal Efficiency for LLM Inference

cs.DC · 2026-04-28 · unverdicted · novelty 6.0

DAK enables direct GPU access to remote memory for LLM inference via TMA repurposing and a greedy offloading algorithm, achieving up to 3x gains over prefetching baselines on NVLink-C2C and 1.8x on PCIe.

Ada-MK: Adaptive MegaKernel Optimization via Automated DAG-based Search for LLM Inference

cs.CL · 2026-05-12 · unverdicted · novelty 5.0

Ada-MK fuses LLM operators into persistent MegaKernels via MLIR DAG search and 3D shared-memory modeling, delivering up to 23.6% higher single-batch throughput than TensorRT-LLM on NVIDIA L20.

citing papers explorer

Showing 5 of 5 citing papers.

Fleet: Hierarchical Task-based Abstraction for Megakernels on Multi-Die GPUs

cs.AR · 2026-04-15 · unverdicted · none · ref 3

MoE-Hub: Taming Software Complexity for Seamless MoE Overlap with Hardware-Accelerated Communication on Multi-GPU Systems

cs.AR · 2026-05-07 · unverdicted · none · ref 9

Eliminating Hidden Serialization in Multi-Node Megakernel Communication

cs.DC · 2026-05-01 · conditional · none · ref 9

DAK: Direct-Access-Enabled GPU Memory Offloading with Optimal Efficiency for LLM Inference

cs.DC · 2026-04-28 · unverdicted · none · ref 6

Ada-MK: Adaptive MegaKernel Optimization via Automated DAG-based Search for LLM Inference

cs.CL · 2026-05-12 · unverdicted · none · ref 1
