Title resolution pending

Shaoxun Zeng, Minhui Xie, Shiwei Gao, Youmin Chen, Youyou Lu · 2025 · arXiv 9940.370728

5 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 1 · baseline 1

citation-polarity summary

background 1 · baseline 1

representative citing papers

Gleaner: A Semantically-Rich and Efficient Online Sampler for Microservice Diagnostics

cs.SE · 2026-04-18 · unverdicted · novelty 7.0

Gleaner replaces slow graph-based trace analysis with bag-of-edges set operations plus log semantics and alarm-driven diversity to deliver faster, higher-fidelity sampling that improves RCA accuracy even at 1% rates.

UCCL-Zip: Lossless Compression Supercharged GPU Communication

cs.DC · 2026-04-19 · unverdicted · novelty 6.0

UCCL-Zip adds lossless compression to GPU communication to reduce LLM bottlenecks while preserving exact numerical correctness.

EdgeFlow: Fast Cold Starts for LLMs on Mobile Devices

cs.OS · 2026-04-10 · unverdicted · novelty 6.0

EdgeFlow reduces mobile LLM cold-start latency up to 4.07x versus llama.cpp, MNN, and llm.npu by NPU-aware adaptive quantization, SIMD-friendly packing, and synergistic granular CPU-NPU pipelining at comparable accuracy.

Foundry: Template-Based CUDA Graph Context Materialization for Fast LLM Serving Cold Start

cs.DC · 2026-04-08 · unverdicted · novelty 6.0

Foundry uses template-based CUDA graph context materialization to reduce LLM serving cold-start latency by up to 99% while preserving CUDA graph throughput gains.

FlexPipe: Adapting Dynamic LLM Serving Through Inflight Pipeline Refactoring in Fragmented Serverless Clusters

cs.DC · 2025-10-13 · unverdicted · novelty 5.0

FlexPipe introduces runtime pipeline refactoring for LLMs to achieve higher resource efficiency and lower latency in serverless GPU clusters with fragmentation.

citing papers explorer

Showing 5 of 5 citing papers.

Gleaner: A Semantically-Rich and Efficient Online Sampler for Microservice Diagnostics
cs.SE · 2026-04-18 · unverdicted · none · ref 17
UCCL-Zip: Lossless Compression Supercharged GPU Communication
cs.DC · 2026-04-19 · unverdicted · none · ref 3
EdgeFlow: Fast Cold Starts for LLMs on Mobile Devices
cs.OS · 2026-04-10 · unverdicted · none · ref 72
Foundry: Template-Based CUDA Graph Context Materialization for Fast LLM Serving Cold Start
cs.DC · 2026-04-08 · unverdicted · none · ref 55
FlexPipe: Adapting Dynamic LLM Serving Through Inflight Pipeline Refactoring in Fragmented Serverless Clusters
cs.DC · 2025-10-13 · unverdicted · none · ref 58
