Title resolution pending

Yinmin Zhong, Shengyu Liu, Junda Chen, Jianbo Hu, Yibo Zhu, Xuanzhe Liu, Xin Jin, Hao Zhang · 2024

2 Pith papers cite this work. Polarity classification is still indexing.

The title metadata for this work has not finished resolving. This hub is built from the citation graph; the title resolver will retry the DOI and OpenAlex lookups on its next pass.

representative citing papers

InfiniLoRA: Disaggregated Multi-LoRA Serving for Large Language Models

cs.DC · 2026-04-08 · unverdicted · novelty 7.0

InfiniLoRA decouples LoRA execution from base-model inference and reports 3.05x higher request throughput plus 54% more adapters meeting strict latency SLOs.

Foundry: Template-Based CUDA Graph Context Materialization for Fast LLM Serving Cold Start

cs.DC · 2026-04-08 · unverdicted · novelty 6.0

Foundry uses template-based CUDA graph context materialization to reduce LLM serving cold-start latency by up to 99% while preserving CUDA graph throughput gains.

citing papers explorer

Showing 2 of 2 citing papers.

InfiniLoRA: Disaggregated Multi-LoRA Serving for Large Language Models

cs.DC · 2026-04-08 · unverdicted · none · ref 56

Foundry: Template-Based CUDA Graph Context Materialization for Fast LLM Serving Cold Start

cs.DC · 2026-04-08 · unverdicted · none · ref 58
