SGLang: Efficient execution of structured language model programs. Advances in neural information processing systems, 37: 62557–62583

Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Sun, Jeff Huang, Cody H. Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, et al. · 2024

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

other 1

citation-polarity summary

unclear 1

representative citing papers

FATE: Future-State-Aware Scheduling for Heterogeneous LLM Workflows

cs.DC · 2026-05-08 · unverdicted · novelty 7.0

FATE reduces normalized makespan and P95 latency in real LLM workflow DAGs to 0.675 and 0.677 by jointly preserving multiple future execution states, outperforming RoundRobin by 32.5% and the strongest baseline by 8.9%.

How Off-Policy Can GRPO Be? Mu-GRPO for Efficient LLM Reinforcement Learning

cs.LG · 2026-05-17 · conditional · novelty 6.0

Mu-GRPO enables substantially more off-policy GRPO training for LLMs via relaxed clipping and negative-advantage veto in large staged batches, matching standard GRPO performance at ~2x training speed.

MILM: Large Language Models for Multimodal Irregular Time Series with Informative Sampling

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

MILM fine-tunes LLMs on XML-encoded multimodal irregular time series via a two-stage process that exploits informative sampling patterns to achieve top performance on EHR classification datasets.

SceneGraphVLM: Dynamic Scene Graph Generation from Video with Vision-Language Models

cs.CV · 2026-05-13 · unverdicted · novelty 6.0

SceneGraphVLM generates dynamic scene graphs from video using compact VLMs, TOON serialization, and hallucination-aware RL to improve precision and achieve one-second latency.

RRCM: Ranking-Driven Retrieval over Collaborative and Meta Memories for LLM Recommendation

cs.IR · 2026-05-08 · unverdicted · novelty 6.0

RRCM trains an LLM to dynamically retrieve from collaborative and meta memories using group relative policy optimization driven by final top-k recommendation quality.

Efficient Serving for Dynamic Agent Workflows with Prediction-based KV-Cache Management

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

PBKV predicts agent invocations in dynamic LLM workflows to manage KV-cache reuse, delivering up to 1.85x speedup over LRU and 1.26x over KVFlow.

citing papers explorer

Showing 6 of 6 citing papers.

FATE: Future-State-Aware Scheduling for Heterogeneous LLM Workflows
cs.DC · 2026-05-08 · unverdicted · none · ref 2

How Off-Policy Can GRPO Be? Mu-GRPO for Efficient LLM Reinforcement Learning
cs.LG · 2026-05-17 · conditional · none · ref 37

MILM: Large Language Models for Multimodal Irregular Time Series with Informative Sampling
cs.LG · 2026-05-13 · unverdicted · none · ref 71

SceneGraphVLM: Dynamic Scene Graph Generation from Video with Vision-Language Models
cs.CV · 2026-05-13 · unverdicted · none · ref 49

RRCM: Ranking-Driven Retrieval over Collaborative and Meta Memories for LLM Recommendation
cs.IR · 2026-05-08 · unverdicted · none · ref 34

Efficient Serving for Dynamic Agent Workflows with Prediction-based KV-Cache Management
cs.LG · 2026-05-07 · unverdicted · none · ref 6
