Canonical reference

BurstGPT: A real-world workload dataset to optimize LLM serving systems

Yuxin Wang, Yuhan Chen, Zeyu Li, Xueze Kang, Yuchu Fang, Yeju Zhou, Yang Zheng, Zhenheng Tang, Xin He, Rui Guo, Xin Wang, Qiang Wang, Amelie Chi Zhou, Xiaowen Chu · 2025 · arXiv 1896.373741

Canonical reference. 80% of citing Pith papers cite this work as background.

10 Pith papers citing it

Background 80% of classified citations

read on arXiv browse 10 citing papers

citation-role summary

background 4 dataset 1

citation-polarity summary

background 4 use dataset 1

representative citing papers

SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters

cs.DC · 2026-05-01 · unverdicted · novelty 7.0 · 2 refs

SAGA introduces workflow-atomic scheduling for compound AI agents, achieving 1.64x lower task completion time and 1.22x better memory utilization than vLLM on a 64-GPU cluster at the cost of 30% lower peak throughput.

InfiniLoRA: Disaggregated Multi-LoRA Serving for Large Language Models

cs.DC · 2026-04-08 · unverdicted · novelty 7.0

InfiniLoRA decouples LoRA execution from base-model inference and reports 3.05x higher request throughput plus 54% more adapters meeting strict latency SLOs.

DiLaServe: High SLO Attainment Serving for Diffusion Language Models

cs.LG · 2026-06-27 · unverdicted · novelty 6.0

DiLaServe improves SLO attainment for diffusion language models by up to 56.6 percentage points and reduces latency by up to 46% with less than 1% accuracy drop via deadline-aware scheduling and dynamic reconfiguration.

LiveServe: Interaction-Aware Serving for Real-Time Omni-Modal LLMs

cs.DC · 2026-06-22 · unverdicted · novelty 6.0

LiveServe exposes audio playback and barge-in signals to the scheduler and KV manager, lowering P90 audio TTFP by 1.55x on average and raising completed-request throughput by 1.15x on two Omni-LMs.

IPO-Mine: A Toolkit and Dataset for Section-Structured Analysis of Long, Multimodal IPO Documents

cs.CL · 2026-05-27 · unverdicted · novelty 6.0

IPO-Mine releases a toolkit and large multimodal dataset for structured analysis of IPO filings and shows state-of-the-art models diverge from human judgments on chart quality and misleadingness.

LEAP: Trajectory-Level Evaluation of LLMs in Iterative Scientific Design

cs.LG · 2026-05-14 · unverdicted · novelty 6.0

LEAPBench shows trajectory scoring changes best-model rankings on 53% of tasks, LLMs do not beat Bayesian optimization, and domain-aware prompting underperforms domain-agnostic on biology tasks aligned with published literature.

The Energy Cost of Execution-Idle in GPU Clusters

cs.DC · 2026-04-06 · unverdicted · novelty 6.0

Execution-idle accounts for 19.7% of GPU execution time and 10.7% of energy in a large cluster, motivating power management that treats it as a distinct operating state.

Measurement of Generative AI Workload Power Profiles for Whole-Facility Data Center Infrastructure Planning

eess.SY · 2026-04-08 · unverdicted · novelty 5.0

High-resolution power profiles for AI workloads on H100 GPUs are measured and scaled to whole-facility energy demand using a bottom-up model, with the dataset made public.

A Techno-Economic Framework for Cost Modeling and Revenue Opportunities in Open and Programmable AI-RAN

cs.NI · 2026-03-30 · unverdicted · novelty 5.0 · 2 refs

Techno-economic framework shows that GPU AI-RAN deployments can offset extra costs via AI revenue for up to 8x ROI across scenarios with varying token depreciation, demand, and GPU densities.

Daily and Weekly Periodicity in Large Language Model Performance and Its Implications for Research

stat.AP · 2026-02-06 · unverdicted · novelty 5.0

GPT-4o exhibits daily and weekly periodic fluctuations in performance on a fixed physics task, accounting for about 20% of observed variance.

citing papers explorer

Showing 10 of 10 citing papers.

SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters cs.DC · 2026-05-01 · unverdicted · none · ref 66 · 2 links
SAGA introduces workflow-atomic scheduling for compound AI agents, achieving 1.64x lower task completion time and 1.22x better memory utilization than vLLM on a 64-GPU cluster at the cost of 30% lower peak throughput.
InfiniLoRA: Disaggregated Multi-LoRA Serving for Large Language Models cs.DC · 2026-04-08 · unverdicted · none · ref 40
InfiniLoRA decouples LoRA execution from base-model inference and reports 3.05x higher request throughput plus 54% more adapters meeting strict latency SLOs.
DiLaServe: High SLO Attainment Serving for Diffusion Language Models cs.LG · 2026-06-27 · unverdicted · none · ref 53
DiLaServe improves SLO attainment for diffusion language models by up to 56.6 percentage points and reduces latency by up to 46% with less than 1% accuracy drop via deadline-aware scheduling and dynamic reconfiguration.
LiveServe: Interaction-Aware Serving for Real-Time Omni-Modal LLMs cs.DC · 2026-06-22 · unverdicted · none · ref 52
LiveServe exposes audio playback and barge-in signals to the scheduler and KV manager, lowering P90 audio TTFP by 1.55x on average and raising completed-request throughput by 1.15x on two Omni-LMs.
IPO-Mine: A Toolkit and Dataset for Section-Structured Analysis of Long, Multimodal IPO Documents cs.CL · 2026-05-27 · unverdicted · none · ref 19
IPO-Mine releases a toolkit and large multimodal dataset for structured analysis of IPO filings and shows state-of-the-art models diverge from human judgments on chart quality and misleadingness.
LEAP: Trajectory-Level Evaluation of LLMs in Iterative Scientific Design cs.LG · 2026-05-14 · unverdicted · none · ref 6
LEAPBench shows trajectory scoring changes best-model rankings on 53% of tasks, LLMs do not beat Bayesian optimization, and domain-aware prompting underperforms domain-agnostic on biology tasks aligned with published literature.
The Energy Cost of Execution-Idle in GPU Clusters cs.DC · 2026-04-06 · unverdicted · none · ref 56
Execution-idle accounts for 19.7% of GPU execution time and 10.7% of energy in a large cluster, motivating power management that treats it as a distinct operating state.
Measurement of Generative AI Workload Power Profiles for Whole-Facility Data Center Infrastructure Planning eess.SY · 2026-04-08 · unverdicted · none · ref 22
High-resolution power profiles for AI workloads on H100 GPUs are measured and scaled to whole-facility energy demand using a bottom-up model, with the dataset made public.
A Techno-Economic Framework for Cost Modeling and Revenue Opportunities in Open and Programmable AI-RAN cs.NI · 2026-03-30 · unverdicted · none · ref 3 · 2 links
Techno-economic framework shows that GPU AI-RAN deployments can offset extra costs via AI revenue for up to 8x ROI across scenarios with varying token depreciation, demand, and GPU densities.
Daily and Weekly Periodicity in Large Language Model Performance and Its Implications for Research stat.AP · 2026-02-06 · unverdicted · none · ref 17
GPT-4o exhibits daily and weekly periodic fluctuations in performance on a fixed physics task, accounting for about 20% of observed variance.

BurstGPT: A real-world workload dataset to optimize LLM serving systems

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer