Canonical reference

BurstGPT: A real-world workload dataset to optimize LLM serving systems

2025 · arXiv 1896.373741

80% of classified citations in Pith papers (4 of 5) cite this work as background.

7 Pith papers citing it

Background 80% of classified citations


citation-role summary

background: 4 · dataset: 1

citation-polarity summary

background: 4 · use dataset: 1

representative citing papers

SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters

cs.DC · 2026-05-01 · unverdicted · novelty 7.0

SAGA reduces AI agent task completion time by 1.64x on 64-GPU clusters by scheduling at the full workflow level with execution graphs, affinity batching, and completion-time fairness.

InfiniLoRA: Disaggregated Multi-LoRA Serving for Large Language Models

cs.DC · 2026-04-08 · unverdicted · novelty 7.0

InfiniLoRA decouples LoRA execution from base-model inference and reports 3.05x higher request throughput plus 54% more adapters meeting strict latency SLOs.

LEAP: Trajectory-Level Evaluation of LLMs in Iterative Scientific Design

cs.LG · 2026-05-14 · unverdicted · novelty 6.0

LEAPBench shows trajectory scoring changes best-model rankings on 53% of tasks, LLMs do not beat Bayesian optimization, and domain-aware prompting underperforms domain-agnostic on biology tasks aligned with published literature.

The Energy Cost of Execution-Idle in GPU Clusters

cs.DC · 2026-04-06 · unverdicted · novelty 6.0

Execution-idle accounts for 19.7% of GPU execution time and 10.7% of energy in a large cluster, motivating power management that treats it as a distinct operating state.

Measurement of Generative AI Workload Power Profiles for Whole-Facility Data Center Infrastructure Planning

eess.SY · 2026-04-08 · unverdicted · novelty 5.0

High-resolution power profiles for AI workloads on H100 GPUs are measured and scaled to whole-facility energy demand using a bottom-up model, with the dataset made public.

A Techno-Economic Framework for Cost Modeling and Revenue Opportunities in Open and Programmable AI-RAN

cs.NI · 2026-03-30 · unverdicted · novelty 5.0 · 2 refs

Techno-economic framework shows that GPU AI-RAN deployments can offset extra costs via AI revenue for up to 8x ROI across scenarios with varying token depreciation, demand, and GPU densities.

Daily and Weekly Periodicity in Large Language Model Performance and Its Implications for Research

stat.AP · 2026-02-06 · unverdicted · novelty 5.0

GPT-4o exhibits daily and weekly periodic fluctuations in performance on a fixed physics task, accounting for about 20% of observed variance.

citing papers explorer

Showing 7 of 7 citing papers.

SAGA: Workflow-Atomic Scheduling for AI Agent Inference on GPU Clusters · cs.DC · 2026-05-01 · unverdicted · none · ref 66
InfiniLoRA: Disaggregated Multi-LoRA Serving for Large Language Models · cs.DC · 2026-04-08 · unverdicted · none · ref 40
LEAP: Trajectory-Level Evaluation of LLMs in Iterative Scientific Design · cs.LG · 2026-05-14 · unverdicted · none · ref 6
The Energy Cost of Execution-Idle in GPU Clusters · cs.DC · 2026-04-06 · unverdicted · none · ref 56
Measurement of Generative AI Workload Power Profiles for Whole-Facility Data Center Infrastructure Planning · eess.SY · 2026-04-08 · unverdicted · none · ref 22
A Techno-Economic Framework for Cost Modeling and Revenue Opportunities in Open and Programmable AI-RAN · cs.NI · 2026-03-30 · unverdicted · none · ref 3 · 2 links
Daily and Weekly Periodicity in Large Language Model Performance and Its Implications for Research · stat.AP · 2026-02-06 · unverdicted · none · ref 17
