Vidur: A Large-Scale Simulation Framework for LLM Inference

· 2024 · arXiv 2405.05465

4 Pith papers cite this work.

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

MIST: A Co-Design Framework for Heterogeneous, Multi-Stage LLM Inference

cs.AR · 2025-04-14 · unverdicted · novelty 7.0

MIST is a new simulator for heterogeneous multi-stage LLM inference that combines hardware traces with analytical models to explore configuration trade-offs in hybrid CPU-accelerator systems.

Dooly: Configuration-Agnostic, Redundancy-Aware Profiling for LLM Inference Simulation

cs.DC · 2026-05-08 · unverdicted · novelty 6.0 · 2 refs

Dooly reduces LLM inference profiling GPU-hours by 56.4% across 12 models while keeping simulation MAPE under 5% for TTFT and 8% for TPOT by making profiling configuration-agnostic and redundancy-aware.

PipeWeave: Synergizing Analytical and Learning Models for Unified GPU Performance Prediction

cs.PF · 2026-01-21 · unverdicted · novelty 6.0

PipeWeave predicts GPU kernel performance with 6.1% average error and end-to-end inference with 8.5% error by feeding analytical pipeline features into ML, cutting prior method errors by 4-7x across 11 GPUs.

Charon: A Unified and Fine-Grained Simulator for Large-Scale LLM Training and Inference

cs.DC · 2026-05-16 · unverdicted · novelty 5.0

Charon is a unified modular simulator that predicts LLM training and inference performance with under 5.35% error and identifies throughput improvements over baselines in a real deployment case.

citing papers explorer

MIST: A Co-Design Framework for Heterogeneous, Multi-Stage LLM Inference cs.AR · 2025-04-14 · unverdicted · none · ref 5
Dooly: Configuration-Agnostic, Redundancy-Aware Profiling for LLM Inference Simulation cs.DC · 2026-05-08 · unverdicted · none · ref 3 · 2 links
PipeWeave: Synergizing Analytical and Learning Models for Unified GPU Performance Prediction cs.PF · 2026-01-21 · unverdicted · none · ref 1
Charon: A Unified and Fine-Grained Simulator for Large-Scale LLM Training and Inference cs.DC · 2026-05-16 · unverdicted · none · ref 1