pith. machine review for the scientific record.

arxiv: 2512.14098 · v3 · submitted 2025-12-16 · 💻 cs.LG · cs.DC

Recognition: unknown

Cornfigurator: Automated Planning for Any-to-Any Multimodal Model Serving

Authors on Pith: no claims yet
classification 💻 cs.LG cs.DC
keywords: cornfigurator · any-to-any · model serving · deployment · models · multimodal · plans
read the original abstract

Any-to-Any models are an emerging class of multimodal models that accept combinations of text and multimodal data as input and generate them as output, introducing heterogeneous computation paths and component scaling characteristics. There are existing mechanisms for deploying Any-to-Any models--or special cases of them--for inference serving, but they either require manual effort and expertise to tune, or do not generalize to generic Any-to-Any models. We present Cornfigurator, the first deployment planner for generic Any-to-Any model inference serving. The goal of Cornfigurator is to maximize the overall goodput of serving the model, defined as the throughput of requests meeting their latency targets. To do so, based on model and workload characteristics, Cornfigurator explores the full spectrum of deployment strategies, from colocation to disaggregation and mixing different strategies. Cornfigurator performs coarse-to-fine statistical evaluation to efficiently navigate the large space of candidate plans. Plans generated by Cornfigurator either match or deliver 1.12×–6.32× higher goodput compared to existing systems and expert-tuned deployment plans.
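The coarse-to-fine plan search described in the abstract can be pictured with a small sketch. Everything below is an illustrative assumption rather than Cornfigurator's actual interface: the component names, the `Plan` and `StubWorkload` classes, and the toy cost and latency models are hypothetical. The sketch only shows the shape of the idea: enumerate mixed colocation/disaggregation plans, prune them with a cheap analytical score, and run a statistical goodput estimate (fraction of simulated requests meeting their latency target, scaled by offered load) only on the surviving candidates.

```python
# Hypothetical sketch of a coarse-to-fine deployment-plan search.
# Names, cost models, and the stub workload are illustrative assumptions,
# not the paper's API.
from dataclasses import dataclass
from itertools import product
from typing import Dict, List

# Assumed example components of an Any-to-Any model; real models differ.
COMPONENTS = ["vision_encoder", "llm_backbone", "audio_decoder"]
PLACEMENTS = ["colocated", "disaggregated"]  # per-component placement strategy


@dataclass
class Plan:
    placement: Dict[str, str]  # component -> "colocated" | "disaggregated"
    replicas: Dict[str, int]   # component -> replica count


@dataclass
class StubWorkload:
    """Toy stand-in for model/workload characteristics (purely illustrative)."""
    offered_rate: float = 100.0  # requests per second

    def rate(self, component: str) -> float:
        return 10.0  # per-component demand, requests/s

    def slo(self, i: int) -> float:
        return 0.5   # latency target in seconds for request i

    def simulate_latency(self, plan: Plan, i: int) -> float:
        # Toy latency model: more replicas of the bottleneck -> lower latency.
        return 0.6 / min(plan.replicas.values())


def enumerate_plans(max_replicas: int = 4) -> List[Plan]:
    """Enumerate mixed colocation/disaggregation plans (the 'full spectrum')."""
    plans = []
    for placement in product(PLACEMENTS, repeat=len(COMPONENTS)):
        for replicas in product(range(1, max_replicas + 1), repeat=len(COMPONENTS)):
            plans.append(Plan(dict(zip(COMPONENTS, placement)),
                              dict(zip(COMPONENTS, replicas))))
    return plans


def coarse_score(plan: Plan, workload: StubWorkload) -> float:
    """Cheap analytical estimate used to prune the plan space."""
    return sum(workload.rate(c) * plan.replicas[c] for c in COMPONENTS)


def fine_goodput(plan: Plan, workload: StubWorkload, n_samples: int = 1000) -> float:
    """Statistical goodput estimate: offered load times the fraction of
    sampled requests whose simulated latency meets its target."""
    met = sum(workload.simulate_latency(plan, i) <= workload.slo(i)
              for i in range(n_samples))
    return workload.offered_rate * met / n_samples


def plan_search(workload: StubWorkload, keep_top_k: int = 20) -> Plan:
    """Coarse-to-fine: rank all plans with the cheap score, then run the
    statistical evaluation only on the top-k survivors."""
    candidates = enumerate_plans()
    candidates.sort(key=lambda p: coarse_score(p, workload), reverse=True)
    return max(candidates[:keep_top_k], key=lambda p: fine_goodput(p, workload))


if __name__ == "__main__":
    best = plan_search(StubWorkload())
    print(best.placement, best.replicas)
```

The two-stage structure is the point: the expensive statistical evaluation is paid for only a handful of plans, which is what makes searching the combinatorial space of per-component placements and replica counts tractable.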

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Scepsy: Serving Agentic Workflows Using Aggregate LLM Pipelines

cs.DC · 2026-04 · unverdicted · novelty 6.0

    Scepsy schedules arbitrary multi-LLM agentic workflows on GPU clusters by constructing Aggregate LLM Pipelines from stable per-LLM execution time shares, then searching fractional GPU allocations, tensor parallelism, ...

  2. GENSERVE: Efficient Co-Serving of Heterogeneous Diffusion Model Workloads

cs.DC · 2026-04 · unverdicted · novelty 6.0

    GENSERVE improves SLO attainment by up to 44% for co-serving heterogeneous T2I and T2V diffusion workloads via step-level preemption, elastic parallelism, and joint scheduling.

  3. Cornserve: A Distributed Serving System for Any-to-Any Multimodal Models

cs.LG · 2026-03 · unverdicted · novelty 6.0

    Cornserve introduces a task abstraction and record-and-replay runtime for Any-to-Any multimodal models, achieving up to 3.81x higher throughput and 5.79x lower tail latency through component disaggregation and direct ...

  4. Position: LLM Serving Needs Mathematical Optimization and Algorithmic Foundations, Not Just Heuristics

cs.DC · 2026-05 · accept · novelty 4.0

    LLM serving requires mathematical optimization and algorithms with provable guarantees rather than generic heuristics that fail unpredictably on LLM workloads.