BanaServe: Unified KV Cache and Dynamic Module Migration for Balancing Disaggregated LLM Serving in AI Infrastructure

Yiyuan He, Minxian Xu, Jingfeng Wu, Jianmin Hu, Chong Ma, Min Shen, Le Chen, Chengzhong Xu, Lin Qu, Kejiang Ye · 2026 · DOI 10.1002/spe.70054

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Move the Query, Not the Cache: Characterizing Cross-Instance Latent Attention Redistribution Across GPU Fabrics

cs.DC · 2026-05-31 · unverdicted · novelty 7.0

On a real multi-node H100 cluster the authors show that for MLA, routing the ~1 KB compressed query row is cheaper than moving cache chunks and supply a topology-aware cost model accurate to ~7% on IBGDA fabrics.

Cloud-native and Distributed Systems for Efficient and Scalable Large Language Models -- A Research Agenda

cs.DC · 2026-04-19 · unverdicted · novelty 2.0

This research agenda argues that cloud-native architectures, microservices, autoscaling, and emerging trends like serverless inference and federated learning are required to make large language models efficient and scalable.

citing papers explorer

Move the Query, Not the Cache: Characterizing Cross-Instance Latent Attention Redistribution Across GPU Fabrics · cs.DC · 2026-05-31 · unverdicted · none · ref 13
Cloud-native and Distributed Systems for Efficient and Scalable Large Language Models -- A Research Agenda · cs.DC · 2026-04-19 · unverdicted · none · ref 75
