Beehive: Sub-second elasticity for web services with semi-FaaS execution

Ziming Zhao, Mingyu Wu, Jiawei Tang, Binyu Zang, Zhaoguo Wang, Haibo Chen · 2023 · arXiv 5693.357575

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Cache Your Prompt When It's Green: Carbon-Aware Caching for Large Language Model Serving

cs.DC · 2025-05-29 · conditional · novelty 7.0

GreenCache dynamically manages LLM KV cache resources to reduce carbon emissions by 15.1% on average (up to 25.3%) while meeting latency constraints for over 90% of requests on real traces.

Spandana: Reconciling Strict SLOs with Low Cost under Fine-Grained Load Fluctuations

cs.DC · 2026-06-29 · unverdicted · novelty 6.0

Spandana decouples SLO enforcement from cost optimization via per-VM request steering between VMs and FaaS, reporting 76-86% utilization and 5-44% cost reduction versus baselines.

PALS: Power-Aware LLM Serving for Mixture-of-Experts Models

cs.AI · 2026-05-20 · unverdicted · novelty 6.0

PALS adds dynamic GPU power capping to LLM serving frameworks like vLLM, jointly tuning it with batch size via offline models and feedback control to improve energy efficiency up to 26.3% and cut QoS violations 4-7x on dense and MoE models.

Flare: Leveraging Serverless Elasticity to Absorb Microservice Load Spikes

cs.DC · 2026-05-22 · unverdicted · novelty 4.0

Flare proposes selectively routing microservice spike load to serverless while keeping steady load on VMs, and claims this requires minimal integration changes.

Sustainability-Constrained Workload Orchestration for Sovereign AI Infrastructure: A Joint Compute-Network Optimization Framework

cs.NI · 2026-04-07 · unverdicted · novelty 4.0

Introduces the Feasible Sovereign Operating Region (FSOR) as a construct for workloads sustainable under physical and regulatory limits, along with a joint compute-network optimization framework that enforces sustainability as hard constraints.
