The sunk carbon fallacy: Rethinking carbon footprint metrics for effective carbon-aware scheduling
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: method 1
citation-polarity summary: use method 1
representative citing papers
UCCL-Zip adds lossless compression to GPU communication to reduce LLM bottlenecks while preserving exact numerical correctness.
GPU data centers participating in power grid frequency regulation via the EcoCenter framework can achieve exogenous carbon savings exceeding their operational emissions.
vMODB co-designs a Virtual Micro Service programming model with a system that unifies event and data management to enforce ACID properties in distributed asynchronous applications, outperforming eventual-consistency frameworks by up to 3x on two benchmarks.
citing papers explorer
- Cache Your Prompt When It's Green: Carbon-Aware Caching for Large Language Model Serving
  GreenCache dynamically manages LLM KV cache resources to reduce carbon emissions by 15.1% on average (up to 25.3%) while meeting latency constraints for over 90% of requests on real traces.
- UCCL-Zip: Lossless Compression Supercharged GPU Communication
  UCCL-Zip adds lossless compression to GPU communication to reduce LLM bottlenecks while preserving exact numerical correctness.
- Coordinating GPU Data Centers and Power Grid Regulation Service for Exogenous Carbon Benefits
  GPU data centers participating in power grid frequency regulation via the EcoCenter framework can achieve exogenous carbon savings exceeding their operational emissions.
- vMODB: Unifying Event and Data Management for Distributed Asynchronous Applications
  vMODB co-designs a Virtual Micro Service programming model with a system that unifies event and data management to enforce ACID properties in distributed asynchronous applications, outperforming eventual-consistency frameworks by up to 3x on two benchmarks.