KAIROS reduces power by 27% on average (up to 39.8%) for agentic AI inference by using long-lived context to jointly manage GPU frequency, concurrency, and request routing across instances.
Title resolution pending
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.DC 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
KAIROS: Stateful, Context-Aware Power-Efficient Agentic Inference Serving
KAIROS reduces power by 27% on average (up to 39.8%) for agentic AI inference by using long-lived context to jointly manage GPU frequency, concurrency, and request routing across instances.