3 Pith papers cite this work. Polarity classification is still indexing.
Citation summary
Years: 2026 (3)
Roles: background (3)
Polarities: background (3)
citing papers explorer
- TokenStack: A Heterogeneous HBM-PIM Architecture and Runtime for Efficient LLM Inference
  TokenStack's heterogeneous HBM-PIM design with base-die control and topology-aware KV placement delivers 1.62x higher geometric-mean token throughput and 1.70x SLO-compliant serving capacity than AttAcc while cutting per-token energy by 30-47%.
- Adaptive Management of Microservices in Dynamic Computing Environments: A Taxonomy and Future Directions
  A new taxonomy for dynamics-aware microservice management, synthesized from 84 systems, finds that production dynamics are often only partially modeled and that reported performance gains depend on evaluation realism.
- Agentic Architect: An Agentic AI Framework for Architecture Design Exploration and Optimization
  An LLM-driven agentic system evolves microarchitectural policies for cache replacement, data prefetching, and branch prediction, producing designs that match or exceed prior state-of-the-art in IPC on standard benchmarks.