Title resolution pending

Lianmin Zheng, Liangsheng Yin, Zhiqiang Xie, Chuyue Livia Sun, Jeff Huang, Cody Hao Yu, Shiyi Cao, Christos Kozyrakis, Ion Stoica, Joseph E. Gonzalez, et al. · 2024

6 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Sieve: Dynamic Expert-Aware PIM Acceleration for Evolving Mixture-of-Experts Models

cs.AR · 2026-05-11 · conditional · novelty 8.0

Sieve dynamically schedules MoE experts across GPU and PIM hardware to handle bimodal token distributions, achieving 1.3x to 1.6x gains in throughput and interactivity over static prior PIM systems on three large models.

CachePrune: Privacy-Aware and Fine-Grained KV Cache Sharing for Efficient LLM Inference

cs.CR · 2026-05-22 · unverdicted · novelty 7.0

CachePrune enables fine-grained, token-level KV cache reuse across LLM requests by masking sensitive segments, eliminating direct side-channel leakage while cutting TTFT by 4.5x and raising hit rates by 44% versus prior coarse-grained methods.

FlashFPS: Efficient Farthest Point Sampling for Large-Scale Point Clouds via Pruning and Caching

cs.LG · 2026-04-20 · accept · novelty 7.0

FlashFPS accelerates FPS via candidate/iteration pruning and inter-layer caching, delivering 5.16x GPU speedup and 2.69x on accelerators with negligible accuracy loss.

MTServe: Efficient Serving for Generative Recommendation Models with Hierarchical Caches

cs.LG · 2026-04-24 · unverdicted · novelty 6.0

MTServe achieves up to 3.1x speedup for generative recommendation model serving by using hierarchical caches with host RAM and system optimizations while keeping cache hit ratios above 98.5%.

TOPCELL: Topology Optimization of Standard Cell via LLMs

cs.LG · 2026-04-15 · unverdicted · novelty 6.0

TOPCELL reformulates standard cell topology optimization as an LLM generative task with GRPO fine-tuning, outperforming base models and matching exhaustive solvers with 85.91x speedup in 2nm/7nm industrial flows.

DeepStack: Scalable and Accurate Design Space Exploration for Distributed 3D-Stacked AI Accelerators

cs.AR · 2026-04-06 · conditional · novelty 6.0

DeepStack introduces a fast performance model and hierarchical search method for co-optimizing 3D DRAM stacking, interconnects, and distributed scheduling in AI accelerators, delivering up to 9.5x throughput gains over baselines.

citing papers explorer

FlashFPS: Efficient Farthest Point Sampling for Large-Scale Point Clouds via Pruning and Caching

cs.LG · 2026-04-20 · accept · none · ref 27
