2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
RankGuide uses tensor-rank analysis of consecutive hidden states to route between small and large reasoning models and steer generations, reducing latency up to 1.75x while maintaining competitive accuracy on reasoning benchmarks.
citing papers explorer
- Gated Subspace Inference for Transformer Acceleration
  Gated Subspace Inference accelerates transformer linear layers 3-10x via low-rank cached subspace computation and per-token gating to skip residuals while preserving output distribution to high accuracy.
- RankGuide: Tensor-Rank-Guided Routing and Steering for Efficient Reasoning
  RankGuide uses tensor-rank analysis of consecutive hidden states to route between small and large reasoning models and steer generations, reducing latency up to 1.75x while maintaining competitive accuracy on reasoning benchmarks.