Tensor Cache augments sliding-window attention with an eviction-fed outer-product associative memory and a training correction to improve long-context performance under bounded memory.
Advances in Neural Information Processing Systems
6 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (6)
roles: background (2)
polarities: background (2)
representative citing papers
Apple MPS transformer decoding shows abrupt latency spikes up to 21x in narrow decoding-budget intervals due to KV cache and execution regime shifts, absent on CPU and CUDA.
SDG-MoE introduces learned signed interaction graphs and disagreement-gated deliberation among experts in MoE architectures, yielding 19.8% better validation perplexity than the strongest baseline.
SCALE-LoRA proposes a post-retrieval audit framework using sparse residual composition and disagreement-based reliability signals to improve open-pool LoRA adapter reuse on tasks like BIG-Bench Hard.
LLMs fail at extended counting of repeated characters due to finite internal states, with abrupt errors persisting across model scales and inference methods.
citing papers explorer
- Tensor Cache: Eviction-conditioned Associative Memory for Transformers
Tensor Cache augments sliding-window attention with an eviction-fed outer-product associative memory and a training correction to improve long-context performance under bounded memory.
- Non-Monotonic Latency in Apple MPS Decoding: KV Cache Interactions and Execution Regimes
Apple MPS transformer decoding shows abrupt latency spikes up to 21x in narrow decoding-budget intervals due to KV cache and execution regime shifts, absent on CPU and CUDA.
- SDG-MoE: Signed Debate Graph Mixture-of-Experts
SDG-MoE introduces learned signed interaction graphs and disagreement-gated deliberation among experts in MoE architectures, yielding 19.8% better validation perplexity than the strongest baseline.
- SCALE-LoRA: Auditing Post-Retrieval LoRA Composition with Residual Merging and View Reliability
SCALE-LoRA proposes a post-retrieval audit framework using sparse residual composition and disagreement-based reliability signals to improve open-pool LoRA adapter reuse on tasks like BIG-Bench Hard.
- Language models fail at extended rule following
LLMs fail at extended counting of repeated characters due to finite internal states, with abrupt errors persisting across model scales and inference methods.
- BioFormer: Rethinking Cross-Subject Generalization via Spectral Structural Alignment in Biomedical Time-Series