An MTJ-based logic-in-memory design performs fully parallel stochastic bit-stream generation and arithmetic without external random number generators by exploiting device stochasticity.
Hassan Najafi, Sercan Aygun, and Marc Riedel
2 Pith papers cite this work. Polarity classification is still indexing.
Citation roles: background (2)
Citing papers
- Maximizing Memory-Level Parallelism via Integrated Stochastic Logic-in-Memory Architectures
  An MTJ-based logic-in-memory design performs fully parallel stochastic bit-stream generation and arithmetic without external random number generators by exploiting device stochasticity.
- RetroInfer: A Vector Storage Engine for Scalable Long-Context LLM Inference
  RetroInfer introduces the wave index and wave buffer to realize sparse KV-cache attention for long-context LLM inference with up to 4.4X throughput gains while matching full-attention accuracy.