CIMple delivers a 32 kb digital SRAM-based compute-in-memory accelerator for transformer self-attention that reaches 26.1 TOPS/W at 0.85 V in 28 nm with INT8 precision using dual-banked architecture and LUT-based split softmax.
23.8 an 88.36tops/w bit-level-weight-compressed large-language- model accelerator with cluster-aligned int-fp-gemm and bi-dimensional workflow reformulation,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AR 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
CIMple: Standard-cell SRAM-based CIM with LUT-based split softmax for attention acceleration
CIMple delivers a 32 kb digital SRAM-based compute-in-memory accelerator for transformer self-attention that reaches 26.1 TOPS/W at 0.85 V in 28 nm with INT8 precision using dual-banked architecture and LUT-based split softmax.