RCW-CIM reduces Llama2-7B decoding latency by 21.59% and prefill latency by 49.76% via minimized weight updates and DRAM accesses, delivering 3.28 TOPS and 42.3 TOPS/W on a fabricated 22 nm chip.
16.4 an 89tops/w and 16.3tops/mm2 all-digital sram- based full-precision compute-in memory macro in 22nm for machine- learning edge applications,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AR 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
RCW-CIM: A Digital CIM-based LLM Accelerator with Read-Compute/Write
RCW-CIM reduces Llama2-7B decoding latency by 21.59% and prefill latency by 49.76% via minimized weight updates and DRAM accesses, delivering 3.28 TOPS and 42.3 TOPS/W on a fabricated 22 nm chip.