Trilinear Compute-in-Memory Architecture for Energy-Efficient Transformer Acceleration
TrilinearCIM enables complete in-memory computation of Transformer attention via a DG-FeFET three-operand MAC that avoids runtime NVM reprogramming, delivering up to 46.6% energy reduction and 20.4% latency improvement on BERT and ViT benchmarks at a 37.3% area overhead.
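The primitive named here is a three-operand (trilinear) multiply-accumulate: two runtime inputs are multiplied with a stored cell state in one in-memory step, so attention's dynamic-by-dynamic products (e.g., Q against K, or scores against V) never force a rewrite of the non-volatile memory. The sketch below shows only that arithmetic in NumPy; the function name, the shapes, and the pin-the-stored-operand-to-1 trick are illustrative assumptions, not the paper's circuit or dataflow.

```python
import numpy as np

def trilinear_mac(a: np.ndarray, b: np.ndarray, w: np.ndarray) -> float:
    """Three-operand MAC: accumulate elementwise products of two runtime
    inputs (a, b) with a stored operand (w) in a single reduction.

    In the DG-FeFET framing of the summary, w plays the role of the
    non-volatile stored state and a, b the two gate inputs, so products
    of two dynamic operands require no NVM reprogramming.
    """
    return float(np.einsum("k,k,k->", a, b, w))

# Toy attention-score example (names and sizes are hypothetical):
# with the stored operand held at 1, the trilinear MAC reduces to the
# dynamic-by-dynamic dot product Q[i] . K[j] that attention needs.
d = 8
rng = np.random.default_rng(0)
Q, K = rng.standard_normal((2, d))
ones = np.ones(d)                  # stored state, never rewritten
score = trilinear_mac(Q, K, ones)  # equals Q @ K
assert np.isclose(score, Q @ K)
```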