An 88.36TOPS/W Bit-Level-Weight-Compressed Large-Language-Model Accelerator with Cluster-Aligned INT-FP-GEMM and Bi-Dimensional Workﬂow Reformulation

Seunghyun Moon, Mao Li, Gregory K · 2025 · arXiv 9661.2025

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

representative citing papers

Evolving Layer-Specific Scalar Functions for Hardware-Aware Transformer Adaptation

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

Genetic programming evolves heterogeneous layer-specific scalar functions to approximate layer normalization in pre-trained ViTs, capturing 91.6% variance versus 70.2% for uniform baselines and recovering 84.25% ImageNet Top-1 accuracy after 20 epochs of adaptation.

31.1 A 14.08-to-135.69Token/s ReRAM-on-Logic Stacked Outlier-Free Large-Language-Model Accelerator with Block-Clustered Weight-Compression and Adaptive Parallel-Speculative-Decoding

cs.AR · 2026-05-10 · unverdicted · novelty 5.0 · 3 refs

A ReRAM-on-logic stacked chip delivers 14.08-135.69 tokens/s LLM inference with block-clustered compression and adaptive parallel speculative decoding, yielding 4.46-7.17x speedup over standard methods.

citing papers explorer

Showing 2 of 2 citing papers.

Evolving Layer-Specific Scalar Functions for Hardware-Aware Transformer Adaptation cs.CV · 2026-05-13 · unverdicted · none · ref 34
Genetic programming evolves heterogeneous layer-specific scalar functions to approximate layer normalization in pre-trained ViTs, capturing 91.6% variance versus 70.2% for uniform baselines and recovering 84.25% ImageNet Top-1 accuracy after 20 epochs of adaptation.
31.1 A 14.08-to-135.69Token/s ReRAM-on-Logic Stacked Outlier-Free Large-Language-Model Accelerator with Block-Clustered Weight-Compression and Adaptive Parallel-Speculative-Decoding cs.AR · 2026-05-10 · unverdicted · none · ref 4 · 3 links
A ReRAM-on-logic stacked chip delivers 14.08-135.69 tokens/s LLM inference with block-clustered compression and adaptive parallel speculative decoding, yielding 4.46-7.17x speedup over standard methods.

An 88.36TOPS/W Bit-Level-Weight-Compressed Large-Language-Model Accelerator with Cluster-Aligned INT-FP-GEMM and Bi-Dimensional Workﬂow Reformulation

fields

years

verdicts

representative citing papers

citing papers explorer