AAAC uses two adaptive 64-byte codebooks per layer for 4-bit LLM weight quantization, choosing the optimal one per group to minimize activation-weighted error with zero storage overhead and fast runtime.
5 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (5)
citing papers explorer
- AAAC: Activation-Aware Adaptive Codebooks for 4-bit LLM Weight Quantization
AAAC uses two adaptive 64-byte codebooks per layer for 4-bit LLM weight quantization, choosing the optimal one per group to minimize activation-weighted error with zero storage overhead and fast runtime.
- BWLA: Breaking the Barrier of W1AX Post-Training Quantization for LLMs
BWLA is the first post-training quantization method for LLMs that achieves 1-bit weights paired with low-bit activations such as 6 bits, using OKT to reshape weights and suppress activation tails plus PSP for low-rank refinement.
- XPERT: Expert Knowledge Transfer for Effective Training of Language Models
XPERT extracts and reuses cross-domain expert knowledge from pre-trained MoE LLMs via inference analysis and tensor decomposition to improve performance and convergence in downstream language model training.
- ExecuTorch -- A Unified PyTorch Solution to Run AI Models On-Device
ExecuTorch is a unified PyTorch-native deployment framework that enables seamless on-device execution of AI models across heterogeneous hardware while preserving original PyTorch semantics.
- Hy-MT2: A Family of Fast, Efficient and Powerful Multilingual Translation Models in the Wild