Vec-LUT delivers up to 4.2x speedup over prior LUT methods for parallel ultra-low-bit LLM inference on edge devices by unifying lookups across tokens and adding cache-aware tensor layouts.
bitnet.cpp: BitNet for Everyone
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
representative citing papers
BitRL enables on-device RL agents via 1-bit quantized language models, delivering 10-16x memory reduction and 3-5x energy efficiency gains with 85-98% retained performance.
citing papers explorer
-
Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices
Vec-LUT delivers up to 4.2x speedup over prior LUT methods for parallel ultra-low-bit LLM inference on edge devices by unifying lookups across tokens and adding cache-aware tensor layouts.
-
BitRL: Reinforcement Learning with 1-bit Quantized Language Models for Resource-Constrained Edge Deployment
BitRL enables on-device RL agents via 1-bit quantized language models, delivering 10-16x memory reduction and 3-5x energy efficiency gains with 85-98% retained performance.