A lightweight shared-memory technique for NF4 dequantization kernels yields 2.0-2.2x kernel speedup and 1.54x end-to-end gains on models up to 70B parameters while using only 64 bytes of shared memory per block.
Training language models to follow instructions with human feedback
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 2
citation-polarity summary
years
2026 2verdicts
UNVERDICTED 2roles
background 2polarities
background 2representative citing papers
The paper proposes a bidirectional continuum between LLMs and control systems, covering LLM-assisted controller design, control-based LLM steering, and state-space modeling of LLMs.
citing papers explorer
-
Fast NF4 Dequantization Kernels for Large Language Model Inference
A lightweight shared-memory technique for NF4 dequantization kernels yields 2.0-2.2x kernel speedup and 1.54x end-to-end gains on models up to 70B parameters while using only 64 bytes of shared memory per block.
-
When control meets large language models: From words to dynamics
The paper proposes a bidirectional continuum between LLMs and control systems, covering LLM-assisted controller design, control-based LLM steering, and state-space modeling of LLMs.