Gholami, Amir, Kim, Sehoon, Dong, Zhen, Yao, Zhewei, Mahoney, Michael W, and Keutzer, Kurt · 2021 · arXiv 2103.13630

24 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3 baseline 1

citation-polarity summary

background 3 baseline 1

representative citing papers

The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm

cs.LG · 2025-07-24 · unverdicted · novelty 8.0

GPTQ is equivalent to Babai's nearest plane algorithm for CVP on the Hessian lattice of layer inputs, yielding geometric interpretation, inherited error bounds, and improved clipping-free quantization with GPU kernels.

The Complexity of Verifying Feedforward Neural Networks in Quantised Settings

cs.CC · 2026-05-28 · unverdicted · novelty 7.0

Verification of fixed-precision quantized FNNs is NP-complete under both LP and BV specifications, matching the rational case, while dynamic quantization with BV specs has established upper bounds complementing known PSPACE-hardness.

When Bits Break Recourse: Counterfactual-Faithful Quantization

cs.LG · 2026-05-16 · unverdicted · novelty 7.0

CFQ trains quantizer parameters and mixed-precision allocation to preserve counterfactual recourse validity, cost, and direction on Adult, German Credit, and COMPAS while matching accuracy of standard quantizers.

Characterizing Learning in Deep Neural Networks using Tractable Algorithmic Complexity Analysis

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

QuBD extends algorithmic complexity estimation to quantized DNN weights, revealing that complexity decreases during learning, increases with overfitting, follows grokking patterns, and correlates with generalization.

DPQuant: Efficient and Differentially-Private Model Training via Dynamic Quantization Scheduling

cs.LG · 2025-09-03 · unverdicted · novelty 7.0

DPQuant uses epoch-wise probabilistic layer rotation and DP loss sensitivity to quantize only a changing subset of layers, reducing accuracy degradation from quantization noise in DP-SGD and delivering up to 2.21x throughput gains with under 2% accuracy drop.

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

cs.LG · 2022-10-31 · unverdicted · novelty 7.0

GPTQ quantizes 175B-parameter GPT models to 3-4 bits per weight in one shot using approximate second-order information, achieving negligible accuracy degradation and 3-4x inference speedups.

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

cs.LG · 2022-08-15 · conditional · novelty 7.0

LLM.int8() performs 8-bit inference for transformers up to 175B parameters with no accuracy loss by combining vector-wise quantization for most features with 16-bit mixed-precision handling of systematic outlier dimensions.

Quantizing Time-Series Models As Dynamical Systems: Trajectory-Based Quantization Sensitivity Score

cs.LG · 2026-06-11 · unverdicted · novelty 6.0

Introduces TQS metric and TQS-PTQ framework that uses dynamical-systems stability to enable a priori, calibration-free mixed-precision post-training quantization for time-series models.

Minimum Distortion Quantization with Specified Output Distribution

cs.IT · 2026-06-09 · unverdicted · novelty 6.0

Derives optimal quantizer form X=σ(F^{-1}(F_W(W))) with permutation σ minimizing MMSE under specified output distribution P_X, using majorization.

Machine learning enables experimental access to photon-by-photon arrival times in scintillation detectors

physics.ins-det · 2026-05-27 · unverdicted · novelty 6.0

Deep learning extracts photon-by-photon arrival times from scintillation detector waveforms using unsupervised training with a physically informed model, enabling improved timing resolution and photon classification in experiments.

Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation

cs.CR · 2026-05-18 · unverdicted · novelty 6.0

P2F generates low-rank parameter increments for LLM fingerprinting directly from textual descriptions in a single forward pass.

Rethink the Role of Neural Decoders in Quantum Error Correction

quant-ph · 2026-05-12 · unverdicted · novelty 6.0

Neural decoders for surface-code QEC achieve practical microsecond FPGA latency when trained on large datasets with appropriate inductive biases and INT4 quantization, rather than relying on architectural complexity.

LoKA: Low-precision Kernel Applications for Recommendation Models At Scale

cs.LG · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

LoKA enables practical FP8 use in numerically sensitive large recommendation models via online profiling of activations, reusable model modifications for stability, and dynamic kernel dispatching.

Litespark Inference For CPUs: Ultra-Fast SIMD Framework for Ternary (1.58-bit) Language Models

cs.CL · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

Litespark-Inference delivers custom SIMD kernels for ternary LLMs achieving up to 95.81x throughput versus PyTorch on CPUs by using integer addition/subtraction instead of floating-point math.

Initialisation Determines the Basin: Efficient Codebook Optimisation for Extreme LLM Quantization

cs.CL · 2026-04-09 · unverdicted · novelty 6.0

Output-aware EM initialization for codebooks in additive quantization avoids poor optimization basins and yields better 2-bit compressed LLMs across Llama and Qwen models.

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

cs.CL · 2023-06-01 · conditional · novelty 6.0

AWQ quantizes LLM weights to low bits by scaling salient channels based on activation statistics, outperforming prior methods on language, coding, math, and multi-modal benchmarks.

PALUTE: Processing-In-Memory Acceleration via Lookup Table for Edge LLM Inference

cs.AR · 2026-06-08 · unverdicted · novelty 5.0

PALUTE is a new PIM accelerator using in-DRAM LUTs on M3D DRAM that reports 1264 TPS at 0.16 W with 12.8x energy efficiency gains over CHIME for quantized edge LLM inference.

The Thermodynamic Costs of Simple Linear Regression

cond-mat.stat-mech · 2026-05-18 · unverdicted · novelty 5.0

Thermodynamic lower bounds are approximated for exact and SGD linear regression, producing energy-aware scaling laws for optimal training dataset size given a target generalization error.

Sustainability Is Not Linear: Quantifying Performance, Energy, and Privacy Trade-offs in On-Device Intelligence

cs.SE · 2026-03-27 · unverdicted · novelty 5.0

Empirical case study on a flagship Android device profiles energy, latency, and quality trade-offs across eight LLMs, revealing a quantization energy paradox and identifying mid-sized models as practical sweet spots.

$\text{Log}_\text{b}$Quant: Quantizing Language Models in Logarithmic Space

cs.CL · 2026-07-01 · unverdicted · novelty 4.0

Log_b Quant is an adjustable-base logarithmic quantization technique that outperforms tensor-wise asymmetric linear quantization at 4-bit precision on language model benchmarks while providing memory savings.

Memristor Technologies for Dynamic Vision Sensors: A Critical Assessment and Research Roadmap

cs.AR · 2026-05-13 · accept · novelty 4.0

A structured review concludes that end-to-end DVS-memristor integration for analog in-memory event-driven computing remains an open challenge at TRL 2-5, with half of surveyed applications resting on projections rather than demonstrations.

Harnessing Photonics for Machine Intelligence

physics.optics · 2026-04-12 · unverdicted · novelty 4.0

Photonic computing can reshape AI acceleration through optical bandwidth and parallelism, but requires cross-layer co-design and electronic-photonic design automation to move from prototypes to scalable systems.

K-Quantization and its Impact on Output Performance

cs.CL · 2026-05-19 · unverdicted · novelty 3.0

Empirical evaluation of quantization effects on eight LLMs across bit widths, showing performance generally declines at lower precision but with model-size-dependent resilience and acceptable accuracy at 2 bits for many cases.

You Had One Job: Per-Task Quantization Using LLMs' Hidden Representations

cs.CL · 2025-11-09

citing papers explorer

The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm · cs.LG · 2025-07-24 · unverdicted · none · ref 6
The Complexity of Verifying Feedforward Neural Networks in Quantised Settings · cs.CC · 2026-05-28 · unverdicted · none · ref 5
When Bits Break Recourse: Counterfactual-Faithful Quantization · cs.LG · 2026-05-16 · unverdicted · none · ref 11
Characterizing Learning in Deep Neural Networks using Tractable Algorithmic Complexity Analysis · cs.LG · 2026-05-15 · unverdicted · none · ref 297
DPQuant: Efficient and Differentially-Private Model Training via Dynamic Quantization Scheduling · cs.LG · 2025-09-03 · unverdicted · none · ref 18
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers · cs.LG · 2022-10-31 · unverdicted · none · ref 6
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale · cs.LG · 2022-08-15 · conditional · none · ref 135
Quantizing Time-Series Models As Dynamical Systems: Trajectory-Based Quantization Sensitivity Score · cs.LG · 2026-06-11 · unverdicted · none · ref 34
Minimum Distortion Quantization with Specified Output Distribution · cs.IT · 2026-06-09 · unverdicted · none · ref 16
Machine learning enables experimental access to photon-by-photon arrival times in scintillation detectors · physics.ins-det · 2026-05-27 · unverdicted · none · ref 53
Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation · cs.CR · 2026-05-18 · unverdicted · none · ref 3
Rethink the Role of Neural Decoders in Quantum Error Correction · quant-ph · 2026-05-12 · unverdicted · none · ref 12
LoKA: Low-precision Kernel Applications for Recommendation Models At Scale · cs.LG · 2026-05-11 · unverdicted · none · ref 34 · 2 links
Litespark Inference For CPUs: Ultra-Fast SIMD Framework for Ternary (1.58-bit) Language Models · cs.CL · 2026-05-07 · unverdicted · none · ref 15 · 2 links
Initialisation Determines the Basin: Efficient Codebook Optimisation for Extreme LLM Quantization · cs.CL · 2026-04-09 · unverdicted · none · ref 12
AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration · cs.CL · 2023-06-01 · conditional · none · ref 16
PALUTE: Processing-In-Memory Acceleration via Lookup Table for Edge LLM Inference · cs.AR · 2026-06-08 · unverdicted · none · ref 10
The Thermodynamic Costs of Simple Linear Regression · cond-mat.stat-mech · 2026-05-18 · unverdicted · none · ref 77
Sustainability Is Not Linear: Quantifying Performance, Energy, and Privacy Trade-offs in On-Device Intelligence · cs.SE · 2026-03-27 · unverdicted · none · ref 59
$\text{Log}_\text{b}$Quant: Quantizing Language Models in Logarithmic Space · cs.CL · 2026-07-01 · unverdicted · none · ref 15
Memristor Technologies for Dynamic Vision Sensors: A Critical Assessment and Research Roadmap · cs.AR · 2026-05-13 · accept · none · ref 106
Harnessing Photonics for Machine Intelligence · physics.optics · 2026-04-12 · unverdicted · none · ref 24
K-Quantization and its Impact on Output Performance · cs.CL · 2026-05-19 · unverdicted · none · ref 53
You Had One Job: Per-Task Quantization Using LLMs' Hidden Representations · cs.CL · 2025-11-09 · unreviewed · ref 20