W., and Keutzer, K

· 2021 · arXiv 2103.13630

23 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3 · baseline 1

citation-polarity summary

background 3 · baseline 1

representative citing papers

The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm

cs.LG · 2025-07-24 · unverdicted · novelty 8.0

GPTQ is equivalent to Babai's nearest plane algorithm for CVP on the Hessian lattice of layer inputs, yielding geometric interpretation, inherited error bounds, and improved clipping-free quantization with GPU kernels.

The Complexity of Verifying Feedforward Neural Networks in Quantised Settings

cs.CC · 2026-05-28 · unverdicted · novelty 7.0

Verification of fixed-precision quantized FNNs is NP-complete under both LP and BV specifications, matching the rational case, while dynamic quantization with BV specs has established upper bounds complementing known PSPACE-hardness.

When Bits Break Recourse: Counterfactual-Faithful Quantization

cs.LG · 2026-05-16 · unverdicted · novelty 7.0

CFQ trains quantizer parameters and mixed-precision allocation to preserve counterfactual recourse validity, cost, and direction on Adult, German Credit, and COMPAS while matching accuracy of standard quantizers.

Characterizing Learning in Deep Neural Networks using Tractable Algorithmic Complexity Analysis

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

QuBD extends algorithmic complexity estimation to quantized DNN weights, revealing that complexity decreases during learning, increases with overfitting, follows grokking patterns, and correlates with generalization.

DPQuant: Efficient and Differentially-Private Model Training via Dynamic Quantization Scheduling

cs.LG · 2025-09-03 · unverdicted · novelty 7.0

DPQuant uses epoch-wise probabilistic layer rotation and DP loss sensitivity to quantize only a changing subset of layers, reducing accuracy degradation from quantization noise in DP-SGD and delivering up to 2.21x throughput gains with under 2% accuracy drop.

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

cs.LG · 2022-10-31 · unverdicted · novelty 7.0

GPTQ quantizes 175B-parameter GPT models to 3-4 bits per weight in one shot using approximate second-order information, achieving negligible accuracy degradation and 3-4x inference speedups.

LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale

cs.LG · 2022-08-15 · conditional · novelty 7.0

LLM.int8() performs 8-bit inference for transformers up to 175B parameters with no accuracy loss by combining vector-wise quantization for most features with 16-bit mixed-precision handling of systematic outlier dimensions.

Minimum Distortion Quantization with Specified Output Distribution

cs.IT · 2026-06-09 · unverdicted · novelty 6.0

Derives optimal quantizer form X=σ(F^{-1}(F_W(W))) with permutation σ minimizing MMSE under specified output distribution P_X, using majorization.

Machine learning enables experimental access to photon-by-photon arrival times in scintillation detectors

physics.ins-det · 2026-05-27 · unverdicted · novelty 6.0

Deep learning extracts photon-by-photon arrival times from scintillation detector waveforms using unsupervised training with a physically informed model, enabling improved timing resolution and photon classification in experiments.

Prompt2Fingerprint: Plug-and-Play LLM Fingerprinting via Text-to-Weight Generation

cs.CR · 2026-05-18 · unverdicted · novelty 6.0

P2F generates low-rank parameter increments for LLM fingerprinting directly from textual descriptions in a single forward pass.

Rethink the Role of Neural Decoders in Quantum Error Correction

quant-ph · 2026-05-12 · unverdicted · novelty 6.0

Neural decoders for surface-code QEC achieve practical microsecond FPGA latency when trained on large datasets with appropriate inductive biases and INT4 quantization, rather than relying on architectural complexity.

LoKA: Low-precision Kernel Applications for Recommendation Models At Scale

cs.LG · 2026-05-11 · unverdicted · novelty 6.0 · 2 refs

LoKA enables practical FP8 use in numerically sensitive large recommendation models via online profiling of activations, reusable model modifications for stability, and dynamic kernel dispatching.

Litespark Inference For CPUs: Ultra-Fast SIMD Framework for Ternary (1.58-bit) Language Models

cs.CL · 2026-05-07 · unverdicted · novelty 6.0 · 2 refs

Litespark-Inference delivers custom SIMD kernels for ternary LLMs achieving up to 95.81x throughput versus PyTorch on CPUs by using integer addition/subtraction instead of floating-point math.

Initialisation Determines the Basin: Efficient Codebook Optimisation for Extreme LLM Quantization

cs.CL · 2026-04-09 · unverdicted · novelty 6.0

Output-aware EM initialization for codebooks in additive quantization avoids poor optimization basins and yields better 2-bit compressed LLMs across Llama and Qwen models.

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

cs.CL · 2023-06-01 · conditional · novelty 6.0

AWQ quantizes LLM weights to low bits by scaling salient channels based on activation statistics, outperforming prior methods on language, coding, math, and multi-modal benchmarks.

PALUTE: Processing-In-Memory Acceleration via Lookup Table for Edge LLM Inference

cs.AR · 2026-06-08 · unverdicted · novelty 5.0

PALUTE is a new PIM accelerator using in-DRAM LUTs on M3D DRAM that reports 1264 TPS at 0.16 W with 12.8x energy efficiency gains over CHIME for quantized edge LLM inference.

The Thermodynamic Costs of Simple Linear Regression

cond-mat.stat-mech · 2026-05-18 · unverdicted · novelty 5.0

Thermodynamic lower bounds are approximated for exact and SGD linear regression, producing energy-aware scaling laws for optimal training dataset size given a target generalization error.

Sustainability Is Not Linear: Quantifying Performance, Energy, and Privacy Trade-offs in On-Device Intelligence

cs.SE · 2026-03-27 · unverdicted · novelty 5.0

Empirical case study on a flagship Android device profiles energy, latency, and quality trade-offs across eight LLMs, revealing a quantization energy paradox and identifying mid-sized models as practical sweet spots.

$\text{Log}_\text{b}$Quant: Quantizing Language Models in Logarithmic Space

cs.CL · 2026-07-01 · unverdicted · novelty 4.0

Log_b Quant is an adjustable-base logarithmic quantization technique that outperforms tensor-wise asymmetric linear quantization at 4-bit precision on language model benchmarks while providing memory savings.

Memristor Technologies for Dynamic Vision Sensors: A Critical Assessment and Research Roadmap

cs.AR · 2026-05-13 · accept · novelty 4.0

A structured review concludes that end-to-end DVS-memristor integration for analog in-memory event-driven computing remains an open challenge at TRL 2-5, with half of surveyed applications resting on projections rather than demonstrations.

Harnessing Photonics for Machine Intelligence

physics.optics · 2026-04-12 · unverdicted · novelty 4.0

Photonic computing can reshape AI acceleration through optical bandwidth and parallelism, but requires cross-layer co-design and electronic-photonic design automation to move from prototypes to scalable systems.

K-Quantization and its Impact on Output Performance

cs.CL · 2026-05-19 · unverdicted · novelty 3.0

Empirical evaluation of quantization effects on eight LLMs across bit widths, showing performance generally declines at lower precision but with model-size-dependent resilience and acceptable accuracy at 2 bits for many cases.

You Had One Job: Per-Task Quantization Using LLMs' Hidden Representations

cs.CL · 2025-11-09
