Title resolution pending

Vage Egiazarian, Roberto L · 2026 · arXiv 2509.23202

12 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background: 1 · other: 1

citation-polarity summary

background: 1 · unclear: 1

representative citing papers

Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling

cs.CL · 2025-12-01 · conditional · novelty 7.0

Four Over Six adaptively scales blocks in NVFP4 quantization to smaller FP4 values, making representable value distributions more uniform and reducing quantization error, especially for near-maximal values.
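
The selection idea can be illustrated with a minimal Python sketch. It assumes, from the one-line summary and the "4-over-6 selection" wording in a later entry, that each 16-element NVFP4 block is quantized twice, once with its absolute maximum mapped to FP4's largest value 6 and once mapped to 4, and that the candidate with lower mean squared error is kept. This is not the paper's code: the helper names are hypothetical, ties round toward the smaller magnitude rather than to even, and the block scale stays an exact float instead of being rounded to FP8 E4M3 alongside a per-tensor scale as real NVFP4 does.

```python
# Magnitudes representable by an FP4 E2M1 element.
FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def round_to_fp4(v):
    """Round v to the nearest signed E2M1 value (ties go to the smaller magnitude)."""
    mag = min(FP4_GRID, key=lambda g: abs(abs(v) - g))
    return mag if v >= 0 else -mag


def quantize_block(block, target_max):
    """Map the block's absmax to target_max, round each element to FP4, dequantize."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block)
    scale = amax / target_max  # exact scale here; NVFP4 stores it as FP8 E4M3
    return [round_to_fp4(x / scale) * scale for x in block]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def quantize_four_over_six(block):
    """Hypothetical per-block choice between max-6 and max-4 scaling by MSE."""
    candidates = [quantize_block(block, 6.0), quantize_block(block, 4.0)]
    return min(candidates, key=lambda q: mse(block, q))


block = [4.5, 6.0]
print(quantize_block(block, 6.0))     # [4.0, 6.0]
print(quantize_four_over_six(block))  # [4.5, 6.0]
```

With max-6 scaling the near-maximal value 4.5 falls in the wide gap between the grid points 4 and 6; with max-4 scaling the scale becomes 1.5 and both values land exactly on the grid.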

Decomposing MXFP4 quantization error for LLM reinforcement learning: reducible bias, recoverable deadzone, and an irreducible floor

cs.LG · 2026-05-19 · unverdicted · novelty 6.0 · 3 refs

MXFP4 quantization error decomposes into scale bias, deadzone truncation, and grid noise; mode-targeted corrections recover BF16 accuracy within 0.7% on Qwen2.5-3B and exceed it by 1.0% on Qwen3-30B-A3B.
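
For context, MXFP4 quantization itself can be sketched in a few lines, assuming the OCP Microscaling (MX) recipe: each block (32 elements in the spec) shares a power-of-two scale 2^(floor(log2(amax)) - 2), where 2 is the exponent of E2M1's largest value 6, and elements are saturated to ±6 before rounding. The sketch only exposes two effects one might associate with the error modes above, saturation caused by the floor-based shared exponent and small nonzero values rounding to zero; it is not the paper's decomposition, the function names are hypothetical, and the E8M0 scale's exponent range limits are ignored.

```python
import math

FP4_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
FP4_MAX = 6.0
E2M1_EMAX = 2  # 6 = 1.5 * 2**2, so the largest E2M1 exponent is 2


def round_to_fp4(v):
    """Round v to the nearest signed E2M1 value (ties go to the smaller magnitude)."""
    mag = min(FP4_GRID, key=lambda g: abs(abs(v) - g))
    return mag if v >= 0 else -mag


def mxfp4_quantize(block):
    """Quantize-dequantize one block with a shared power-of-two scale."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return [0.0] * len(block), 1.0
    scale = 2.0 ** (math.floor(math.log2(amax)) - E2M1_EMAX)
    out = []
    for x in block:
        v = max(-FP4_MAX, min(FP4_MAX, x / scale))  # saturate to the FP4 range
        out.append(round_to_fp4(v) * scale)
    return out, scale


def count_effects(block):
    """Count saturated elements and nonzero elements that round to zero."""
    _, scale = mxfp4_quantize(block)
    clipped = sum(1 for x in block if abs(x) / scale > FP4_MAX)
    zeroed = sum(1 for x in block if x != 0.0 and round_to_fp4(x / scale) == 0.0)
    return clipped, zeroed


block = [7.0, 0.1, 1.0]
print(mxfp4_quantize(block))  # ([6.0, 0.0, 1.0], 1.0)
print(count_effects(block))   # (1, 1)
```

Here 7.0 gets a shared scale of 1.0 because floor(log2(7)) = 2, so it saturates to 6.0, while 0.1 sits below half of the smallest nonzero step 0.5 and is truncated to zero.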

Pretraining large language models with MXFP4 on Native FP4 Hardware

cs.LG · 2026-05-11 · unverdicted · novelty 6.0 · 3 refs

Weight-gradient FP4 quantization drives LLM pretraining divergence, which deterministic Hadamard rotations can stabilize on native MXFP4 hardware.
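
Hadamard rotations appear in this entry and in the AdaHOP and DuQuant++ entries. A minimal sketch of the underlying operation, assuming the common orthonormal (Sylvester) Walsh-Hadamard construction rather than any paper's specific variant: the fast transform spreads a single outlier evenly across the vector, and because the normalized matrix is orthogonal and symmetric, rotating both operands of a dot product leaves the result unchanged, which is what lets the rotation be folded around quantized compute. The function names are illustrative.

```python
import math


def fwht(vec):
    """Orthonormal fast Walsh-Hadamard transform; len(vec) must be a power of two."""
    n = len(vec)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [float(x) for x in vec]
    h = 1
    while h < n:
        # Butterfly stage: combine pairs of entries that are h apart.
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    norm = 1.0 / math.sqrt(n)  # makes the transform orthonormal (and self-inverse)
    return [v * norm for v in a]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


x = [8.0, 0.0, 0.0, 0.0]  # one large outlier
w = [0.5, -1.0, 2.0, 0.25]
print(fwht(x))  # [4.0, 4.0, 4.0, 4.0]
# The rotated dot product matches the original up to floating-point rounding.
print(dot(fwht(x), fwht(w)), dot(x, w))
```

The absolute maximum drops from 8 to 4 after rotation, so a shared per-block scale wastes less of the FP4 range on one element.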

Robust Ultra Low-Bit Post-Training Quantization via Stable Diagonal Curvature Estimate

cs.LG · 2026-04-15 · unverdicted · novelty 6.0

DASH-Q uses a stable diagonal curvature estimate and weighted least squares to achieve robust ultra-low-bit post-training quantization of LLMs, improving zero-shot accuracy by 7% on average over baselines.
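
Weighted least squares for a quantization scale has a closed form that is easy to sketch. Assuming each weight w_i carries a nonnegative curvature weight d_i (for example a diagonal Hessian estimate; how DASH-Q stabilizes that estimate is not described here) and the integer codes q_i are held fixed, the scale minimizing sum_i d_i (w_i - s q_i)^2 is s = sum_i d_i w_i q_i / sum_i d_i q_i^2. The generic sketch below alternates that refit with nearest rounding; it is not DASH-Q's algorithm, and the function name and parameters are hypothetical.

```python
def fit_scale_wls(w, d, qmax=1, iters=10):
    """Alternate nearest rounding and a weighted least-squares scale refit.

    w: weights; d: nonnegative per-weight curvature weights (assumed given);
    codes are integers clamped to [-qmax, qmax].
    """
    s = max(abs(x) for x in w) / qmax  # absmax initialization
    if s == 0.0:
        return 0.0, [0] * len(w)
    q = []
    for _ in range(iters):
        # Rounding step: nearest clamped integer code for the current scale.
        q = [max(-qmax, min(qmax, round(x / s))) for x in w]
        # Refit step: closed-form minimizer of sum d_i (w_i - s q_i)^2 for fixed q.
        num = sum(di * wi * qi for di, wi, qi in zip(d, w, q))
        den = sum(di * qi * qi for di, qi in zip(d, q))
        if den == 0:
            break
        s = num / den
    return s, q


w = [1.0, 0.4, -0.9]
print(fit_scale_wls(w, [1.0, 1.0, 1.0]))   # scale about 0.95, codes [1, 0, -1]
print(fit_scale_wls(w, [1.0, 1.0, 10.0]))  # scale about 10/11, pulled toward -0.9
```

Each step can only lower the weighted error (rounding is optimal per element for a fixed positive scale, and the refit is optimal for fixed codes), so raising d_i on a weight pulls the scale toward representing that weight accurately.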

AdaHOP: Fast and Accurate Low-Precision Training via Outlier-Pattern-Aware Rotation

cs.LG · 2026-04-02 · unverdicted · novelty 6.0

AdaHOP applies pattern-aware Hadamard transforms and selective outlier extraction to enable from-scratch MXFP4 training of LLMs at BF16 quality with up to 3.6X memory compression and 1.46X speedup.

MixFP4: Enhancing NVFP4 with Adaptive FP4/INT4 Block Representations

cs.AR · 2026-05-29 · unverdicted · novelty 5.0

MixFP4 extends NVFP4 by adaptively selecting between two FP4 micro-formats per block using repurposed scale sign bits and a unified E2M2 compute path, claiming better accuracy than standard NVFP4 at 3.1% area and 1.5% power overhead.

Not All NVFP4 QAT Recipes Are Equal: How Architecture and Scale Shape Model Quality for Anomaly Segmentation

cs.CV · 2026-05-26 · unverdicted · novelty 5.0

Attention-based architectures such as the Swin Transformer are more robust than CNNs to the choice of FP4 QAT recipe across model scales in anomaly segmentation, with architecture having the largest impact on model quality.

Timestep-Aware SVDQuant-GPTQ for W4A4 Quantization of Wan2.2-I2V

cs.CV · 2026-05-26 · unverdicted · novelty 5.0

A timestep- and expert-aware W4A4 quantization framework for the Wan2.2-I2V MoE DiT, built on SVDQuant-GPTQ, achieves a 59.3% peak GPU memory reduction with 0.9% VBench and 2.3% Imaging Quality drops.

Mix-Quant: Quantized Prefilling, Precise Decoding for Agentic LLMs

cs.CL · 2026-05-19 · unverdicted · novelty 5.0

Mix-Quant quantizes prefilling to NVFP4 and keeps BF16 for decoding in agentic LLMs, achieving up to 3x prefilling speedup while largely preserving task performance on long-context and agentic benchmarks.

Finer is Better (with the Right Scaling)

cs.LG · 2026-05-08 · unverdicted · novelty 5.0

The block-size paradox in LLM microscaling is caused by underflow of subnormal E4M3 scaling factors; preventing that underflow and using 4-over-6 selection resolves it, and a brute-force check confirms that MSE strictly improves as blocks get finer.
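
The underflow mechanism can be illustrated by rounding candidate block scales to FP8 E4M3, assuming the OCP E4M3 encoding (exponent bias 7, three mantissa bits, smallest normal 2^-6, subnormal step 2^-9, largest finite value 448). The sketch saturates overflow instead of producing NaN, ignores NVFP4's per-tensor FP32 scale (which in practice shifts block scales toward the normal range), and uses a hypothetical helper name. Small scales lose relative precision in the subnormal range, and the smallest ones flush to zero, which would zero out an entire block.

```python
import math

E4M3_MAX = 448.0
E4M3_MIN_NORMAL = 2.0 ** -6
E4M3_SUBNORMAL_STEP = 2.0 ** -9


def round_to_e4m3(v):
    """Round a nonnegative scale to the nearest E4M3 value (round half to even)."""
    if v < E4M3_MIN_NORMAL:
        step = E4M3_SUBNORMAL_STEP  # fixed spacing in the subnormal range
    else:
        step = 2.0 ** (math.floor(math.log2(v)) - 3)  # 3 mantissa bits per binade
    # Python's round() rounds half to even; saturate instead of overflowing to NaN.
    return min(E4M3_MAX, round(v / step) * step)


for amax in (6.0, 0.018, 0.0042):
    scale = amax / 6.0  # NVFP4-style scale mapping the block absmax to FP4's 6
    print(amax, scale, round_to_e4m3(scale))
```

A block with absmax 0.018 gets its ideal scale 0.003 rounded to 0.00390625, about 30% too large, and a block with absmax 0.0042 gets a scale of 0.0; smaller blocks have smaller local maxima, so finer blocking makes these cases more frequent.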

TACO: Efficient Communication Compression of Intermediate Tensors for Scalable Tensor-Parallel LLM Training

cs.DC · 2026-04-27 · unverdicted · novelty 5.0

TACO compresses tensor-parallel intermediate tensors with an adaptive FP8 scheme and fused kernels, yielding up to 1.87X throughput gains on GPT and Qwen models with near-lossless accuracy.

DuQuant++: Fine-grained Rotation Enhances Microscaling FP4 Quantization

cs.CV · 2026-04-20 · unverdicted · novelty 4.0

DuQuant++ adapts outlier-aware fine-grained rotation to MXFP4 by matching the rotation block size to the 32-element microscaling group, enabling a single rotation that smooths distributions and achieves SOTA performance on LLaMA-3 at lower cost.

citing papers explorer

Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling cs.CL · 2025-12-01 · conditional · none · ref 26
