Pith Number

pith:KCRU7TYZ

pith:2026:KCRU7TYZSHBOFDOHIYB74RMM5E

not attested · not anchored · not stored · refs resolved

Multi-Scale Dequant: Eliminating Dequantization Bottleneck via Activation Decomposition for Efficient LLM Inference

Chengqiu Hu, Fangzheng Miao, Jun Li, Junyi Fan, Lingchao Zheng, Qichen Liao, Rui Shi, Yuwei Fan

Decomposing BF16 activations into low-precision scales lets quantized weights multiply directly via native GEMM.

arxiv:2605.13915 v1 · 2026-05-13 · stat.ML · cs.AI · cs.LG

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

For INT8 weights (W8A16), two-pass INT8 decomposition achieves nearly 16 effective bits. For MXFP4 weights (W4A16), two-pass MXFP4 decomposition yields about 6.6 effective bits with a per-block error bound of 1/64, surpassing single-pass MXFP8 while keeping the same effective GEMM compute time.

C2 · weakest assumption

That the multi-scale activation decomposition can be implemented with native hardware-accelerated GEMM without introducing pipeline stalls or accuracy loss beyond the derived bounds, and that the closed-form latency models accurately predict real hardware behavior.

C3 · one-line summary

MSD eliminates dequantization from the GEMM path by decomposing BF16 activations into multiple low-precision parts that multiply directly with INT8 or MXFP4 weights, achieving near-16 effective bits for INT8 and 6.6 for MXFP4 with reduced HBM traffic.
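
As a rough illustration of the idea in C1 and C3, the sketch below splits a floating-point activation vector into two INT8 parts (a coarse pass and a residual pass) and multiplies each part directly with INT8 weights using integer arithmetic only. It is a minimal pure-Python sketch under stated assumptions, not the paper's kernel: symmetric per-vector scaling, a single weight scale `w_scale`, and the helper names `quantize_int8`, `two_pass_int8`, and `int_dot` are all hypothetical.

```python
import random


def quantize_int8(values, scale):
    """Round values/scale to the nearest integer and clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def two_pass_int8(values):
    """Return (q1, s1, q2, s2) such that values ~= s1*q1 + s2*q2 elementwise."""
    amax = max(abs(v) for v in values) or 1.0
    s1 = amax / 127.0
    q1 = quantize_int8(values, s1)
    # The first pass leaves a residual of at most half a coarse step (s1 / 2).
    residual = [v - s1 * q for v, q in zip(values, q1)]
    rmax = max(abs(r) for r in residual) or 1.0
    s2 = rmax / 127.0
    q2 = quantize_int8(residual, s2)
    return q1, s1, q2, s2


def int_dot(a, b):
    """Integer-only dot product, standing in for a native INT8 GEMM."""
    return sum(x * y for x, y in zip(a, b))


random.seed(0)
x = [random.uniform(-4.0, 4.0) for _ in range(64)]       # stand-in activations
w_int8 = [random.randint(-127, 127) for _ in range(64)]  # already-quantized weights
w_scale = 0.01                                           # assumed weight scale

q1, s1, q2, s2 = two_pass_int8(x)

# Reference result: dequantize the weights first, then multiply in floating point.
reference = sum(xi * (w_scale * wi) for xi, wi in zip(x, w_int8))

# Decomposed result: two integer dot products, with all scales applied once at the end.
one_pass = w_scale * s1 * int_dot(q1, w_int8)
two_pass = w_scale * (s1 * int_dot(q1, w_int8) + s2 * int_dot(q2, w_int8))

max_elem_err = max(abs(xi - (s1 * a + s2 * b)) for xi, a, b in zip(x, q1, q2))
err_one = abs(one_pass - reference)
err_two = abs(two_pass - reference)

print("one-pass error:", err_one)
print("two-pass error:", err_two)
print("max elementwise reconstruction error:", max_elem_err)
```

In this sketch the first-pass residual is at most half a coarse step, so the second scale `s2` is at least 254 times finer than `s1`; that is where the extra precision comes from. The weight scale is applied once to the accumulated output rather than to each weight element.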

References

30 extracted · 30 resolved · 6 Pith anchors

[1] DeepSeek. FlashMLA: Efficient MLA for Large Language Models. Technical Report, 2024. https://github.com/deepseek-ai/FlashMLA

[2] Technical Blog, 2025. https://github.com/deepseek-ai/FlashMLA/blob/main/docs/20250929-hopper-fp8-sparse-deep-dive.md

[3] GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers 2022 · arXiv:2210.17323

[4] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration 2023 · arXiv:2306.00978

[5] Castro, Jiale Chen, Torsten Hoefler, and Dan Alistarh 2024

Receipt and verification

First computed	2026-05-17T23:39:18.763747Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

50a34fcf1991c2e28dc74603fe458ce90fbb394a604a1a63b5a39ad28dfaed34

Aliases

arxiv: 2605.13915 · arxiv_version: 2605.13915v1 · doi: 10.48550/arxiv.2605.13915 · pith_short_12: KCRU7TYZSHBO · pith_short_16: KCRU7TYZSHBOFDOH · pith_short_8: KCRU7TYZ
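
The identifiers on this page are consistent with a simple derivation: the 26-character Pith Number matches the unpadded RFC 4648 Base32 encoding of the first 16 bytes of the canonical hash, and the `pith_short_*` aliases are its prefixes. The sketch below reproduces that relationship. It is an observation from the values shown here, not a documented specification, and the function name `pith_id_from_hash` is hypothetical.

```python
import base64

# Canonical hash as listed on this page.
CANONICAL_HASH = "50a34fcf1991c2e28dc74603fe458ce90fbb394a604a1a63b5a39ad28dfaed34"


def pith_id_from_hash(hex_digest):
    """Base32-encode (RFC 4648, padding stripped) the first 16 bytes of a digest."""
    first_128_bits = bytes.fromhex(hex_digest)[:16]
    return base64.b32encode(first_128_bits).decode("ascii").rstrip("=")


pith_id = pith_id_from_hash(CANONICAL_HASH)
aliases = {f"pith_short_{n}": pith_id[:n] for n in (8, 12, 16)}

print(pith_id)  # KCRU7TYZSHBOFDOHIYB74RMM5E
print(aliases)
```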

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/KCRU7TYZSHBOFDOHIYB74RMM5E \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 50a34fcf1991c2e28dc74603fe458ce90fbb394a604a1a63b5a39ad28dfaed34

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "ab7b7c8573619592f3a7b9ec9deb8268e5e2cf91535a29a1df75056ca742f810",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "stat.ML",
    "submitted_at": "2026-05-13T09:49:56Z",
    "title_canon_sha256": "07c5f209588171790c38ceed67b05b60fde9b7ca4dc7724170c0db4ee2f5a181"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13915",
    "kind": "arxiv",
    "version": 1
  }
}
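
The same check can be run offline against the record as printed above, without calling the resolver. The sketch below applies the canonicalization used in the shell pipeline (sorted keys, compact separators, UTF-8) and compares the digest with the listed canonical hash. It assumes the printed record is byte-for-byte what the resolver serves after JSON parsing; the function name `canonical_sha256` is hypothetical.

```python
import hashlib
import json

# The canonical record, copied from this page.
CANONICAL_RECORD = """
{
  "metadata": {
    "abstract_canon_sha256": "ab7b7c8573619592f3a7b9ec9deb8268e5e2cf91535a29a1df75056ca742f810",
    "cross_cats_sorted": ["cs.AI", "cs.LG"],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "stat.ML",
    "submitted_at": "2026-05-13T09:49:56Z",
    "title_canon_sha256": "07c5f209588171790c38ceed67b05b60fde9b7ca4dc7724170c0db4ee2f5a181"
  },
  "schema_version": "1.0",
  "source": {"id": "2605.13915", "kind": "arxiv", "version": 1}
}
"""

# Canonical hash as listed on this page.
EXPECTED = "50a34fcf1991c2e28dc74603fe458ce90fbb394a604a1a63b5a39ad28dfaed34"


def canonical_sha256(record_text):
    """Hash the record after re-serializing it with sorted keys and no whitespace."""
    record = json.loads(record_text)
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


digest = canonical_sha256(CANONICAL_RECORD)
print(digest)
print("matches listed canonical hash:", digest == EXPECTED)
```

Because the record is parsed and re-serialized before hashing, whitespace and key order in the pasted text do not affect the digest; only the values do.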