Enhancing distributed inference performance with the nvidia inference transfer library,

NVIDIA

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

Mix-Quant: Quantized Prefilling, Precise Decoding for Agentic LLMs

cs.CL · 2026-05-19 · unverdicted · novelty 5.0

Mix-Quant quantizes prefilling to NVFP4 and keeps BF16 for decoding in agentic LLMs, achieving up to 3x prefilling speedup while largely preserving task performance on long-context and agentic benchmarks.

citing papers explorer

Showing 1 of 1 citing paper.

Mix-Quant: Quantized Prefilling, Precise Decoding for Agentic LLMs cs.CL · 2026-05-19 · unverdicted · none · ref 25
Mix-Quant quantizes prefilling to NVFP4 and keeps BF16 for decoding in agentic LLMs, achieving up to 3x prefilling speedup while largely preserving task performance on long-context and agentic benchmarks.

Enhancing distributed inference performance with the nvidia inference transfer library,

fields

years

verdicts

representative citing papers

citing papers explorer