Nemotron 3 Nano: Open, efficient mixture-of-experts hybrid Mamba-Transformer model for Agentic reasoning, 2025

NVIDIA · 2025

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter

cs.DC · 2026-04-16 · unverdicted · novelty 6.0

PrfaaS enables practical cross-datacenter prefill-decode disaggregation for hybrid-attention models via selective offloading, bandwidth-aware scheduling, and cache-aware placement, yielding 54% higher throughput and 64% lower P90 TTFT than homogeneous baselines in a 1T-parameter case study.

citing papers explorer

Showing 1 of 1 citing paper.

Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter cs.DC · 2026-04-16 · unverdicted · none · ref 18
PrfaaS enables practical cross-datacenter prefill-decode disaggregation for hybrid-attention models via selective offloading, bandwidth-aware scheduling, and cache-aware placement, yielding 54% higher throughput and 64% lower P90 TTFT than homogeneous baselines in a 1T-parameter case study.

Nemotron 3 Nano: Open, efficient mixture-of-experts hybrid Mamba-Transformer model for Agentic reasoning, 2025

fields

years

verdicts

representative citing papers

citing papers explorer