hub Canonical reference

Nemotron-H: A family of accurate and efficient hybrid Mamba-Transformer models

Canonical reference. 86% of classified citations from Pith papers cite this work as background.

16 Pith papers citing it
Background: 86% of classified citations (6 of 7)

citation-role summary

background: 6 · baseline: 1

years

2026: 11 · 2025: 5

representative citing papers

VibeServe: Can AI Agents Build Bespoke LLM Serving Systems?

cs.AI · 2026-05-07 · unverdicted · novelty 8.0

VibeServe demonstrates that AI agents can synthesize bespoke LLM serving systems end-to-end, remaining competitive with vLLM in standard settings while outperforming it in six non-standard scenarios involving unusual models, workloads, or hardware.

Flash PD-SSM: Memory-Optimized Structured Sparse State-Space Models

cs.LG · 2026-05-18 · unverdicted · novelty 6.0

Flash PD-SSM achieves FSA-level expressivity by discretely selecting one matrix from a trainable set of structured sparse transition matrices at each time step while preserving the runtime and memory efficiency of standard structured SSMs.

The Routing and Filtering Structure of Attention

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

Attention decomposes into low-rank routing and symmetric filtering; disentangled S-D attention reveals a spectral cascade allowing early-layer linearization at under 5% perplexity cost.

The Impossibility Triangle of Long-Context Modeling

cs.CL · 2026-05-06 · unverdicted · novelty 6.0

No model can achieve efficiency, compactness, and recall capacity scaling with sequence length at once, as any two imply a strict bound of O(poly(d)/log V) on recallable facts.

Short window attention enables long-term memorization

cs.LG · 2025-09-29 · unverdicted · novelty 6.0

Short sliding windows in hybrid attention-xLSTM models boost long-context performance by encouraging long-term memory use, and stochastic window sizing improves both short and long tasks.

NVIDIA Nemotron 3: Efficient and Open Intelligence

cs.CL · 2025-12-24 · unverdicted · novelty 5.0

NVIDIA releases the Nemotron 3 model family with hybrid Mamba-Transformer architecture, LatentMoE, NVFP4 training, MTP layers, and multi-environment RL post-training for reasoning and agentic tasks.

Small Language Models are the Future of Agentic AI

cs.AI · 2025-06-02 · unverdicted · novelty 5.0

Small language models are sufficiently capable, more suitable, and far more economical than large models for the repetitive tasks that dominate agentic AI systems.
