
hub Canonical reference

Ming-omni: A unified multimodal model for perception and generation

Canonical reference. 100% of citing Pith papers cite this work as background.

14 Pith papers citing it
Background 100% of classified citations


citation-role summary: background 8

citation-polarity summary: background 8

years: 2026 (13), 2025 (1)

representative citing papers

When Vision Speaks for Sound

cs.CV · 2026-05-13 · unverdicted · novelty 6.0

Video MLLMs show an audio-visual Clever Hans effect, relying on visual-acoustic correlations rather than verifying the audio; Thud interventions diagnose it, and a 10K-sample preference alignment improves intervention performance by 28 points.

Accelerating Compound LLM Training Workloads with Maestro

cs.DC · 2026-05-11 · unverdicted · novelty 6.0

Maestro accelerates compound LLM training via section graphs for per-component configuration and wavefront scheduling for dynamic execution, reducing GPU consumption by ~40% in real deployments.

SMoES: Soft Modality-Guided Expert Specialization in MoE-VLMs

cs.CV · 2026-04-27 · unverdicted · novelty 6.0

SMoES improves MoE-VLM performance and efficiency via soft modality-guided expert routing and inter-bin mutual information regularization, yielding 0.9-4.2% task gains and 56% communication reduction.

Context Unrolling in Omni Models

cs.CV · 2026-04-23 · unverdicted · novelty 5.0

Omni is a multimodal model whose native training on diverse data types enables context unrolling, allowing explicit reasoning across modalities to better approximate shared knowledge and improve downstream performance.
