A survey on multimodal large language models. National Science Review

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 2

fields: cs.CV 2, cs.CL 1

years: 2026 2, 2025 1

verdicts: unverdicted 3

polarities: background 2

representative citing papers

PersonaVLM: Long-Term Personalized Multimodal LLMs

cs.CL · 2026-03-20 · unverdicted · novelty 6.0

PersonaVLM adds memory extraction, multi-turn retrieval-based reasoning, and personality inference to multimodal LLMs, yielding 22.4% gains on a new long-term personalization benchmark and outperforming GPT-4o.

Cambrian-S: Towards Spatial Supersensing in Video

cs.CV · 2025-11-06 · unverdicted · novelty 6.0

Cambrian-S introduces VSI-SUPER benchmarks for long-horizon spatial recall and counting, shows that data scaling yields 30% gains on existing tests, and demonstrates that a self-supervised next-latent predictor using surprise outperforms baselines on the new spatial supersensing tasks.

citing papers explorer

Showing 3 of 3 citing papers.

  • Through the Lens of Character: Resolving Modality-Role Interference in Multimodal Role-Playing Agent

    cs.CV · 2026-05-10 · unverdicted · none · ref 21

    The CAVI framework uses character-guided token pruning, orthogonal feature modulation, and modality-adaptive role steering to resolve modality-role interference in multimodal role-playing agents (RPAs).

  • PersonaVLM: Long-Term Personalized Multimodal LLMs

    cs.CL · 2026-03-20 · unverdicted · none · ref 47

    PersonaVLM adds memory extraction, multi-turn retrieval-based reasoning, and personality inference to multimodal LLMs, yielding 22.4% gains on a new long-term personalization benchmark and outperforming GPT-4o.

  • Cambrian-S: Towards Spatial Supersensing in Video

    cs.CV · 2025-11-06 · unverdicted · none · ref 155

    Cambrian-S introduces VSI-SUPER benchmarks for long-horizon spatial recall and counting, shows that data scaling yields 30% gains on existing tests, and demonstrates that a self-supervised next-latent predictor using surprise outperforms baselines on the new spatial supersensing tasks.