Advances in Neural Information Processing Systems , volume=

Cambrian-1: A fully open, vision-centric exploration of multimodal llms , author=

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

representative citing papers

WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata

cs.CV · 2026-05-20 · conditional · novelty 7.0

WikiVQABench is a human-curated collection of Wikipedia-based VQA items that require both visual evidence and external knowledge from Wikidata to answer correctly.

Are Tools Always Beneficial? Learning to Invoke Tools Adaptively for Dual-Mode Multimodal LLM Reasoning

cs.CL · 2026-05-19 · conditional · novelty 7.0

AutoTool uses reinforcement learning with dual-mode rewards to train multimodal LLMs to adaptively choose between tool-assisted and text-centric reasoning, yielding accuracy and efficiency gains on V* and POPE benchmarks.

Deep Pre-Alignment for VLMs

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

Deep Pre-Alignment uses a small VLM perceiver instead of ViT to pre-align visual features with LLM text space, yielding 1.9-3.0 point gains on multimodal benchmarks and 32.9% less language forgetting.

Beyond Thinking: Imagining in 360$^\circ$ for Humanoid Visual Search

cs.CV · 2026-05-09 · unverdicted · novelty 6.0

Imagining in 360° decouples visual search into a single-step probabilistic semantic layout predictor and an actor, removing the need for multi-turn CoT reasoning and trajectory annotations while improving efficiency in 360° environments.

citing papers explorer

Showing 4 of 4 citing papers.

WikiVQABench: A Knowledge-Grounded Visual Question Answering Benchmark from Wikipedia and Wikidata cs.CV · 2026-05-20 · conditional · none · ref 127
WikiVQABench is a human-curated collection of Wikipedia-based VQA items that require both visual evidence and external knowledge from Wikidata to answer correctly.
Are Tools Always Beneficial? Learning to Invoke Tools Adaptively for Dual-Mode Multimodal LLM Reasoning cs.CL · 2026-05-19 · conditional · none · ref 72
AutoTool uses reinforcement learning with dual-mode rewards to train multimodal LLMs to adaptively choose between tool-assisted and text-centric reasoning, yielding accuracy and efficiency gains on V* and POPE benchmarks.
Deep Pre-Alignment for VLMs cs.CV · 2026-05-14 · unverdicted · none · ref 159
Deep Pre-Alignment uses a small VLM perceiver instead of ViT to pre-align visual features with LLM text space, yielding 1.9-3.0 point gains on multimodal benchmarks and 32.9% less language forgetting.
Beyond Thinking: Imagining in 360$^\circ$ for Humanoid Visual Search cs.CV · 2026-05-09 · unverdicted · none · ref 43
Imagining in 360° decouples visual search into a single-step probabilistic semantic layout predictor and an actor, removing the need for multi-turn CoT reasoning and trajectory annotations while improving efficiency in 360° environments.

Advances in Neural Information Processing Systems , volume=

fields

years

verdicts

representative citing papers

citing papers explorer