Title resolution pending

Alina Kuznetsova, Hassan Rom, Neil Alldrin, Jasper Uijlings, Ivan Krasin, Jordi Pont-Tuset, Shahab Kamali, Stefan Popov, Matteo Malloci, Alexander Kolesnikov, et al · 1956

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

browse 5 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

dataset 2

citation-polarity summary

use dataset 2

representative citing papers

CPC-VAR:Continual Personalized and Compositional Generation in Visual Autoregressive Models

cs.CV · 2026-05-19 · unverdicted · novelty 6.0

CPC-VAR adds Gradient-based Concept Neuron Selection for continual single-concept learning and a context-aware multi-branch composition strategy to reduce forgetting and entanglement in VAR-based personalized image generation.

SpatialForge: Bootstrapping 3D-Aware Spatial Reasoning from Open-World 2D Images

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

SpatialForge synthesizes 10 million spatial QA pairs from in-the-wild 2D images to train VLMs for better depth ordering, layout, and viewpoint-dependent reasoning.

Emu3.5: Native Multimodal Models are World Learners

cs.CV · 2025-10-30 · unverdicted · novelty 6.0

Emu3.5 is a native multimodal world model pre-trained on over 10 trillion vision-language tokens with next-token prediction, post-trained via reinforcement learning, and accelerated by Discrete Diffusion Adaptation for efficient interleaved generation and world exploration.

Steering Visual Generation in Unified Multimodal Models with Understanding Supervision

cs.CV · 2026-05-07 · unverdicted · novelty 5.0

Using understanding tasks as direct supervision during post-training improves image generation and editing in unified multimodal models.

LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models

cs.CV · 2025-05-21 · unverdicted · novelty 5.0

LENS is a new multi-level benchmark dataset for evaluating MLLMs on perception-to-reasoning tasks using the same images across all levels with recent social media content.

citing papers explorer

Showing 5 of 5 citing papers.

CPC-VAR:Continual Personalized and Compositional Generation in Visual Autoregressive Models cs.CV · 2026-05-19 · unverdicted · none · ref 18
CPC-VAR adds Gradient-based Concept Neuron Selection for continual single-concept learning and a context-aware multi-branch composition strategy to reduce forgetting and entanglement in VAR-based personalized image generation.
SpatialForge: Bootstrapping 3D-Aware Spatial Reasoning from Open-World 2D Images cs.CV · 2026-05-12 · unverdicted · none · ref 46
SpatialForge synthesizes 10 million spatial QA pairs from in-the-wild 2D images to train VLMs for better depth ordering, layout, and viewpoint-dependent reasoning.
Emu3.5: Native Multimodal Models are World Learners cs.CV · 2025-10-30 · unverdicted · none · ref 46
Emu3.5 is a native multimodal world model pre-trained on over 10 trillion vision-language tokens with next-token prediction, post-trained via reinforcement learning, and accelerated by Discrete Diffusion Adaptation for efficient interleaved generation and world exploration.
Steering Visual Generation in Unified Multimodal Models with Understanding Supervision cs.CV · 2026-05-07 · unverdicted · none · ref 23
Using understanding tasks as direct supervision during post-training improves image generation and editing in unified multimodal models.
LENS: Multi-level Evaluation of Multimodal Reasoning with Large Language Models cs.CV · 2025-05-21 · unverdicted · none · ref 45
LENS is a new multi-level benchmark dataset for evaluating MLLMs on perception-to-reasoning tasks using the same images across all levels with recent social media content.

Title resolution pending

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer