Anwer, Eric Xing, Ming-Hsuan Yang, and Fahad S

Hanoona Rasheed, Muhammad Maaz, Sahal Shaji Mullappilly, Abdelrahman Shaker, Salman Khan, Hisham Cholakkal, Rao M · 2024 · arXiv 2311.03356

5 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 1 · method 1

citation-polarity summary

background 2

representative citing papers

AnchorSeg: Language Grounded Query Banks for Reasoning Segmentation

cs.CV · 2026-04-20 · unverdicted · novelty 7.0

AnchorSeg uses ordered query banks of latent reasoning tokens plus a spatial anchor token and a Token-Mask Cycle Consistency loss to achieve 67.7% gIoU and 68.1% cIoU on the ReasonSeg benchmark.

MINGLE: VLMs for Semantically Complex Region Detection in Urban Scenes

cs.CV · 2025-09-16 · unverdicted · novelty 6.0

MINGLE is a modular pipeline that combines off-the-shelf detection tools with VLM reasoning to localize socially connected groups in urban scenes and is supported by a new 100K-image dataset.

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

cs.CV · 2024-01-29 · conditional · novelty 6.0

MoE-LLaVA applies mixture-of-experts sparsity to LVLMs via MoE-Tuning, delivering LLaVA-1.5-7B level visual understanding and better hallucination resistance with only ~3B active parameters.

A Survey on Multimodal Large Language Models

cs.CV · 2023-06-23 · accept · novelty 3.0

This survey organizes the architectures, training strategies, data, evaluation methods, extensions, and challenges of Multimodal Large Language Models.

OmniSch: A Multimodal PCB Schematic Benchmark For Structured Diagram Visual Reasoning

cs.CV · 2026-03-31

citing papers explorer

Showing 1 of 1 citing paper after filters.

A Survey on Multimodal Large Language Models

cs.CV · 2023-06-23 · accept · none · ref 142

This survey organizes the architectures, training strategies, data, evaluation methods, extensions, and challenges of Multimodal Large Language Models.
