ChatRex: Taming Multimodal LLM for Joint Perception and Understanding

Qing Jiang, Gen Luo, Yuqin Yang, Yuda Xiong, Yihao Chen, Zhaoyang Zeng, Tianhe Ren, Lei Zhang · 2024 · arXiv 2411.18363

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Robust Grounding with MLLMs Against Occlusion and Small Objects via Language-Guided Semantic Cues

cs.CV · 2026-04-27 · unverdicted · novelty 6.0

Language-guided semantic cues from MLLM visual pipelines, steered by text embeddings, refine object semantics and boost grounding accuracy against occlusion and small objects.

Grounding Everything in Tokens for Multimodal Large Language Models

cs.CV · 2025-12-11 · unverdicted · novelty 5.0

GETok partitions images with grid tokens and refines locations via offset tokens to enable better native 2D spatial reasoning in MLLMs.

citing papers explorer

Showing 2 of 2 citing papers.

Robust Grounding with MLLMs Against Occlusion and Small Objects via Language-Guided Semantic Cues

cs.CV · 2026-04-27 · unverdicted · none · ref 13

Grounding Everything in Tokens for Multimodal Large Language Models

cs.CV · 2025-12-11 · unverdicted · none · ref 18
