LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

Zonghao Guo, Ruyi Xu, Yuan Yao, Junbo Cui, Zanlin Ni, Chunjiang Ge, Tat-Seng Chua, Zhiyuan Liu, Gao Huang · 2024

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction

cs.CL · 2026-04-30 · unverdicted · novelty 6.0

MiniCPM-o 4.5 uses the Omni-Flow streaming framework to deliver real-time full-duplex omni-modal interaction with proactive behavior in a 9B model that approaches Gemini 2.5 Flash performance.

MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe

cs.LG · 2025-09-16 · unverdicted · novelty 5.0

An 8B MLLM reaches state-of-the-art efficiency and performance under 30B by combining a unified 3D resampler, joint document-text training, and hybrid RL for reasoning modes.

citing papers explorer

Showing 2 of 2 citing papers.

MiniCPM-o 4.5: Towards Real-Time Full-Duplex Omni-Modal Interaction
cs.CL · 2026-04-30 · unverdicted · none · ref 5
MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe
cs.LG · 2025-09-16 · unverdicted · none · ref 13
