Vidi2.5: Large Multimodal Models for Video Understanding and Creation. arXiv preprint arXiv:2511.19529, 2026

Vidi Team, Chia-Wen Kuo, Chuang Huang, Dawei Du, Fan Chen, Fanding Lei, Feng Gao, Guang Chen, Haoji Zhang, Haojun Zhao, Jin Liu, Jingjing Zhuge, Lili Fang, Lingxi Zhang, Longyin Wen, Lu Guo, Lu Xu, Lusha Li, Qihang Fan, Rachel Deng, Shaobo · 2026 · arXiv 2511.19529

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

AnyGroundBench: A Specialized-Domain Benchmark for Video Grounding in Vision-Language Models

cs.CV · 2026-07-02 · unverdicted · novelty 7.0

AnyGroundBench is a domain-adaptation benchmark for spatio-temporal video grounding across animal, industry, sports, surgery, and public security domains that finds 15 state-of-the-art VLMs fail in zero-shot and ICL settings.

citing papers explorer


AnyGroundBench: A Specialized-Domain Benchmark for Video Grounding in Vision-Language Models

cs.CV · 2026-07-02 · unverdicted · none · ref 59
