PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

Cheng Cui, Ting Sun, Suyin Liang, Tingquan Gao, Zelun Zhang, Jiaxuan Liu, Xueqing Wang, Changda Zhou, Hongen Liu, Manhui Lin, Yue Zhang, Yubo Zhang, Handong Zheng, Jing Zhang, Jun Zhang, Yi Liu, Dianhai Yu, Yanjun Ma · 2025

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

MSAVBench: Towards Comprehensive and Reliable Evaluation of Multi-Shot Audio-Video Generation

cs.CV · 2026-05-19 · unverdicted · novelty 7.0

MSAVBench is the first comprehensive benchmark for multi-shot audio-video generation featuring four dimensions, challenging scenarios, and an adaptive hybrid evaluation framework that achieves 91.5% Spearman correlation with human judgments.

citing papers explorer

Showing 1 of 1 citing paper after filters.

MSAVBench: Towards Comprehensive and Reliable Evaluation of Multi-Shot Audio-Video Generation cs.CV · 2026-05-19 · unverdicted · none · ref 12
MSAVBench is the first comprehensive benchmark for multi-shot audio-video generation featuring four dimensions, challenging scenarios, and an adaptive hybrid evaluation framework that achieves 91.5% Spearman correlation with human judgments.

PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

fields

years

verdicts

representative citing papers

citing papers explorer