SAIL-VL2 Technical Report

Weijie Yin, Yongjie Ye, Fangxun Shu, Yue Liao, Zijian Kang, Hongyuan Dong, Haiyang Yu, Dingkang Yang, Jiacong Wang, Han Wang, et al. · 2025 · arXiv 2509.14033

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs

cs.CV · 2025-11-18 · unverdicted · novelty 8.0

MVI-Bench supplies the first taxonomy and dataset focused on misleading visual inputs to measure LVLM robustness, with tests on 18 models revealing clear weaknesses.

VISTA-Bench: Do Vision-Language Models Really Understand Visualized Text as Well as Pure Text?

cs.CV · 2026-02-04 · conditional · novelty 7.0

VISTA-Bench shows vision-language models degrade on visualized text in images compared to equivalent pure text, with larger gaps under increased perceptual difficulty.

POINTS-Long: Adaptive Dual-Mode Visual Reasoning in MLLMs

cs.CV · 2026-04-13 · unverdicted · novelty 6.0

POINTS-Long is a dual-mode multimodal large language model that uses dynamic visual token scaling to retain 97.7-99.7% accuracy on long-form tasks with 1/40 to 1/10 of the tokens, and supports streaming via a detachable KV-cache.

Text-Phase Synergy Network with Dual Priors for Unsupervised Cross-Domain Image Retrieval

cs.CV · 2026-03-13 · unverdicted · novelty 6.0

TPSNet combines CLIP text prompts and phase features as dual priors to deliver better semantic supervision and domain alignment than pseudo-label clustering in unsupervised cross-domain image retrieval.

OmniFysics: Towards Physical Intelligence Evolution via Omni-Modal Signal Processing and Network Optimization

cs.CV · 2026-02-05 · unverdicted · novelty 4.0

OmniFysics is an omni-modal network using a dynamic physical data engine and evolutive tuning to improve performance on multimodal benchmarks and physics-oriented tasks.

citing papers explorer

Showing 5 of 5 citing papers.

MVI-Bench: A Comprehensive Benchmark for Evaluating Robustness to Misleading Visual Inputs in LVLMs
cs.CV · 2025-11-18 · unverdicted · none · ref 61

VISTA-Bench: Do Vision-Language Models Really Understand Visualized Text as Well as Pure Text?
cs.CV · 2026-02-04 · conditional · none · ref 19

POINTS-Long: Adaptive Dual-Mode Visual Reasoning in MLLMs
cs.CV · 2026-04-13 · unverdicted · none · ref 105

Text-Phase Synergy Network with Dual Priors for Unsupervised Cross-Domain Image Retrieval
cs.CV · 2026-03-13 · unverdicted · none · ref 45

OmniFysics: Towards Physical Intelligence Evolution via Omni-Modal Signal Processing and Network Optimization
cs.CV · 2026-02-05 · unverdicted · none · ref 54