Vintern-1b: An efficient multimodal large language model for Vietnamese. arXiv preprint arXiv:2408.12480

Doan, K · 2024 · arXiv 2408.12480

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

cs.CV · 2024-12-06 · unverdicted · novelty 6.0

InternVL 2.5 is the first open-source MLLM to surpass 70% on the MMMU benchmark via model, data, and test-time scaling, with a 3.7-point gain from chain-of-thought reasoning.

U-CESE: Unified Clip-based Event Search Engine for AI Challenge HCMC 2025

cs.CV · 2026-05-22 · unverdicted · novelty 3.0

U-CESE integrates three CESE modules into a unified clip-based pipeline with DAKE keyframe extraction and ReCap captioning to support consistent multimodal event retrieval across video sources.

Multimodal Contextualized Support for Enhancing Video Retrieval System

cs.CV · 2024-12-10 · unverdicted · novelty 3.0

Proposes a multimodal pipeline for video retrieval that incorporates information from multiple frames to enable higher-level abstraction beyond single-image object detection.

citing papers explorer

Showing 3 of 3 citing papers.

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

cs.CV · 2024-12-06 · unverdicted · none · ref 59

U-CESE: Unified Clip-based Event Search Engine for AI Challenge HCMC 2025

cs.CV · 2026-05-22 · unverdicted · none · ref 9

Multimodal Contextualized Support for Enhancing Video Retrieval System

cs.CV · 2024-12-10 · unverdicted · none · ref 4
