SEED-Bench is a new benchmark of 19K multiple-choice questions for evaluating generative comprehension in multimodal LLMs across 12 image and video dimensions.
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
dataset 1
polarities
use dataset 1
representative citing papers
Seer, a transformer-based PIDM pre-trained on large robotic datasets like DROID, outperforms prior methods on simulation and real-world robotic manipulation benchmarks with gains up to 43%.
InternVideo combines masked video modeling and video-language contrastive learning into a single foundation model that reaches state-of-the-art results on 39 video datasets including 91.1% top-1 on Kinetics-400.
citing papers explorer
- SEED-Bench: Benchmarking Multimodal LLMs with Generative Comprehension
  SEED-Bench is a new benchmark of 19K multiple-choice questions for evaluating generative comprehension in multimodal LLMs across 12 image and video dimensions.
- Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation
  Seer, a transformer-based PIDM pre-trained on large robotic datasets like DROID, outperforms prior methods on simulation and real-world robotic manipulation benchmarks with gains up to 43%.
- InternVideo: General Video Foundation Models via Generative and Discriminative Learning
  InternVideo combines masked video modeling and video-language contrastive learning into a single foundation model that reaches state-of-the-art results on 39 video datasets including 91.1% top-1 on Kinetics-400.