An Early Evaluation of GPT-4V(ision). arXiv preprint arXiv:2310.16534

An Early Evaluation of GPT-4V(ision) · 2023 · arXiv 2310.16534

2 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation

cs.CL · 2023-11-13 · unverdicted · novelty 6.0

AMBER is an LLM-free multi-dimensional benchmark for evaluating hallucinations in MLLMs across generative and discriminative tasks.

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models

cs.CV · 2023-06-23 · unverdicted · novelty 6.0

MME is a manually annotated benchmark evaluating MLLMs on perception and cognition across 14 subtasks to avoid data leakage and support fair model comparisons.

citing papers explorer


AMBER: An LLM-free Multi-dimensional Benchmark for MLLMs Hallucination Evaluation
cs.CL · 2023-11-13 · unverdicted · none · ref 10
AMBER is an LLM-free multi-dimensional benchmark for evaluating hallucinations in MLLMs across generative and discriminative tasks.

MME: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models
cs.CV · 2023-06-23 · unverdicted · none · ref 50
MME is a manually annotated benchmark evaluating MLLMs on perception and cognition across 14 subtasks to avoid data leakage and support fair model comparisons.
