Pith Number

pith:LC2EMAER

pith:2024:LC2EMAER6QLZIBSPQPF7M7YUSU
Status: not attested · not anchored · not stored · refs resolved

LogicVista: Multimodal LLM Logical Reasoning Benchmark in Visual Contexts

Edward Sun, Tianyu Liu, Wei Wang, Yijia Xiao

LogicVista provides a benchmark of 448 visual questions to evaluate logical reasoning in multimodal LLMs across five tasks and nine capabilities.

arxiv:2407.04973 v1 · 2024-07-06 · cs.AI · cs.CL · cs.CV · cs.LG

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{LC2EMAER6QLZIBSPQPF7M7YUSU}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

LogicVista assesses the integrated logical reasoning capabilities of MLLMs in visual contexts across 5 logical reasoning tasks encompassing 9 different capabilities using a sample of 448 multiple-choice questions.

C2 · weakest assumption

The 448 questions and their human-written reasoning annotations accurately and comprehensively capture general logical cognition abilities in visual contexts without significant selection bias or coverage gaps.

C3 · one-line summary

LogicVista is a new benchmark dataset with 448 visual logic questions that evaluates multimodal LLMs on five reasoning tasks covering nine capabilities.
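To make the evaluation described in C1 more concrete, the Python sketch below tallies multiple-choice accuracy per task and per capability. It is an illustration only: the record fields (task, capabilities, answer, prediction), the placeholder labels, and the exact-match scoring of an already-extracted option letter are all assumptions. They are not LogicVista's released data format or its official scoring procedure.

```python
from collections import defaultdict


def accuracy_by(items, key):
    """Fraction of correct answers grouped by a label field.

    Hypothetical record format: each item has a single "task" label, a list of
    "capabilities" labels, the gold option letter in "answer", and the model's
    already-extracted option letter in "prediction".
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for item in items:
        # A field may hold one label (task) or several (capabilities).
        labels = item[key] if isinstance(item[key], list) else [item[key]]
        # Case-insensitive exact match on the option letter.
        correct = item["prediction"].strip().upper() == item["answer"].strip().upper()
        for label in labels:
            totals[label] += 1
            hits[label] += correct  # bool counts as 0 or 1
    return {label: hits[label] / totals[label] for label in totals}


if __name__ == "__main__":
    # Placeholder labels, not LogicVista's real task or capability names.
    items = [
        {"task": "task_a", "capabilities": ["cap_1"], "answer": "B", "prediction": "b"},
        {"task": "task_a", "capabilities": ["cap_1", "cap_2"], "answer": "C", "prediction": "A"},
        {"task": "task_b", "capabilities": ["cap_2"], "answer": "D", "prediction": "D"},
    ]
    print(accuracy_by(items, "task"))          # {'task_a': 0.5, 'task_b': 1.0}
    print(accuracy_by(items, "capabilities"))  # {'cap_1': 0.5, 'cap_2': 0.5}
```

Counting a question once under each of its capability labels lets per-capability scores overlap, so they need not average to the overall accuracy.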

References

58 extracted · 58 resolved · 0 Pith anchors


Formal links

2 machine-checked theorem links

Cited by

36 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:49.330835Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

58b4460091f41794064f83cbf67f14950cd0c7e06fadcd306614670051539af6

Aliases

arxiv: 2407.04973 · arxiv_version: 2407.04973v1 · doi: 10.48550/arxiv.2407.04973 · pith_short_12: LC2EMAER6QLZ · pith_short_16: LC2EMAER6QLZIBSP · pith_short_8: LC2EMAER
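The page lists these aliases but does not say how the Pith Number relates to the canonical hash. The values shown on this page are consistent with the Pith Number being the unpadded RFC 4648 base32 encoding of the first 16 bytes of the canonical SHA-256 hash, with each pith_short_N alias being its first N characters. The Python sketch below checks that inferred relationship for this record. It is an observation about these values, not a documented Pith specification, and the helper names pith_from_hash and short_aliases are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "58b4460091f41794064f83cbf67f14950cd0c7e06fadcd306614670051539af6"
PITH_NUMBER = "LC2EMAER6QLZIBSPQPF7M7YUSU"


def pith_from_hash(hex_digest: str, n_bytes: int = 16) -> str:
    """Inferred (undocumented) mapping: base32 of the first n_bytes of the
    SHA-256 digest, with the trailing '=' padding stripped."""
    raw = bytes.fromhex(hex_digest)[:n_bytes]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


def short_aliases(pith: str) -> dict:
    """The pith_short_N aliases on this page are the first N characters."""
    return {f"pith_short_{n}": pith[:n] for n in (8, 12, 16)}


if __name__ == "__main__":
    derived = pith_from_hash(CANONICAL_HASH)
    print(derived, derived == PITH_NUMBER)  # LC2EMAER6QLZIBSPQPF7M7YUSU True
    print(short_aliases(PITH_NUMBER))
```

Sixteen bytes (128 bits) encode to 26 base32 characters, which matches the length of the Pith Number shown above.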
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/LC2EMAER6QLZIBSPQPF7M7YUSU \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 58b4460091f41794064f83cbf67f14950cd0c7e06fadcd306614670051539af6
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "0ab50ae4f366a54860372ccf4025a2176a88d4c211d7be483104ed2a2994079d",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.CV",
      "cs.LG"
    ],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2024-07-06T06:48:16Z",
    "title_canon_sha256": "ea00387c1c41a9eac9493f08e683880dd321e728c00c721a755ec6efd2b5e40c"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2407.04973",
    "kind": "arxiv",
    "version": 1
  }
}
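The Agent API one-liner fetches the record over the network. As an offline sketch, the Python below applies the same canonicalization to the record printed above: sorted keys, no whitespace between tokens, and UTF-8 output without ASCII escaping. It assumes the displayed JSON is the complete canonical_record returned by the API. If it is, the digest should equal the canonical hash; a mismatch would suggest the API record contains fields not shown here. The names RECORD, canonical_bytes, and canonical_hash are illustrative.

```python
import hashlib
import json

# Canonical record as displayed on this page (assumed to be complete).
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "0ab50ae4f366a54860372ccf4025a2176a88d4c211d7be483104ed2a2994079d",
        "cross_cats_sorted": ["cs.CL", "cs.CV", "cs.LG"],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.AI",
        "submitted_at": "2024-07-06T06:48:16Z",
        "title_canon_sha256": "ea00387c1c41a9eac9493f08e683880dd321e728c00c721a755ec6efd2b5e40c",
    },
    "schema_version": "1.0",
    "source": {"id": "2407.04973", "kind": "arxiv", "version": 1},
}

EXPECTED = "58b4460091f41794064f83cbf67f14950cd0c7e06fadcd306614670051539af6"


def canonical_bytes(obj) -> bytes:
    # Same settings as the one-liner: sorted keys, compact separators,
    # non-ASCII characters kept as UTF-8 rather than \u escapes.
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_hash(obj) -> str:
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()


if __name__ == "__main__":
    digest = canonical_hash(RECORD)
    print(digest)
    print("matches canonical hash" if digest == EXPECTED else "differs from canonical hash")
```

Because keys are sorted before hashing, the digest does not depend on the order in which fields were written, which is what lets any mirror recompute it.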