Pith Number

pith:FZIS5C7S

pith:2026:FZIS5C7SSQGNSRPQRNPIPZTHXX

not attested · not anchored · not stored · refs resolved

MCERF: Advancing Multimodal LLM Evaluation of Engineering Documentation with Enhanced Retrieval

Amir Mohammad Vahedi, Anna C. Doris, Daniele Grandi, Faez Ahmed, Hoang Anh Nguyen, Hongyi Xu, Kiarash Naghavi Khanghah

A multimodal retrieval framework improves accuracy on engineering-document questions by 41.1% relative to the best standard RAG baseline.

arxiv:2604.09552 v1 · 2026-01-31 · cs.IR · cs.AI · cs.CL


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{FZIS5C7SSQGNSRPQRNPIPZTHXX}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
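A minimal sketch of why a deterministic merge lets any mirror reach the same current state. It assumes a hypothetical event shape (`id`, `ts`, `field`, `value`) and a hypothetical last-writer-wins rule ordered by `(ts, id)`; the actual Pith event schema, signature checks, and merge rules are not described on this page. The only property illustrated is order independence: replaying the same events in any arrival order yields the same state.

```python
from __future__ import annotations

import json
from typing import Any


def merge(canonical: dict[str, Any], events: list[dict[str, Any]]) -> dict[str, Any]:
    """Fold events into a copy of the canonical record in a fixed total order.

    Hypothetical event shape: {"id": str, "ts": ISO-8601 UTC str, "field": str, "value": Any}.
    Signature verification is omitted; a real mirror would check each event first.
    """
    state = json.loads(json.dumps(canonical))  # deep copy; the canonical record is never mutated
    # Sorting by (timestamp, id) removes any dependence on arrival order.
    # Same-format ISO-8601 UTC strings sort lexicographically in time order;
    # the event id breaks ties between identical timestamps.
    for ev in sorted(events, key=lambda e: (e["ts"], e["id"])):
        state.setdefault("current", {})[ev["field"]] = ev["value"]  # later event wins
    return state
```

Because the ordering key is a total order over event fields alone, two mirrors holding the same event set compute identical output without coordinating.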

Claims

C1 · strongest claim

Evaluation on the DesignQA benchmark shows that this system improves average accuracy across all tasks by a relative +41.1% over the best baseline RAG results, a significant improvement on multimodal and reasoning-intensive tasks achieved without ingesting the complete rulebook.
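The C1 figure is a relative gain, not a percentage-point difference: +41.1% means the system's average accuracy is 1.411 times the best baseline's. The sketch below shows the arithmetic; the accuracies are hypothetical, chosen only for illustration, since this record does not give the paper's absolute scores.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Relative improvement of `improved` over `baseline`, as a fraction (0.411 means +41.1%)."""
    if baseline <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (improved - baseline) / baseline


def point_gain(baseline: float, improved: float) -> float:
    """Absolute improvement in percentage points, for comparison."""
    return (improved - baseline) * 100


# Hypothetical accuracies, not taken from the paper.
base, new = 0.40, 0.5644
print(f"relative: {relative_gain(base, new):+.1%}")   # relative: +41.1%
print(f"absolute: {point_gain(base, new):+.2f} pts")  # absolute: +16.44 pts
```

The same relative gain corresponds to a larger or smaller point gain depending on the baseline, which is why the two should not be conflated when comparing results.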

C2 · weakest assumption

That ColPali retrieval plus the four hand-designed reasoning modes will generalize beyond the DesignQA benchmark and that the reported accuracy lift is not driven by benchmark-specific tuning or post-hoc pipeline selection.

C3 · one-line summary

MCERF delivers a 41.1% relative accuracy gain on the DesignQA benchmark by combining ColPali vision-language retrieval with four specialized reasoning modes and dynamic routing.
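ColPali [4] retrieves page images rather than extracted text: its vision-language model embeds each page as a set of patch vectors and each query as a set of token vectors, and a page is scored by ColBERT-style late interaction (MaxSim). A minimal pure-Python sketch of that scoring step follows. The two-dimensional vectors are hand-made stand-ins for model embeddings, and how MCERF's reasoning modes consume the ranked pages is not described in this record.

```python
from __future__ import annotations

from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_tokens: Sequence[Vector], page_patches: Sequence[Vector]) -> float:
    """Late-interaction score: for each query token, take its best-matching
    page patch (maximum dot product), then sum those maxima over all query tokens."""
    return sum(max(dot(q, p) for p in page_patches) for q in query_tokens)


def rank_pages(query_tokens: Sequence[Vector], pages: dict[str, list[Vector]]) -> list[str]:
    """Return page names ordered from highest to lowest MaxSim score."""
    return sorted(pages, key=lambda name: maxsim(query_tokens, pages[name]), reverse=True)


# Toy embeddings (hypothetical): page "A" matches both query tokens, page "B" only one.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {"B": [[1.0, 0.0], [1.0, 0.0]], "A": [[1.0, 0.0], [0.0, 1.0]]}
print(rank_pages(query, pages))  # ['A', 'B']
```

Scoring each query token against its single best patch is what lets a short question match a small region of a dense page, such as a rule number or a dimension in a drawing.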

References

56 extracted · 56 resolved · 4 Pith anchors

[1] DesignQA: A Multimodal Benchmark for Evaluating Large Language Models' Understanding of Engineering Documentation, 2025

[2] Generative Models for Multimodal Document Understanding, 2023

[3] Layout-Aware Pre-training for Visually Rich Document Understanding, 2022

[4] ColPali: Efficient Document Retrieval with Vision Language Models, 2024 · arXiv:2407.01449

[5] A Comprehensive Review of Vision-Language Models, 2023

Receipt and verification

First computed	2026-05-28T01:04:39.899281Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`)
Schema	pith-number/v1.0

Canonical hash

2e512e8bf2940cd945f08b5e87e667bdd1497fc6fc56f92ec2bdd84acefbc6f6

Aliases

arxiv: 2604.09552 · arxiv_version: 2604.09552v1 · doi: 10.48550/arxiv.2604.09552 · pith_short_12: FZIS5C7SSQGN · pith_short_16: FZIS5C7SSQGNSRPQ · pith_short_8: FZIS5C7S
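The values on this record are consistent with a simple derivation: the Pith Number equals the first 26 characters (130 bits) of the RFC 4648 base32 encoding of the canonical hash, and each `pith_short_N` alias is its first N characters. This relationship is inferred by checking this record's own values, not taken from the Pith schema, so treat it as an observation rather than a specification.

```python
import base64

# Canonical hash as printed in this record.
CANONICAL_HASH = "2e512e8bf2940cd945f08b5e87e667bdd1497fc6fc56f92ec2bdd84acefbc6f6"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode (A-Z, 2-7 alphabet) the raw SHA-256 digest and keep the first `length` chars."""
    return base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")[:length]


pith = pith_number_from_hash(CANONICAL_HASH)
aliases = {f"pith_short_{n}": pith[:n] for n in (8, 12, 16)}
print(pith)     # FZIS5C7SSQGNSRPQRNPIPZTHXX
print(aliases)  # {'pith_short_8': 'FZIS5C7S', 'pith_short_12': 'FZIS5C7SSQGN', 'pith_short_16': 'FZIS5C7SSQGNSRPQ'}
```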


Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/FZIS5C7SSQGNSRPQRNPIPZTHXX \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 2e512e8bf2940cd945f08b5e87e667bdd1497fc6fc56f92ec2bdd84acefbc6f6

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "5522691e294366b0398d3a66d0394dc7f76e5abeb560f03b1dbbd45de75b51f2",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.IR",
    "submitted_at": "2026-01-31T03:09:47Z",
    "title_canon_sha256": "02ad5f6e574ba78d567f088960071492a2cfbb8eb91441fff54233dd80f63c10"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2604.09552",
    "kind": "arxiv",
    "version": 1
  }
}
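The canonical record above can also be hashed offline, without `curl` or `jq`. This sketch uses the same serialization as the verification pipeline (sorted keys, compact separators, UTF-8 without ASCII escaping). It assumes the JSON printed on this page is the complete canonical record served by the resolver; if so, the printed digest should equal the canonical hash, and a mismatch would mean the page copy differs from the served record.

```python
import hashlib
import json

# Canonical record as printed on this page.
CANONICAL_RECORD = {
    "metadata": {
        "abstract_canon_sha256": "5522691e294366b0398d3a66d0394dc7f76e5abeb560f03b1dbbd45de75b51f2",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.IR",
        "submitted_at": "2026-01-31T03:09:47Z",
        "title_canon_sha256": "02ad5f6e574ba78d567f088960071492a2cfbb8eb91441fff54233dd80f63c10",
    },
    "schema_version": "1.0",
    "source": {"id": "2604.09552", "kind": "arxiv", "version": 1},
}


def canonical_sha256(record: dict) -> str:
    """SHA-256 hex digest of the record serialized with sorted keys and no whitespace."""
    body = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


print(canonical_sha256(CANONICAL_RECORD))
```

Sorting keys makes the digest independent of how a server or mirror happens to order object members; list order, such as `cross_cats_sorted`, is still significant.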