Pith Number

pith:FZIS5C7S

pith:2026:FZIS5C7SSQGNSRPQRNPIPZTHXX
not attested · not anchored · not stored · refs resolved

MCERF: Advancing Multimodal LLM Evaluation of Engineering Documentation with Enhanced Retrieval

Amir Mohammad Vahedi, Anna C. Doris, Daniele Grandi, Faez Ahmed, Hoang Anh Nguyen, Hongyi Xu, Kiarash Naghavi Khanghah

A multimodal retrieval framework improves accuracy on engineering document questions by 41 percent relative to standard RAG.

arxiv:2604.09552 v1 · 2026-01-31 · cs.IR · cs.AI · cs.CL

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{FZIS5C7SSQGNSRPQRNPIPZTHXX}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
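
The page does not describe the merge algorithm itself. The sketch below is only a generic illustration of why a deterministic merge lets any mirror arrive at the same state: events are deduplicated, put into a total order, and folded. The event fields (`id`, `ts`, `field`, `value`) and the last-writer-wins rule are assumptions made for this example, not the Pith bundle format.

```python
from typing import Iterable


def merge_events(events: Iterable[dict]) -> dict:
    """Fold events into a state dict, independent of arrival order.

    Hypothetical event shape: {"id", "ts", "field", "value"}.
    Assumes an event id always identifies the same event content.
    """
    # Deduplicate by id so replayed events do not change the result.
    unique = {event["id"]: event for event in events}
    # Total order: timestamp first, event id as the tie-breaker.
    ordered = sorted(unique.values(), key=lambda e: (e["ts"], e["id"]))
    state: dict = {}
    for event in ordered:
        state[event["field"]] = event["value"]  # last writer wins
    return state


# Hypothetical events, not taken from any real bundle.
events = [
    {"id": "e1", "ts": 1, "field": "citations", "value": "open"},
    {"id": "e2", "ts": 2, "field": "author_claim", "value": "open"},
    {"id": "e3", "ts": 3, "field": "citations", "value": "closed"},
]

forward = merge_events(events)
# Same events in reverse order, with one duplicate delivered twice.
backward = merge_events(reversed(events + events[:1]))
print(forward == backward)
```

Because the fold order depends only on the sorted keys, two mirrors that received the events in different orders, or received some of them twice, compute the same state.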

Claims

C1 strongest claim

Evaluation on the DesignQA benchmark shows that the system improves average accuracy across all tasks by a relative +41.1% over the best baseline RAG results, a significant improvement on multimodal and reasoning-intensive tasks achieved without ingesting the complete rulebook.

C2 weakest assumption

That ColPali retrieval plus the four hand-designed reasoning modes will generalize beyond the DesignQA benchmark and that the reported accuracy lift is not driven by benchmark-specific tuning or post-hoc pipeline selection.

C3 one-line summary

MCERF delivers a 41.1% relative accuracy gain on the DesignQA benchmark by combining ColPali vision-language retrieval with four specialized reasoning modes and dynamic routing.
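
The claims above report a relative gain, which is not the same as a difference in percentage points. The short sketch below shows the arithmetic. The two accuracy values are hypothetical numbers chosen so that the relative gain comes out near 41.1%; they are not results from the paper.

```python
def relative_gain(baseline: float, improved: float) -> float:
    """Return (improved - baseline) / baseline as a fraction."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline


# Hypothetical accuracies for illustration only (not from the paper).
baseline_acc = 0.40
improved_acc = 0.5644

gain = relative_gain(baseline_acc, improved_acc)
points = (improved_acc - baseline_acc) * 100

print(f"relative gain: {gain:+.1%}")
print(f"absolute gain: {points:+.1f} percentage points")
```

With these illustrative inputs the absolute change is much smaller than the relative one, which is why the baseline accuracy matters when reading a headline figure such as +41.1%.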

References

56 extracted · 56 resolved · 4 Pith anchors

[1] DesignQA: A multimodal benchmark for evaluating large language models’ understanding of engineering documentation, 2025
[2] Generative Models for Multimodal Document Understanding, 2023
[3] Layout-Aware Pre-training for Visually Rich Document Understanding, 2022
[4] ColPali: Efficient Document Retrieval with Vision Language Models, 2024 · arXiv:2407.01449
[5] A Comprehensive Review of Vision-Language Models, 2023
Receipt and verification
First computed 2026-05-28T01:04:39.899281Z
Builder pith-number-builder-2026-05-17-v1
Signature Pith Ed25519 (pith-v1-2026-05) · public key
Schema pith-number/v1.0

Canonical hash

2e512e8bf2940cd945f08b5e87e667bdd1497fc6fc56f92ec2bdd84acefbc6f6

Aliases

arxiv: 2604.09552 · arxiv_version: 2604.09552v1 · doi: 10.48550/arxiv.2604.09552 · pith_short_12: FZIS5C7SSQGN · pith_short_16: FZIS5C7SSQGNSRPQ · pith_short_8: FZIS5C7S
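
The values on this page suggest how the identifiers relate: the Pith Number matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical SHA-256 hash, and each short alias is a prefix of the Pith Number. This is inferred from the values shown here, not from a published specification, so treat the sketch below as an observation rather than the official derivation. The function name `pith_number_from_hash` is made up for this example.

```python
import base64

CANONICAL_HASH = "2e512e8bf2940cd945f08b5e87e667bdd1497fc6fc56f92ec2bdd84acefbc6f6"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the digest bytes and keep the first `length` characters.

    Assumption: the 26-character length is read off this page, not a spec.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


full = pith_number_from_hash(CANONICAL_HASH)
# Short aliases appear to be plain prefixes of the full number.
aliases = {f"pith_short_{n}": full[:n] for n in (8, 12, 16)}

print(full)
for name, value in aliases.items():
    print(name, value)
```

If the relationship holds in general, a client could check that a Pith Number and a canonical hash belong together without contacting the server.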
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/FZIS5C7SSQGNSRPQRNPIPZTHXX \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 2e512e8bf2940cd945f08b5e87e667bdd1497fc6fc56f92ec2bdd84acefbc6f6
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "5522691e294366b0398d3a66d0394dc7f76e5abeb560f03b1dbbd45de75b51f2",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.CL"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.IR",
    "submitted_at": "2026-01-31T03:09:47Z",
    "title_canon_sha256": "02ad5f6e574ba78d567f088960071492a2cfbb8eb91441fff54233dd80f63c10"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2604.09552",
    "kind": "arxiv",
    "version": 1
  }
}
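
The verification command above can also be run offline against a saved copy of the record. The sketch below applies the same canonicalization as the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8) to the record as printed on this page. Whether this printed copy reproduces the expected hash has not been checked here, so the script reports a match or a mismatch instead of assuming one.

```python
import hashlib
import json

EXPECTED = "2e512e8bf2940cd945f08b5e87e667bdd1497fc6fc56f92ec2bdd84acefbc6f6"

# Record copied from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "5522691e294366b0398d3a66d0394dc7f76e5abeb560f03b1dbbd45de75b51f2",
        "cross_cats_sorted": ["cs.AI", "cs.CL"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.IR",
        "submitted_at": "2026-01-31T03:09:47Z",
        "title_canon_sha256": "02ad5f6e574ba78d567f088960071492a2cfbb8eb91441fff54233dd80f63c10",
    },
    "schema_version": "1.0",
    "source": {"id": "2604.09552", "kind": "arxiv", "version": 1},
}


def canonical_sha256(obj: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash as UTF-8."""
    canonical = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


digest = canonical_sha256(record)
print("match" if digest == EXPECTED else f"mismatch: {digest}")
```

Sorting keys and fixing the separators removes the two main sources of byte-level variation between JSON serializers, so key order in the saved copy does not affect the digest.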