Pith Number

pith:NZL3RTPY

pith:2026:NZL3RTPYQXENOQKH7VSSR442QP
not attested · not anchored · not stored · refs pending

MCP-Atlas: A Large-Scale Benchmark for Tool-Use Competency with Real MCP Servers

Andrew Park, Ben Hertzberg, Ben Levin, Bing Liu, Brad Kenstler, Chaithanya Bandi, Chetan Rane, Daniel Yue Zhang, Dan Rambado, Divyansh Agarwal, Ernesto Gabriel Hernandez Montoya, Geobio Boo, HiJae Kim, Ivan Salazar, Jeff Da, Manasi Sharma, Martin Dimakis, MohammadHossein Rezaei, Rafael Cruz, Razvan-Gabriel Dumitru, Sami Hassaan, Tejas Polakam, Vipul Gupta

MCP-Atlas introduces a benchmark with 36 real MCP servers, 220 tools, and 1,000 multi-step tasks to evaluate LLM tool-use competency.

arxiv:2602.00933 v3 · 2026-01-31 · cs.SE · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{NZL3RTPYQXENOQKH7VSSR442QP}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

We introduce MCP-Atlas, a large-scale benchmark for evaluating tool-use competency, comprising 36 real MCP servers and 220 tools. Its 1,000 tasks assess tool use in realistic, multi-step workflows. In evaluations of frontier models, the top models achieve pass rates above 50%.

C2 · weakest assumption

The claims-based rubric and internal diagnostics accurately measure genuine tool-use competency rather than surface-level answer matching or prompt-specific patterns.

C3 · one-line summary

MCP-Atlas introduces a benchmark of 36 real MCP servers, 220 tools, and 1,000 natural-language tasks to measure LLM tool-use competency in multi-server workflows.

Cited by

5 papers in Pith

Receipt and verification
First computed: 2026-05-21T01:04:21.964990Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

6e57b8cdf885c8d74147fd6528f39a83ccfb206273747ece7b146614becba0c8

Aliases

arxiv: 2602.00933 · arxiv_version: 2602.00933v3 · doi: 10.48550/arxiv.2602.00933 · pith_short_12: NZL3RTPYQXEN · pith_short_16: NZL3RTPYQXENOQKH · pith_short_8: NZL3RTPY
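The page does not say how the Pith Number relates to the canonical hash, but the values shown here are consistent with a simple rule: base32-encode the raw 32-byte SHA-256 digest (RFC 4648 alphabet, A-Z and 2-7) and keep the first 26 characters, with the pith_short_8/12/16 aliases being plain prefixes of that string and the short form pith:NZL3RTPY being "pith:" plus the 8-character prefix. The sketch below encodes that inferred rule; it is an observed pattern, not Pith's documented algorithm, and the helper names are hypothetical.

```python
import base64

# Canonical hash as shown on this page (SHA-256, hex).
CANONICAL_HASH = "6e57b8cdf885c8d74147fd6528f39a83ccfb206273747ece7b146614becba0c8"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Hypothetical helper: base32-encode the raw digest and keep a prefix.

    Assumption: the rule (RFC 4648 base32 of the full SHA-256 digest,
    truncated to 26 characters) is inferred from this page, not from
    Pith documentation.
    """
    raw = bytes.fromhex(hex_digest)                  # 32 raw bytes
    encoded = base64.b32encode(raw).decode("ascii")  # A-Z, 2-7, '=' padding at the end
    return encoded[:length]                          # 26 chars = first 130 bits


def short_aliases(number: str) -> dict[str, str]:
    """Hypothetical helper: pith_short_N aliases are N-character prefixes."""
    return {f"pith_short_{n}": number[:n] for n in (8, 12, 16)}


if __name__ == "__main__":
    number = pith_number_from_hash(CANONICAL_HASH)
    print(number)
    print("pith:" + short_aliases(number)["pith_short_8"])
    print(short_aliases(number))
```

Applied to the canonical hash above, this reproduces NZL3RTPYQXENOQKH7VSSR442QP and the three short aliases listed; treat it as a consistency check rather than a specification.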
Agent API
Verify this Pith Number yourself
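# Fetch the JSON-LD record, keep .canonical_record, re-serialize it with sorted keys and compact separators, then print its SHA-256 hex digest.
curl -sH 'Accept: application/ld+json' https://pith.science/pith/NZL3RTPYQXENOQKH7VSSR442QP \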
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 6e57b8cdf885c8d74147fd6528f39a83ccfb206273747ece7b146614becba0c8
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "cdd0676ecc194513dd6db5d09f33907f3323583685677fa107dfc90e827e5ab9",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2026-01-31T23:19:39Z",
    "title_canon_sha256": "79eaa429ece9a131715aed626d552315f06f368e15ca4d251cc9efa9c7fa6686"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2602.00933",
    "kind": "arxiv",
    "version": 3
  }
}
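The verify command can also be mirrored with the Python standard library alone. The sketch below assumes the JSON shown above is the complete canonical_record and that the JSON-LD response exposes it under a canonical_record key, as the jq filter implies; the helper names are illustrative. It applies the same serialization as the one-liner (sorted keys, no whitespace, non-ASCII kept as UTF-8), so if the displayed record is complete, the printed digest should equal the expected canonical hash. The network helper is defined but not called by default.

```python
import hashlib
import json
import urllib.request

# canonical_record as displayed on this page (assumed complete).
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "cdd0676ecc194513dd6db5d09f33907f3323583685677fa107dfc90e827e5ab9",
        "cross_cats_sorted": ["cs.AI"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.SE",
        "submitted_at": "2026-01-31T23:19:39Z",
        "title_canon_sha256": "79eaa429ece9a131715aed626d552315f06f368e15ca4d251cc9efa9c7fa6686",
    },
    "schema_version": "1.0",
    "source": {"id": "2602.00933", "kind": "arxiv", "version": 3},
}


def canonical_bytes(record: dict) -> bytes:
    """Serialize like the verify one-liner: sorted keys, no whitespace, UTF-8."""
    text = json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def canonical_hash(record: dict) -> str:
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def fetch_canonical_record(url: str) -> dict:
    """Fetch the JSON-LD document and return its canonical_record field (needs network)."""
    req = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["canonical_record"]


if __name__ == "__main__":
    # Hash the embedded copy; swap in fetch_canonical_record(...) to check the live record.
    print(canonical_hash(RECORD))
```

Because key order is normalized before hashing, a mirror that stores the record with keys in any order still recomputes the same digest.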