Pith Number
pith:PQRUAG7C
pith:2025:PQRUAG7C2OCFREFZNGQJWSKI5I
not attested
not anchored
not stored
refs resolved
BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese
A new benchmark shows most LLMs score below 20% when browsing the Chinese web for verifiable facts.
arxiv:2504.19314 v2 · 2025-04-27 · cs.CL
Record completeness
1. Bitcoin timestamp
2. Internet Archive
3. Author claim
4. Citations
5. Replications
Claims
C1: strongest claim
Despite their strong conversational and retrieval capabilities, most models struggle severely: a large number achieve accuracy rates below 10%, and only a handful exceed 20%. Even the best-performing system, OpenAI's DeepResearch, reaches just 42.9%.
C2: weakest assumption
The two-stage quality control protocol produces questions that are genuinely high-difficulty and have unique verifiable answers without hidden shortcuts or English leakage.
C3: one-line summary
BrowseComp-ZH is a new benchmark of 289 Chinese web questions where even the strongest LLM agents reach only 42.9% accuracy.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:12.902539Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
7c23401be2d3845890b969a09b4948ea380c39eb900fe2bc41f90797b1ade5f8
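The identifier at the top of this record appears to be related to the canonical hash: Base32-encoding the first 16 bytes of the digest (RFC 4648 alphabet, padding dropped) reproduces the 26-character ID, and its first 8 characters give the short form. This is an observation from the values shown on this page, not a documented rule of the pith-number/v1.0 schema. The function name `short_id_from_hash` and the 16-byte prefix length are assumptions made for illustration.

```python
import base64

# Canonical hash as shown on this page.
CANONICAL_HASH = "7c23401be2d3845890b969a09b4948ea380c39eb900fe2bc41f90797b1ade5f8"


def short_id_from_hash(hex_digest: str, n_bytes: int = 16) -> str:
    """Base32-encode the first n_bytes of a hex digest and strip '=' padding.

    Assumption: the 16-byte prefix is inferred from this one record only.
    """
    raw = bytes.fromhex(hex_digest)[:n_bytes]
    return base64.b32encode(raw).decode("ascii").rstrip("=")


if __name__ == "__main__":
    full_id = short_id_from_hash(CANONICAL_HASH)
    print(full_id)      # PQRUAG7C2OCFREFZNGQJWSKI5I
    print(full_id[:8])  # PQRUAG7C
```

If the relationship holds in general, a mismatch between a recomputed hash and the ID in the URL would be a quick sign that the record has changed; treat that as a hypothesis until the schema confirms it.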
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/PQRUAG7C2OCFREFZNGQJWSKI5I \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 7c23401be2d3845890b969a09b4948ea380c39eb900fe2bc41f90797b1ade5f8
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "78aa5bd77c508b2215bb6b4bdc00d4604a190ccd376fce9ce842115b491fc286",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CL",
    "submitted_at": "2025-04-27T17:32:43Z",
    "title_canon_sha256": "95280f255ef9a9626cd5d169bfca73a75d904d8baf0ab8b2a6939d97de709b07"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2504.19314",
    "kind": "arxiv",
    "version": 2
  }
}
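The verification one-liner can also be run offline against a copy of the record. The sketch below applies the same canonicalization as the shell command (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) and hashes the result with SHA-256. The function name `canonical_sha256` is an assumption for illustration, and the record is transcribed by hand from this page, so the comparison only succeeds if that transcription matches the served record exactly.

```python
import hashlib
import json

# Record transcribed from this page; any transcription slip changes the digest.
RECORD = {
    "metadata": {
        "abstract_canon_sha256": "78aa5bd77c508b2215bb6b4bdc00d4604a190ccd376fce9ce842115b491fc286",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CL",
        "submitted_at": "2025-04-27T17:32:43Z",
        "title_canon_sha256": "95280f255ef9a9626cd5d169bfca73a75d904d8baf0ab8b2a6939d97de709b07",
    },
    "schema_version": "1.0",
    "source": {"id": "2504.19314", "kind": "arxiv", "version": 2},
}

# Hash published on this page.
EXPECTED = "7c23401be2d3845890b969a09b4948ea380c39eb900fe2bc41f90797b1ade5f8"


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 the UTF-8 bytes."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    digest = canonical_sha256(RECORD)
    print(digest)
    print("matches published hash:", digest == EXPECTED)
```

Sorting keys and fixing the separators makes the digest independent of key order and whitespace in the served JSON, which is why the indented form shown here and the compact form returned by `jq -c` hash to the same value.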