BrowseComp-ZH is a new benchmark of 289 Chinese web questions where even the strongest LLM agents reach only 42.9% accuracy.
Advances in Neural Information Processing Systems 36 (2023), 52430–52452
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2025 1verdicts
CONDITIONAL 1representative citing papers
citing papers explorer
-
BrowseComp-ZH: Benchmarking Web Browsing Ability of Large Language Models in Chinese
BrowseComp-ZH is a new benchmark of 289 Chinese web questions where even the strongest LLM agents reach only 42.9% accuracy.