Recoverable Identifier

arXiv:2605.18630 · detector doi_compliance · incontrovertible · 2026-05-20 10:42:35.791125+00:00

advisory doi_compliance recoverable_identifier

The DOI in the printed bibliography is fragmented by whitespace or line breaks. A longer candidate (10.48550/arxiv.2307.13854.URLhttps://openreview.net/forum?id=oKn9c6ytLx.15) was visible in the surrounding text, but it could not be confirmed against doi.org as printed. The candidate appears to run past the DOI itself into the adjacent URL and a trailing page number.
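As a rough illustration of how a DOI fragmented in this way might be recovered, the sketch below rejoins the whitespace-split pieces and stops at a fused `URL` label or an `http(s)://` prefix. It is not the detector's implementation: the function name `recover_doi`, the regular expression, and the stopping rule are all assumptions made for this example, and the result would still need to be checked against doi.org.

```python
import re
from typing import Optional

# Hypothetical pattern (an assumption, not the detector's rule):
# start at a "doi:" label, capture from the "10.<registrant>/" prefix,
# and stop just before a "URL" label, an http(s) link, or end of text.
_DOI_AFTER_LABEL = re.compile(
    r"(?i:doi):\s*(10\.\d{4,9}/.*?)(?=\s*URL|\s*https?://|$)",
    flags=re.DOTALL,
)


def recover_doi(text: str) -> Optional[str]:
    """Return a whitespace-free DOI candidate, or None if no label is found."""
    match = _DOI_AFTER_LABEL.search(text)
    if match is None:
        return None
    # Remove the spaces and line breaks that fragmented the identifier.
    candidate = re.sub(r"\s+", "", match.group(1))
    # Drop sentence punctuation left over from the bibliography entry.
    return candidate.rstrip(".,;")


printed = (
    "doi: 10.48550/arxiv. 2307.13854. "
    "URLhttps://openreview.net/forum?id=oKn9c6ytLx. 15"
)
print(recover_doi(printed))  # → 10.48550/arxiv.2307.13854
```

Stopping at the literal `URL` label is a deliberately narrow heuristic: it suits bibliography styles that print the label directly before the link, as in the evidence text, but it would truncate a DOI whose suffix happens to contain the uppercase string `URL`.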

Evidence text

Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Yonatan Bisk, Daniel Fried, Uri Alon, and Graham Neubig. WebArena: A realistic web environment for building autonomous agents. InInternational Conference on Learning Representations, 2024. doi: 10.48550/arxiv. 2307.13854. URLhttps://openreview.net/forum?id=oKn9c6ytLx. 15 A Limitations SCICONVBENCHis limited to four computational-science domains and to English-language, text- only prompts at undergraduate-to-early-graduate difficulty; the absolute numbers therefore should not be extrapolated to other domains, modalities, or research-level tasks. The dataset contains roughly 1,000 cases, reflecting the fact that scientific task-formulation data are sparse and substantially harder to construct than standard NLP corpora. We have not yet conducted a human clarification study. B Broader impacts SCICONVBENCHmay have positive societal effects by helping researchers and developers identify silent assumptions in scientific AI workflows before they lead to difficult-to-audit or irreproducible computational results. By measuring whether models ask clarifying questions before finalizing a task specification, the benchmark is intended to support more reliable human-AI interaction in scientific settings. The main negative impact is the possible misuse or overinterpretation of benchmark scores. High performance on SCICONVBENCHcould be taken as evidence that an assistant is ready for autonomous scientific

Evidence payload

{
  "printed_excerpt": "Shuyan Zhou, Frank F. Xu, Hao Zhu, Xuhui Zhou, Robert Lo, Abishek Sridhar, Xianyi Cheng, Yonatan Bisk, Daniel Fried, Uri Alon, and Graham Neubig. WebArena: A realistic web environment for building autonomous agents. InInternational Conferen",
  "reconstructed_doi": "10.48550/arxiv.2307.13854.URLhttps://openreview.net/forum?id=oKn9c6ytLx.15",
  "ref_index": 84,
  "resolved_title": null,
  "verdict_class": "incontrovertible"
}