Spatial-DISE: A Unified Benchmark for Evaluating Spatial Reasoning in Vision-Language Models

arXiv preprint arXiv:2510.13394

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

SaaS-Bench: Can Computer-Use Agents Leverage Real-World SaaS to Solve Professional Workflows?

cs.AI · 2026-05-15 · unverdicted · novelty 7.0

SaaS-Bench provides 106 realistic professional tasks across 23 deployable SaaS platforms to evaluate LLM-based agents, finding that even the strongest models complete fewer than 4% of tasks end-to-end.

citing papers explorer

Showing 1 of 1 citing paper.

SaaS-Bench: Can Computer-Use Agents Leverage Real-World SaaS to Solve Professional Workflows? cs.AI · 2026-05-15 · unverdicted · none · ref 76
