Title resolution pending

Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

GS-QA: A Benchmark for Geospatial Question Answering

cs.DB · 2026-05-21 · unverdicted · novelty 7.0

GS-QA is a new benchmark of 2,800 QA pairs on 28 templates using OSM and Wikipedia data to evaluate LLMs on spatial predicates, multi-source reasoning, and diverse answer types including distances and counts.

Training Computer Use Agents to Assess the Usability of Graphical User Interfaces

cs.CL · 2026-04-28 · unverdicted · novelty 7.0

uxCUA is a trained computer use agent that assesses GUI usability more accurately than larger models by learning to prioritize and execute important user interactions on labeled interface datasets.

citing papers explorer

Showing 2 of 2 citing papers.

GS-QA: A Benchmark for Geospatial Question Answering cs.DB · 2026-05-21 · unverdicted · none · ref 41
GS-QA is a new benchmark of 2,800 QA pairs on 28 templates using OSM and Wikipedia data to evaluate LLMs on spatial predicates, multi-source reasoning, and diverse answer types including distances and counts.
Training Computer Use Agents to Assess the Usability of Graphical User Interfaces cs.CL · 2026-04-28 · unverdicted · none · ref 41
uxCUA is a trained computer use agent that assesses GUI usability more accurately than larger models by learning to prioritize and execute important user interactions on labeled interface datasets.

Title resolution pending

fields

years

verdicts

representative citing papers

citing papers explorer