We filter out any actions that do not have associated coordinate data, ensuring that only steps with specific visual grounding targets are included in the dataset.

AndroidControl: Similarly, we use the human-annotated actions from the training set.
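
The coordinate filter described in the excerpt can be illustrated with a minimal sketch. The record layout here (a list of dicts with `action_type`, `x`, and `y` keys) and the helper names `has_coordinates` and `filter_grounded_actions` are assumptions made for illustration only; they are not the actual AndroidControl schema or the authors' code.

```python
from typing import Any


def has_coordinates(action: dict[str, Any]) -> bool:
    """Return True when the action carries both an x and a y target.

    The "x"/"y" key names are hypothetical; a real dataset may store
    coordinates under different fields or as a bounding box.
    """
    return action.get("x") is not None and action.get("y") is not None


def filter_grounded_actions(actions: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only the steps that point at a concrete screen location."""
    return [action for action in actions if has_coordinates(action)]


# Hypothetical episode: two steps have coordinates, two do not.
episode = [
    {"action_type": "click", "x": 120, "y": 348},
    {"action_type": "input_text", "text": "hello"},
    {"action_type": "long_press", "x": 40, "y": 900},
    {"action_type": "navigate_back"},
]

grounded = filter_grounded_actions(episode)
```

In this sketch, steps such as text input or back navigation are dropped because they name no on-screen target, which matches the stated goal of keeping only steps with a visual grounding target.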

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents

cs.AI · 2024-10-07 · conditional · novelty 7.0

UGround is a universal visual grounding model for GUI agents that uses only screenshots to locate elements and outperforms existing agents despite lacking text-based inputs.
