To enhance diversity, two captions per element are randomly selected from the available set of functional captions during data construction.

Widget Caption: For each element in the training set, multiple functional captions are provided.
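
The sampling step described in the quoted passages can be illustrated with a minimal sketch. It is not the citing paper's code: the function name `sample_captions`, the parameter `k`, and the fallback for elements with fewer than two captions are assumptions made for illustration only.

```python
import random

def sample_captions(captions, k=2, rng=None):
    """Pick k distinct captions at random from one element's caption list.

    Hypothetical helper: the quoted text only says two captions are chosen
    at random per element. Returning every caption when fewer than k exist
    is an assumption, not something the source states.
    """
    rng = rng or random.Random()
    if len(captions) <= k:
        return list(captions)
    # random.sample draws without replacement, so the picks are distinct positions
    return rng.sample(captions, k)

# Illustrative data only; these captions are made up for the example.
element_captions = ["open settings", "go to settings menu", "configure the app"]
print(sample_captions(element_captions, rng=random.Random(0)))
```

Passing a seeded `random.Random` keeps the selection reproducible across data-construction runs, which is a design choice of this sketch rather than a detail from the source.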

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents

cs.AI · 2024-10-07 · conditional · novelty 7.0

UGround is a universal visual grounding model for GUI agents that uses only screenshots to locate elements and outperforms existing agents despite lacking text-based inputs.

citing papers explorer


Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents cs.AI · 2024-10-07 · conditional · none · ref 3
