pith. sign in

hub Canonical reference

Gpt-4v in wonderland: Large multi- modal models for zero-shot smartphone gui navigation

Canonical reference. 83% of citing Pith papers cite this work as background.

15 Pith papers citing it
Background 83% of classified citations

hub tools

citation-role summary

background 5 method 1

citation-polarity summary

representative citing papers

SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents

cs.HC · 2024-01-17 · unverdicted · novelty 6.0

SeeClick improves visual GUI agents via GUI grounding pre-training on automatically curated data and introduces the ScreenSpot benchmark, with results indicating that stronger grounding boosts downstream task performance.

DRS-GUI: Dynamic Region Search for Training-Free GUI Grounding

cs.AI · 2026-05-15 · unverdicted · novelty 5.0

DRS-GUI introduces a dynamic region search method with Focus/Shift/Scatter actions and MCTS-based planning that improves GUI grounding accuracy by 14% on ScreenSpot-Pro for both general and GUI-specific MLLMs without any training.

AppAgent: Multimodal Agents as Smartphone Users

cs.CV · 2023-12-21 · unverdicted · novelty 5.0

AppAgent lets large language models operate diverse smartphone apps via visual interactions and learns app usage from exploration or demonstrations.

citing papers explorer

Showing 15 of 15 citing papers.