Benchmarking Living-Screen-Native GUI Agents on Short-Video Platforms

Daiqing Wu; Haoyu Wen; Heyan Huang; Huaxi Ai; Jiashu Yao; Wangke Chen; Yuhang Guo; Zeming Liu

arxiv: 2606.04701 · v1 · pith:J7U4NC36new · submitted 2026-06-03 · 💻 cs.CV · cs.CL


Jiashu Yao, Heyan Huang, Daiqing Wu, Wangke Chen, Huaxi Ai, Haoyu Wen, Zeming Liu, Yuhang Guo


keywords: agents, short-video, living-screen-native, livingscreen, platforms, task, accuracy, actions


GUI agents today assume a static screen, where the world is frozen between two actions. However, real interfaces such as short-video applications violate this assumption: their content keeps playing, and a competent user must decide what to watch and for how long. We formalize this setting as Living-Screen-Native GUI agents and introduce LivingScreen, the first benchmark instantiating it on short-video platforms, with a faithful browser-based environment, a three-tier task suite, and metrics that jointly score accuracy and information efficiency. Evaluating a wide range of frontier models, we find that none reaches human cost-accuracy performance and that their dominant failure mode is over- and under-observation, which points to observation control as a missing capability axis for future GUI agents. All data and code will be available at https://github.com/BITHLP/LivingScreen.
