pith. machine review for the scientific record.

arxiv: 2512.00756 · v2 · submitted 2025-11-30 · 💻 cs.AI


MPR-GUI: Benchmarking and Enhancing Multilingual Perception and Reasoning in GUI Agents

classification 💻 cs.AI
keywords: benchmarks, cross-lingual, agents, capabilities, english, gaps, multilingual, non-english

Large Vision-Language Models (LVLMs) have shown strong potential as multilingual Graphical User Interface (GUI) agents, as evidenced by existing GUI benchmarks. However, these benchmarks exhibit two primary limitations: (1) although Perception and Reasoning (P&R) capabilities are fundamental for GUI agents, current benchmarks lack fine-grained diagnostics to identify which specific capabilities lead to task failures, hindering targeted improvements; (2) existing benchmarks fail to provide a strictly aligned cross-lingual evaluation environment, introducing confounding factors that prevent isolating the language impact on GUI agent performance. To address these issues, we propose the Multilingual P&R GUI Benchmark (MPR-GUI-Bench), featuring strictly aligned environments across six languages and eight fine-grained P&R tasks. Our benchmark reveals consistent P&R gaps between English and non-English settings, particularly on reasoning-intensive tasks. To leverage the superior English P&R capabilities for bridging cross-lingual gaps, we identify layers sensitive to language and propose GUI-XLI, a GUI Cross-Lingual Intervention method that aligns non-English hidden states with their English counterparts at these layers during inference. Experiments show that GUI-XLI effectively reduces the cross-lingual gaps, with an average gain of 6.5% in non-English settings.
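The inference-time intervention described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function names, the linear-interpolation form, and the `alpha` strength are assumptions, and the abstract does not specify how the English counterpart states are obtained at inference time.

```python
import numpy as np

def xli_align(h_non_en, h_en, alpha=0.5):
    """Pull a non-English hidden state toward its English counterpart
    (hypothetical interpolation form; strength alpha is an assumption)."""
    return h_non_en + alpha * (h_en - h_non_en)

def forward_with_intervention(hidden_states, en_states, sensitive_layers, alpha=0.5):
    """Apply the alignment only at the language-sensitive layers;
    all other layers pass through unchanged."""
    return [
        xli_align(h, h_en, alpha) if i in sensitive_layers else h
        for i, (h, h_en) in enumerate(zip(hidden_states, en_states))
    ]

# Toy example: 4 layers, hidden size 3, intervening at layers 1 and 2.
rng = np.random.default_rng(0)
h_ne = [rng.normal(size=3) for _ in range(4)]
h_en = [rng.normal(size=3) for _ in range(4)]
aligned = forward_with_intervention(h_ne, h_en, sensitive_layers={1, 2})
```

In practice such an intervention would typically be wired in with forward hooks on the chosen transformer layers; the toy lists here stand in for per-layer hidden states only to show the selective-alignment logic.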

This paper has not been read by Pith yet.

discussion (0)
