pith. machine review for the scientific record. sign in

arxiv: 2506.05079 · v4 · submitted 2025-06-05 · 💻 cs.SE

Recognition: unknown

Scenario-Guided LLM-based Mobile App GUI Testing

Authors on Pith no claims yet
classification 💻 cs.SE
keywords testingapproachesscenariosbusinesscompletionexecutioninformationllm-based
0
0 comments X
read the original abstract

The assurance of mobile app GUI has become increasingly important, as the GUI serves as the primary medium of interaction between users and apps. Although numerous automated GUI testing approaches have been developed with diverse strategies, a substantial gap remains between these approaches and the underlying app business logic. Most existing approaches focus on general exploration rather than the completion of specific testing scenarios, often resulting in missed coverage of critical functionalities. Inspired by the manual testing process, which treats business logic, driven testing scenarios as the fundamental unit of testing, this paper introduces an approach that leverages large language models (LLMs) to comprehend the semantics expressed in app GUIs and their contextual relevance to given testing scenarios. Building upon this capability, we propose ScenGen, a novel scenario-guided LLM-based GUI testing framework that employs a multi-agent collaboration mechanism to simulate and automate the phases of manual testing. ScenGen integrates five agents. The Observer perceives the app GUI state by extracting and structuring GUI widgets and layouts, thereby interpreting the semantic information presented in the GUI. This information is then passed to the Decider, which makes scenario-driven decisions with the guidance of LLMs to identify target widgets and determine appropriate actions toward fulfilling specific testing goals. The Executor executes the decided operations on the app, while the Supervisor verifies whether the execution results align with the intended testing scenario completion, ensuring traceability and consistency in test generation and execution. Finally, the Recorder records the corresponding GUI operations into the context memory as a knowledge base for subsequent decision-making and concurrently monitors runtime bug occurrences.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Towards Automated Crowdsourced Testing via Personified-LLM

    cs.SE 2026-03 unverdicted novelty 6.0

    PersonaTester uses LLMs guided by three-dimensional personas to replicate crowdworker testing patterns, yielding higher behavioral consistency, variability, and more bug detections than baseline LLM agents.

  2. WebMAC: A Multi-Agent Collaborative Framework for Scenario Testing of Web Systems

    cs.SE 2026-04 unverdicted novelty 5.0

    WebMAC uses three specialized multi-agent modules to clarify test scenarios, partition them for adequacy, and generate executable scripts, yielding 30-60% higher success rates and 29% better efficiency than SOTA on fo...