DroidBot-GPT: GPT-powered UI Automation for Android. CoRR, abs/2304.07061
Canonical reference. 83% of citing Pith papers cite this work as background.
representative citing papers
- From Exploration to Specification: LLM-Based Property Generation for Mobile App Testing
  PropGen automates property generation for Android app testing via LLM synthesis from guided exploration and feedback refinement, yielding 912 valid properties and 25 previously unknown bugs across 12 apps.
- Proactive Detection of GUI Defects in Multi-Window Scenarios via Multimodal Reasoning
  Proactive multi-window state triggering plus Set-of-Mark alignment and multimodal LLM reasoning detects GUI defects in Android apps, reporting 184% more text truncation, 87.2% F1 on occlusion, and 40 defect-prone apps at 10% FPR.
- Do LLMs Need to See Everything? A Benchmark and Study of Failures in LLM-driven Smartphone Automation using Screentext vs. Screenshots
  A new benchmark shows that LLM smartphone agents perform comparably with screen text alone and with screenshots, but that both often fail because of UI accessibility and reasoning gaps.
- SkillDroid: Compile Once, Reuse Forever
  SkillDroid compiles LLM-guided GUI trajectories into parameterized skill templates and replays them via a matching cascade, reaching an 85.3% success rate with 49% fewer LLM calls and improving from 87% to 91% over 150 rounds while the stateless baseline drops to 44%.
- DynamicsLLM: a Dynamic Analysis-based Tool for Generating Intelligent Execution Traces Using LLMs to Detect Android Behavioural Code Smells
  DynamicsLLM uses LLMs to generate execution traces that cover three times more code smell-related events than the prior Dynamics tool on 333 F-Droid Android apps, with a hybrid method adding 25.9% coverage for low-activity apps.
- LDMDroid: Leveraging LLMs for Detecting Data Manipulation Errors in Android Apps
  LDMDroid applies LLMs in a state-aware process to trigger data manipulation functions and uses visual cues to detect errors, finding 17 bugs across 24 Android apps, 14 of them confirmed by developers.
- A survey on factors influencing mobile application usability through the lens of PACMAD+3 model
  A survey of 838 users finds that efficiency is rated as highly important for mobile app usability, while seven other PACMAD+3 factors are rated moderately important.
- A Comprehensive Survey of Agents for Computer Use: Foundations, Challenges, and Future Directions
  A survey of 87 agents for computer use and 33 datasets that introduces a three-dimensional taxonomy across domain, interaction, and agent perspectives and identifies six research gaps.
- Large Language Model-Based Agents for Software Engineering: A Survey
  A literature survey that collects and categorizes 124 papers on LLM-based agents for software engineering from SE and agent perspectives.
- Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security
  This survey discusses key components and challenges for Personal LLM Agents and reviews solutions for their capability, efficiency, and security.