LearnAct: Few-shot mobile GUI agent with a unified demonstration benchmark, 2025

LearnAct: Few-shot mobile GUI agent with a unified demonstration benchmark · 2025 · arXiv 2504.13805

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 3

citation-polarity summary

background 2 support 1

representative citing papers

Bridging VideoQA and Video-Guided Agentic Tasks via Generalized Keyframe Extraction

cs.CV · 2026-06-28 · unverdicted · novelty 7.0

Introduces VG-GUIBench benchmark and TASKER keyframe extraction algorithm that improves performance on VideoQA and video-guided agentic tasks.

OS-SPEAR: A Toolkit for the Safety, Performance, Efficiency, and Robustness Analysis of OS Agents

cs.CL · 2026-04-27 · unverdicted · novelty 7.0

OS-SPEAR is a new evaluation toolkit that tests 22 OS agents and identifies trade-offs between efficiency and safety or robustness.

RiskWebWorld: A Realistic Interactive Benchmark for GUI Agents in E-commerce Risk Management

cs.AI · 2026-04-15 · unverdicted · novelty 7.0

RiskWebWorld is the first realistic interactive benchmark for GUI agents in e-commerce risk management, revealing a large gap between generalist and specialized models plus RL gains.

MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents

cs.AI · 2025-09-08 · conditional · novelty 7.0

MAS-Bench introduces 139 tasks, 88 predefined shortcuts, and 9 metrics to evaluate hybrid GUI-shortcut mobile agents, reporting up to 68.3% success and 39% efficiency gains over GUI-only baselines.

MetaPS: Adaptive Programmatic Strategy Selection for Market Agents

cs.AI · 2026-06-21 · unverdicted · novelty 6.0

MetaPS trains models via simulation rollouts to select from programmatic strategy libraries for market agents, yielding better performance than fixed or direct LLM baselines across model sizes.

Skill-SD: Skill-Conditioned Self-Distillation for Multi-turn LLM Agents

cs.LG · 2026-04-12 · unverdicted · novelty 6.0

Skill-SD turns an agent's completed trajectories into dynamic natural-language skills that condition only the teacher in self-distillation, yielding 14-42% gains over RL and OPSD baselines on multi-turn agent benchmarks.

How Should Agents Read Demonstrations? Hierarchical Structure Beats Flat Action Logs

cs.AI · 2026-06-18 · unverdicted · novelty 5.0

Hierarchically grouped demonstrations raise pass rates from 76.7% to 90.7% on 43 vague-description tasks while flat logs show smaller non-significant gains.

SE-GA: Memory-Augmented Self-Evolution for GUI Agents

cs.LG · 2026-05-16 · unverdicted · novelty 5.0

SE-GA combines Test-Time Memory Extension for dynamic context retrieval with Memory-Augmented Self-Evolution training to reach 89.0% on ScreenSpot and 75.8% on AndroidControl-High.

Automating SKILL.md Generation for Computer-Using Agents via Interaction Trajectory Mining

cs.AI · 2026-06-18 · unverdicted · novelty 4.0

Trajectory mining produces readable skill clusters with high purity but GRPO training on them improves skill-step accuracy only from 18.5% to 20.5% and underperforms frequency priors.

citing papers explorer

Showing 4 of 4 citing papers after filters.

RiskWebWorld: A Realistic Interactive Benchmark for GUI Agents in E-commerce Risk Management

cs.AI · 2026-04-15 · unverdicted · none · ref 28

RiskWebWorld is the first realistic interactive benchmark for GUI agents in e-commerce risk management, revealing a large gap between generalist and specialized models plus RL gains.

MetaPS: Adaptive Programmatic Strategy Selection for Market Agents

cs.AI · 2026-06-21 · unverdicted · none · ref 134

MetaPS trains models via simulation rollouts to select from programmatic strategy libraries for market agents, yielding better performance than fixed or direct LLM baselines across model sizes.

How Should Agents Read Demonstrations? Hierarchical Structure Beats Flat Action Logs

cs.AI · 2026-06-18 · unverdicted · none · ref 8

Hierarchically grouped demonstrations raise pass rates from 76.7% to 90.7% on 43 vague-description tasks while flat logs show smaller non-significant gains.

Automating SKILL.md Generation for Computer-Using Agents via Interaction Trajectory Mining

cs.AI · 2026-06-18 · unverdicted · none · ref 20

Trajectory mining produces readable skill clusters with high purity but GRPO training on them improves skill-step accuracy only from 18.5% to 20.5% and underperforms frequency priors.
