Learnact: Few-shot mobile gui agent with a unified demonstration benchmark, 2025 a

Learnact: Few-shot mobile gui agent with a unified demonstration benchmark , author= · 2025 · arXiv 2504.13805

9 Pith papers cite this work. Polarity classification is still indexing.

9 Pith papers citing it

read on arXiv browse 9 citing papers

citation-role summary

background 3

citation-polarity summary

background 2 support 1

representative citing papers

Bridging VideoQA and Video-Guided Agentic Tasks via Generalized Keyframe Extraction

cs.CV · 2026-06-28 · unverdicted · novelty 7.0

Introduces VG-GUIBench benchmark and TASKER keyframe extraction algorithm that improves performance on VideoQA and video-guided agentic tasks.

OS-SPEAR: A Toolkit for the Safety, Performance,Efficiency, and Robustness Analysis of OS Agents

cs.CL · 2026-04-27 · unverdicted · novelty 7.0

OS-SPEAR is a new evaluation toolkit that tests 22 OS agents and identifies trade-offs between efficiency and safety or robustness.

RiskWebWorld: A Realistic Interactive Benchmark for GUI Agents in E-commerce Risk Management

cs.AI · 2026-04-15 · unverdicted · novelty 7.0

RiskWebWorld is the first realistic interactive benchmark for GUI agents in e-commerce risk management, revealing a large gap between generalist and specialized models plus RL gains.

MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents

cs.AI · 2025-09-08 · conditional · novelty 7.0

MAS-Bench introduces 139 tasks, 88 predefined shortcuts, and 9 metrics to evaluate hybrid GUI-shortcut mobile agents, reporting up to 68.3% success and 39% efficiency gains over GUI-only baselines.

MetaPS: Adaptive Programmatic Strategy Selection for Market Agents

cs.AI · 2026-06-21 · unverdicted · novelty 6.0

MetaPS trains models via simulation rollouts to select from programmatic strategy libraries for market agents, yielding better performance than fixed or direct LLM baselines across model sizes.

Skill-SD: Skill-Conditioned Self-Distillation for Multi-turn LLM Agents

cs.LG · 2026-04-12 · unverdicted · novelty 6.0

Skill-SD turns an agent's completed trajectories into dynamic natural-language skills that condition only the teacher in self-distillation, yielding 14-42% gains over RL and OPSD baselines on multi-turn agent benchmarks.

How Should Agents Read Demonstrations? Hierarchical Structure Beats Flat Action Logs

cs.AI · 2026-06-18 · unverdicted · novelty 5.0

Hierarchically grouped demonstrations raise pass rates from 76.7% to 90.7% on 43 vague-description tasks while flat logs show smaller non-significant gains.

SE-GA: Memory-Augmented Self-Evolution for GUI Agents

cs.LG · 2026-05-16 · unverdicted · novelty 5.0

SE-GA combines Test-Time Memory Extension for dynamic context retrieval with Memory-Augmented Self-Evolution training to reach 89.0% on ScreenSpot and 75.8% on AndroidControl-High.

Automating SKILL.md Generation for Computer-Using Agents via Interaction Trajectory Mining

cs.AI · 2026-06-18 · unverdicted · novelty 4.0

Trajectory mining produces readable skill clusters with high purity but GRPO training on them improves skill-step accuracy only from 18.5% to 20.5% and underperforms frequency priors.

citing papers explorer

Showing 9 of 9 citing papers.

Bridging VideoQA and Video-Guided Agentic Tasks via Generalized Keyframe Extraction cs.CV · 2026-06-28 · unverdicted · none · ref 32
Introduces VG-GUIBench benchmark and TASKER keyframe extraction algorithm that improves performance on VideoQA and video-guided agentic tasks.
OS-SPEAR: A Toolkit for the Safety, Performance,Efficiency, and Robustness Analysis of OS Agents cs.CL · 2026-04-27 · unverdicted · none · ref 70
OS-SPEAR is a new evaluation toolkit that tests 22 OS agents and identifies trade-offs between efficiency and safety or robustness.
RiskWebWorld: A Realistic Interactive Benchmark for GUI Agents in E-commerce Risk Management cs.AI · 2026-04-15 · unverdicted · none · ref 28
RiskWebWorld is the first realistic interactive benchmark for GUI agents in e-commerce risk management, revealing a large gap between generalist and specialized models plus RL gains.
MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents cs.AI · 2025-09-08 · conditional · none · ref 22
MAS-Bench introduces 139 tasks, 88 predefined shortcuts, and 9 metrics to evaluate hybrid GUI-shortcut mobile agents, reporting up to 68.3% success and 39% efficiency gains over GUI-only baselines.
MetaPS: Adaptive Programmatic Strategy Selection for Market Agents cs.AI · 2026-06-21 · unverdicted · none · ref 134
MetaPS trains models via simulation rollouts to select from programmatic strategy libraries for market agents, yielding better performance than fixed or direct LLM baselines across model sizes.
Skill-SD: Skill-Conditioned Self-Distillation for Multi-turn LLM Agents cs.LG · 2026-04-12 · unverdicted · none · ref 15
Skill-SD turns an agent's completed trajectories into dynamic natural-language skills that condition only the teacher in self-distillation, yielding 14-42% gains over RL and OPSD baselines on multi-turn agent benchmarks.
How Should Agents Read Demonstrations? Hierarchical Structure Beats Flat Action Logs cs.AI · 2026-06-18 · unverdicted · none · ref 8
Hierarchically grouped demonstrations raise pass rates from 76.7% to 90.7% on 43 vague-description tasks while flat logs show smaller non-significant gains.
SE-GA: Memory-Augmented Self-Evolution for GUI Agents cs.LG · 2026-05-16 · unverdicted · none · ref 21
SE-GA combines Test-Time Memory Extension for dynamic context retrieval with Memory-Augmented Self-Evolution training to reach 89.0% on ScreenSpot and 75.8% on AndroidControl-High.
Automating SKILL.md Generation for Computer-Using Agents via Interaction Trajectory Mining cs.AI · 2026-06-18 · unverdicted · none · ref 20
Trajectory mining produces readable skill clusters with high purity but GRPO training on them improves skill-step accuracy only from 18.5% to 20.5% and underperforms frequency priors.

Learnact: Few-shot mobile gui agent with a unified demonstration benchmark, 2025 a

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer