OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Qiushi Sun, Kanzhi Cheng, Zichen Ding, Chuanyang Jin, Yian Wang, Fangzhi Xu, Zhenyu Wu, Chengyou Jia, Liheng Chen, Zhoumianze Liu, Ben Kao, Guohao Li, Junxian He, Yu Qiao, Zhiyong Wu · 2024 · arXiv 2412.19723

11 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

method 1

citation-polarity summary

background 1

representative citing papers

ISE: An Execution-Grounded Recipe for Multi-Turn OS-Agent Trajectories

cs.CL · 2026-06-09 · conditional · novelty 7.0

ISE creates 23,132 execution-grounded multi-turn OS agent trajectories via intent simulation and live execution, improving agent performance on ClawEval from 19.3 to 37.7 pass@1 with Qwen3-8B.

FedGUI: Benchmarking Federated GUI Agents across Heterogeneous Platforms, Devices, and Operating Systems

cs.MA · 2026-04-16 · unverdicted · novelty 7.0

FedGUI is the first comprehensive benchmark for federated GUI agents that studies cross-platform, cross-device, cross-OS, and cross-source heterogeneity, with experiments showing performance gains from cross-platform collaboration and identifying platform and OS as the most influential factors.

Learning to Adapt: Self-Improving Web Agent via Cognitive-Aware Exploration

cs.AI · 2026-05-29 · unverdicted · novelty 6.0

SCALE introduces three adversarial roles (Selector, Predictor, Judger) and a graph exploration method (SCALE-Hop) to enable MLLM-based web agents to self-discover limitations and improve, backed by the SCALE-20k dataset from 19 websites.

UI-Copilot: Advancing Long-Horizon GUI Automation via Tool-Integrated Policy Optimization

cs.LG · 2026-04-15 · unverdicted · novelty 6.0

UI-Copilot adds a selective copilot for memory and math to GUI agents and trains tool use with separate single-turn and multi-turn optimization, yielding SOTA results on MemGUI-Bench and a 17.1% gain on AndroidWorld.

RISK: A Framework for GUI Agents in E-commerce Risk Management

cs.AI · 2025-09-26 · unverdicted · novelty 6.0

RISK introduces a dataset, benchmark, and R1-style RL fine-tuning for GUI agents that achieve 6.8-8.8% offline gains and 70.5% online task success in e-commerce risk management using 7.2% of baseline parameters.

Mobile-R1: Towards Interactive Capability for VLM-Based Mobile Agent via Systematic Training

cs.AI · 2025-06-25 · unverdicted · novelty 6.0

Mobile-R1 introduces a hierarchical three-stage curriculum that combines format alignment, verifiable action feedback, and multi-turn environment training to improve exploration and self-correction in VLM-based mobile agents, plus a new Chinese GUI dataset and benchmark.

LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization

cs.LG · 2025-06-11 · unverdicted · novelty 6.0

LPO optimizes GUI agent positional accuracy by combining information entropy for zone selection with a physical-distance reward inside a Group Relative Preference Optimization framework, claiming SOTA results on benchmarks and online tests.

AliyunConsoleAgent: Training Web Agents in Real-World Cloud Environments via Distillation and Reinforcement Learning

cs.AI · 2026-06-08 · unverdicted · novelty 5.0

AliyunConsoleAgent-32B reaches 63.52% success on a 278-task cloud console benchmark, closing to within 1.82 pp of frontier models at 92% lower cost via SFT distillation and GRPO RL.

SynthAgent: Adapting Web Agents with Synthetic Supervision

cs.LG · 2025-11-08 · unverdicted · novelty 5.0

SynthAgent uses dual refinement of synthetic tasks and trajectories to produce higher-quality training data that improves web agent adaptation to target environments.

A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence

cs.AI · 2025-07-28 · accept · novelty 4.0

The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.

From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review

cs.AI · 2025-04-28 · accept · novelty 4.0

A survey consolidating benchmarks, agent frameworks, real-world applications, and protocols for LLM-based autonomous agents into a proposed taxonomy with recommendations for future research.

citing papers explorer

Showing 5 of 5 citing papers after filters.

ISE: An Execution-Grounded Recipe for Multi-Turn OS-Agent Trajectories
cs.CL · 2026-06-09 · conditional · none · ref 57

FedGUI: Benchmarking Federated GUI Agents across Heterogeneous Platforms, Devices, and Operating Systems
cs.MA · 2026-04-16 · unverdicted · none · ref 3

Learning to Adapt: Self-Improving Web Agent via Cognitive-Aware Exploration
cs.AI · 2026-05-29 · unverdicted · none · ref 22

UI-Copilot: Advancing Long-Horizon GUI Automation via Tool-Integrated Policy Optimization
cs.LG · 2026-04-15 · unverdicted · none · ref 3

AliyunConsoleAgent: Training Web Agents in Real-World Cloud Environments via Distillation and Reinforcement Learning
cs.AI · 2026-06-08 · unverdicted · none · ref 22
