GUI Agents for Continual Game Generation

2026 · cs.SE · arXiv 2605.28258

4 Pith papers cite this work.

abstract

Generating a game is not the same as making one that can be played. Despite advances in code generation, existing approaches treat game generation as one-shot translation from prompt to artifact, leaving interaction-level failures undetected. We argue that evaluating and improving game generation requires a player, and study two roles for graphical user interface (GUI) agents in this process: (1) as an objective evaluator, for which we introduce PlaytestArena, a new evaluation environment that pairs 200 browser-based game generation tasks across eight genres with rubrics of expected in-play behaviors, adjudicated by a GUI agent that loads each build in a browser and plays it; and (2) as a subjective playtester, for which we propose Play2Code, where a game agent and a GUI agent operate in a sustained loop with shared memory, turning game generation into a dialogue between coding and playing. Our experiments show that even frontier models struggle to generate playable games directly, while Play2Code achieves a 66.8% rubric pass-rate, improving over single-pass and agentic-coding baselines by 37.1 and 14.6 points respectively. Further analysis shows that GUI playtester feedback is more traceable than a human report, yet idiosyncratic in ways reminiscent of human testers, establishing game playtesting as a critical testbed for interactive code generation. Our project website is available at https://continual-game-generation.vercel.app/.
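The abstract describes Play2Code as a sustained loop in which a coding agent and a GUI playtester share memory, and it scores builds by a rubric pass-rate. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: `SharedMemory`, `play2code_loop`, and `rubric_pass_rate` are hypothetical names, both agents are plain callables rather than LLM or browser-driving agents, an empty playtest report is assumed to mean "no issues found", and the pass-rate is assumed to be pooled over all rubric items across tasks.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class SharedMemory:
    """Hypothetical shared memory: the playtest reports both agents can read."""
    reports: List[str] = field(default_factory=list)


def play2code_loop(
    write_game: Callable[[str, SharedMemory], str],
    playtest: Callable[[str, SharedMemory], str],
    prompt: str,
    max_rounds: int = 5,
) -> Tuple[str, SharedMemory]:
    """Alternate coding and playing turns until the playtester reports no issues."""
    memory = SharedMemory()
    build = ""
    for _ in range(max_rounds):
        # Coding turn: (re)write the game, conditioned on all earlier reports.
        build = write_game(prompt, memory)
        # Playing turn: the GUI agent plays the build and returns free-form feedback.
        report = playtest(build, memory)
        if not report:  # assumption: an empty report means nothing left to fix
            break
        memory.reports.append(report)
    return build, memory


def rubric_pass_rate(verdicts: List[Dict[str, bool]]) -> float:
    """Percentage of rubric items passed, pooled over all tasks (pooling is an assumption)."""
    items = [passed for task in verdicts for passed in task.values()]
    return 100.0 * sum(items) / len(items) if items else 0.0


if __name__ == "__main__":
    # Toy stand-ins for the two agents: the "bug" disappears on the third build.
    def toy_writer(prompt: str, memory: SharedMemory) -> str:
        return f"{prompt}-v{len(memory.reports)}"

    def toy_playtester(build: str, memory: SharedMemory) -> str:
        return "" if build.endswith("-v2") else f"bug in {build}"

    build, memory = play2code_loop(toy_writer, toy_playtester, "snake")
    print(build)           # snake-v2
    print(memory.reports)  # ['bug in snake-v0', 'bug in snake-v1']
    print(rubric_pass_rate([{"jump": True, "score": False}, {"move": True, "win": True}]))  # 75.0
```

Keeping the free-form playtest report (used inside the loop) separate from the rubric verdicts (used only for scoring) mirrors the abstract's split between the subjective playtester and the objective PlaytestArena evaluator; in the paper both roles are filled by GUI agents that play the build in a browser, which this sketch replaces with plain callables. Reading the reported gains as percentage points, the single-pass and agentic-coding baselines would sit at about 29.7% and 52.2% (66.8 - 37.1 and 66.8 - 14.6).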

representative citing papers

DAIN: Dynamic Agent-Based Interaction Network for Efficient and Collaborative Multimodal Reasoning

cs.CL · 2026-06-29 · unverdicted · novelty 6.0

DAIN reframes multimodal fusion as dynamic agent collaboration with sparse activation, claiming state-of-the-art results across five benchmarks, including a 2.6% accuracy gain on ADNI.

A3M: Adaptive, Adversarial and Multi-Objective Learning for Strategic Bidding in Repeated Auctions

cs.CL · 2026-06-27 · unverdicted · novelty 5.0

A3M integrates adaptive DRL, adversarial opponent modeling, and multi-objective rewards, cutting regret by 30-40% relative to baselines while remaining robust to strategy shifts in repeated auctions.

EVLA: An Electro-Aware Multimodal Assistant for Physically-Grounded Driving Reasoning and Control

cs.CL · 2026-06-27 · unverdicted · novelty 4.0

EVLA combines a Unified Co-State Encoder and Electro-aware Structured Reasoning Chain with physics-guided training to produce energy-optimal driving decisions, reporting +5.6% accuracy gains over fine-tuned VLM baselines on a driving QA benchmark.

FinInvest-GTCN: Explainable Graph-Temporal-Causal Modeling for Risk-Aware Investment Decision Optimization

cs.CL · 2026-06-27 · unverdicted · novelty 4.0

FinInvest-GTCN combines graph, temporal, and causal networks with meta-causal adaptation to improve risk-adjusted predictions for VC investments, achieving an RA-MSE of 2.51 and 18.7% higher simulated returns on proprietary data.
