PokeGym is a new benchmark that tests VLMs on long-horizon tasks in a complex 3D game using only visual observations, identifying deadlock recovery as the primary failure mode.
Nitrogen: An open foundation model for generalist gaming agents
10 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 10roles
background 3polarities
background 3representative citing papers
SCOPE adds per-pixel action conditioning to pretrained video diffusion models and releases the CrossFPS multi-game dataset to support cross-game FPS world model simulation with zero-shot transfer.
SPIKE dual-controller framework raises success rates 5-9 points and cuts tokens 55% in StarDojo agents by reusing strategic plans across stable segments and escalating only at detected events.
Fine-tuning VLMs to output action sequences for puzzles causes emergent internal visual representations that improve performance when integrated into reasoning.
Odysseus adapts PPO with a turn-level critic and leverages pretrained VLM action priors to train agents achieving at least 3x average game progress over frontier models in long-horizon Super Mario Land.
Generalizable agents require environment scaling via diverse executable rule-sets, distinguished from trajectory and task scaling in a new taxonomy.
GROW decomposes trajectories into state-action samples to enable GRPO for multi-turn VLM agents and reports state-of-the-art results on more than 800 Minecraft tasks.
The paper organizes research on generalist game AI into Dataset, Model, Harness, and Benchmark pillars and charts a five-level progression from single-game mastery to agents that create and live inside game multiverses.
A preview system demonstrates real-time controllable world modeling at 14-15 FPS on RTX 4090 by adapting open video backbones with action pathways for keyboard/mouse control and multimodal features.
citing papers explorer
-
PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models
PokeGym is a new benchmark that tests VLMs on long-horizon tasks in a complex 3D game using only visual observations, identifying deadlock recovery as the primary failure mode.
-
SCOPE: Simulating Cross-game Operations in Playable Environments for FPS World Models
SCOPE adds per-pixel action conditioning to pretrained video diffusion models and releases the CrossFPS multi-game dataset to support cross-game FPS world model simulation with zero-shot transfer.
-
SPIKE: An Adaptive Dual Controller Framework for Cost-Efficient Long-Horizon Game Agents
SPIKE dual-controller framework raises success rates 5-9 points and cuts tokens 55% in StarDojo agents by reusing strategic plans across stable segments and escalating only at detected events.
-
Odysseus: Scaling VLMs to 100+ Turn Decision-Making in Games via Reinforcement Learning
Odysseus adapts PPO with a turn-level critic and leverages pretrained VLM action priors to train agents achieving at least 3x average game progress over frontier models in long-horizon Super Mario Land.
-
Scalable Environments Drive Generalizable Agents
Generalizable agents require environment scaling via diverse executable rule-sets, distinguished from trajectory and task scaling in a new taxonomy.
-
GROW: Aligning GRPO with State-Action Modeling for Open-World VLM Agents
GROW decomposes trajectories into state-action samples to enable GRPO for multi-turn VLM agents and reports state-of-the-art results on more than 800 Minecraft tasks.
-
Towards Generalist Game Players: An Investigation of Foundation Models in the Game Multiverse
The paper organizes research on generalist game AI into Dataset, Model, Harness, and Benchmark pillars and charts a five-level progression from single-game mastery to agents that create and live inside game multiverses.
-
DreamForge-World 0.1 Preview: A Low-Compute Real-Time Controllable World Model
A preview system demonstrates real-time controllable world modeling at 14-15 FPS on RTX 4090 by adapting open video backbones with action pathways for keyboard/mouse control and multimodal features.