AVA: Attentive VLM Agent for Mastering StarCraft II

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

We introduce AVACraft, a multimodal StarCraft II benchmark supporting both Multi-Agent Reinforcement Learning (MARL) and Vision-Language Model (VLM) paradigms. Unlike SMAC-family environments that rely on abstract state representations and exclude VLMs, AVACraft provides RGB visuals, natural language observations, and structured state information, enabling systematic comparison between training-based and zero-shot methods across 21 scenarios spanning micromanagement, coordination, and strategic planning. We establish comprehensive baselines: six MARL algorithms (IQL, QMIX, QTRAN, VDN, MAPPO, IPPO) with Swin-Transformer backbones trained for 5M steps, and multiple VLMs including proprietary (GPT-4o) and open-source (Qwen3-VL) models. Results reveal complementary strengths: MARL peaks at a 19.3% win rate after 5M steps, while VLMs achieve 75-90% zero-shot with human-aligned decisions. This exposes trade-offs between training efficiency, performance ceilings, interpretability, and deployment cost. Code: https://github.com/camel-ai/VLM-Play-StarCraft2.

citation-role summary

background 1

citation-polarity summary

fields

cs.CV 2 cs.AI 1

years

2026 3

verdicts

UNVERDICTED 3

roles

background 1

polarities

unclear 1