AVA: Attentive VLM Agent for Mastering StarCraft II

· 2025 · cs.AI · arXiv 2503.05383

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

open full Pith review browse 3 citing papers arXiv PDF

abstract

We introduce AVACraft, a multimodal StarCraft II benchmark supporting both Multi-Agent Reinforcement Learning (MARL) and Vision-Language Model (VLM) paradigms. Unlike SMAC-family environments that rely on abstract state representations and exclude VLMs, AVACraft provides RGB visuals, natural language observations, and structured state information, enabling systematic comparison between training-based and zero-shot methods across 21 scenarios spanning micromanagement, coordination, and strategic planning. We establish comprehensive baselines: six MARL algorithms (IQL, QMIX, QTRAN, VDN, MAPPO, IPPO) with Swin-Transformer backbones trained for 5M steps, and multiple VLMs including proprietary (GPT-4o) and open-source (Qwen3-VL) models. Results reveal complementary strengths-MARL peaks at 19.3% win rate after 5M steps, while VLMs achieve 75-90% zero-shot with human-aligned decisions-exposing trade-offs between training efficiency, performance ceilings, interpretability, and deployment cost. Code: https://github.com/camel-ai/VLM-Play-StarCraft2.

citation-role summary

background 1

citation-polarity summary

unclear 1

representative citing papers

Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks

cs.AI · 2026-04-22 · unverdicted · novelty 7.0

COSPLAY co-evolves an LLM decision agent with a skill bank agent to improve long-horizon game performance, reporting over 25.1% average reward gains versus frontier LLM baselines on single-player benchmarks.

EgoEsportsQA: An Egocentric Video Benchmark for Perception and Reasoning in Esports

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

EgoEsportsQA is a new egocentric video QA benchmark from esports matches that shows state-of-the-art Video-LLMs reach only 71.58% accuracy and struggle more with tactical reasoning than basic perception.

Text-Guided 6D Object Pose Rearrangement via Closed-Loop VLM Agents

cs.CV · 2026-04-10 · unverdicted · novelty 6.0

Closed-loop VLM agents using multi-view reasoning, object-centered visualization, and single-axis rotation prediction achieve superior text-guided 6D pose rearrangement for target objects in scenes.

citing papers explorer

Showing 3 of 3 citing papers.

Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks cs.AI · 2026-04-22 · unverdicted · none · ref 16 · internal anchor
COSPLAY co-evolves an LLM decision agent with a skill bank agent to improve long-horizon game performance, reporting over 25.1% average reward gains versus frontier LLM baselines on single-player benchmarks.
EgoEsportsQA: An Egocentric Video Benchmark for Perception and Reasoning in Esports cs.CV · 2026-04-14 · unverdicted · none · ref 34 · internal anchor
EgoEsportsQA is a new egocentric video QA benchmark from esports matches that shows state-of-the-art Video-LLMs reach only 71.58% accuracy and struggle more with tactical reasoning than basic perception.
Text-Guided 6D Object Pose Rearrangement via Closed-Loop VLM Agents cs.CV · 2026-04-10 · unverdicted · none · ref 26 · internal anchor
Closed-loop VLM agents using multi-view reasoning, object-centered visualization, and single-axis rotation prediction achieve superior text-guided 6D pose rearrangement for target objects in scenes.

AVA: Attentive VLM Agent for Mastering StarCraft II

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer