Reflection of Episodes: Learning to Play Game from Expert and Self Experiences

Xiaojie Xu, Zongyuan Li, Chang Lu, Runnan Qi, Yanan Ni, Lumin Jiang, Xiangbei Liu, Xuebo Zhang, Yongchun Fang, Kuihua Huang, Xian Guo, Zhanghua Wu, Zhenya Li

arXiv: 2502.13388 · v4 · submitted 2025-02-19 · 💻 cs.AI

Pith reviewed 2026-05-23 03:07 UTC · model grok-4.3

classification 💻 cs.AI

keywords Reflection of Episodes · StarCraft II · Large Language Models · Self-experience · Keyframe selection · Game AI · Reinforcement learning

The pith

A reflection framework lets LLMs improve at StarCraft II by turning completed games into new self-experience using expert examples and keyframe summaries.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a Reflection of Episodes framework that lets large language models play complex real-time strategy games by selecting key frames from matches, deciding moves from both expert demonstrations and its own prior plays, and then reflecting after each completed game to create fresh self-experience for the next match. This process repeats so the model gradually builds better decision rules without requiring new external data each time. A sympathetic reader would care because the approach shows a route for language models to adapt in long-horizon, partially observable environments where full game histories are too long to process directly. The experiments report that the resulting agent defeats the built-in very hard opponent in the TextStarCraft II setting.

Core claim

The Reflection of Episodes framework first extracts key game information through a keyframe selection method, then makes decisions by consulting both expert experience and accumulated self-experience; after each game ends it reflects on the prior experience to generate new self-experience, and this loop enables the LLM to defeat the very hard difficulty built-in robot in TextStarCraft II.

What carries the argument

The Reflection of Episodes (ROE) framework, which uses keyframe selection to summarize game states and post-game reflection to convert expert and self-experience into updated self-experience for future decisions.
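The loop described above can be sketched in a few lines. This is only an illustration of the control flow under stated assumptions, not the authors' implementation: `run_roe`, `Experience`, and the three callables (`play_episode`, `select_keyframes`, `reflect`) are hypothetical names, and the sketch assumes that only the most recent self-reflection is carried into the next game.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Experience:
    """Expert text plus the self-reflections accumulated so far (hypothetical container)."""
    expert: str
    self_reflections: List[str] = field(default_factory=list)

    def as_prompt(self) -> str:
        # Assumption: only the latest self-reflection is fed into the next game.
        latest = self.self_reflections[-1] if self.self_reflections else ""
        return f"Expert lessons:\n{self.expert}\n\nSelf lessons:\n{latest}".strip()


def run_roe(
    play_episode: Callable[[str], Tuple[List[str], bool]],
    select_keyframes: Callable[[List[str]], List[str]],
    reflect: Callable[[str, List[str], bool], str],
    expert: str,
    n_games: int,
) -> Tuple[Experience, List[bool]]:
    """Play n_games, reflecting after each one to update the self-experience."""
    experience = Experience(expert=expert)
    results: List[bool] = []
    for _ in range(n_games):
        # 1. Play one full game conditioned on expert and self-experience.
        log, won = play_episode(experience.as_prompt())
        # 2. Compress the episode log into key frames.
        keyframes = select_keyframes(log)
        # 3. Reflect on the finished game to produce new self-experience.
        experience.self_reflections.append(
            reflect(experience.as_prompt(), keyframes, won)
        )
        results.append(won)
    return experience, results
```

In a real setting the three callables would wrap the game environment and LLM calls; here they are left abstract so the loop can be exercised with stubs.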

If this is right

The LLM can iteratively improve its policy in a complex RTS environment solely through post-game reflection rather than additional supervised fine-tuning.
Keyframe selection supplies adequate state information for the model to make effective mid-game choices without processing entire replays.
Expert experience combined with self-generated experience produces measurable gains against a fixed very hard opponent.
The same loop can be applied to other sequential decision tasks where full histories exceed context length.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method might generalize to other partially observable games or planning domains where agents can store and later query condensed past episodes.
It suggests that language models can bootstrap improvement from a small set of expert traces plus their own growing memory of past outcomes.
If keyframe selection proves robust, similar compression steps could reduce memory cost in long-horizon reinforcement learning outside games.

Load-bearing premise

That reflecting on expert and self-experience after a completed game reliably produces new self-experience that improves the model's decisions in later games, and that the selected keyframes contain enough information to support those decisions.

What would settle it

Running the same agent for multiple additional games after the reported reflection cycles and finding no further wins against the very hard opponent, or finding that removing the reflection step leaves performance unchanged.

Figures

Figures reproduced from arXiv: 2502.13388 by Chang Lu, Kuihua Huang, Lumin Jiang, Runnan Qi, Xiangbei Liu, Xian Guo, Xiaojie Xu, Xuebo Zhang, Yanan Ni, Yongchun Fang, Zhanghua Wu, Zhenya Li, Zongyuan Li.

**Figure 1.** StarCraft II. A complex and dynamic real-time strategy game environment, which makes it very suitable for artificial intelligence research.

In June 2018, GPT-1 was released, demonstrating the potential of large language models in natural language understanding and generation tasks. In November 2022, OpenAI released ChatGPT[3], which can interact based on the context of the conversation, truly chat and commu…

**Figure 2.** Reflection of Episodes Framework. The framework consists of the TextStarCraft II environment and a reflection structure. After an episode, the reflection structure generates a new prompt and passes it to the next game.

Algorithm 1 Key Frame Selection
Input: L2 summary of an episode: L2_summary.
Output: Key frame in the episode: key_frame
1: Initialize key_frame
2: Initialize game_phase_transition
3: data = read_file(L2…

**Figure 3.** Reflection and Strategy Iteration. In three consecutive games, expert reflection and two generations of self-reflection took the game from defeat to victory.

Replay analysis method. To effectively analyze StarCraft replays and learn from them, we propose a replay analysis method, which is divided into the following three stages. First, when watching the replay, focus on key aspects such as "Opening Build Order"(T…

**Figure 4.** Detailed game analysis of Very Hard difficulty

**Figure 5.** Comparison of baseline experiments in resources

**Figure 6.** Comparison of baseline experiments in units

**Figure 7.** Comparison of ablation experiments.

Through ablation experiments, we can compare the two sides in terms of units after removing keyframe selection or reflection. It can be seen that in the ablated runs the Probe production rate slowed down in the middle of the game, and the army could not be effectively replenished after facing a wave of enemy attacks.

5 Conclusion. In this paper, we propose a ROE …
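The Key Frame Selection algorithm quoted in the Figure 2 caption is cut off after its third step, so its actual selection rule is not visible here. As a purely illustrative sketch, one plausible reading of the `game_phase_transition` variable is that a frame is kept whenever the game phase changes. The record layout (dicts with `time` and `phase` keys) and the function name `select_key_frames` are assumptions, not taken from the paper.

```python
from typing import Dict, List


def select_key_frames(l2_summary: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first frame of every game phase (assumed selection rule)."""
    key_frames: List[Dict[str, str]] = []
    previous_phase = None
    for frame in l2_summary:
        phase = frame["phase"]
        # A phase change (including the very first frame) marks a key frame.
        if phase != previous_phase:
            key_frames.append(frame)
            previous_phase = phase
    return key_frames


# Hypothetical episode summary: one record per summarized time step.
episode = [
    {"time": "00:30", "phase": "early"},
    {"time": "02:00", "phase": "early"},
    {"time": "06:00", "phase": "mid"},
    {"time": "12:00", "phase": "late"},
    {"time": "13:00", "phase": "late"},
]
print([f["time"] for f in select_key_frames(episode)])
```

Whatever rule the paper actually uses, the design point is the same: the reflection step sees a handful of frames rather than the full episode log.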

Original abstract

StarCraft II is a complex and dynamic real-time strategy (RTS) game environment, which is very suitable for artificial intelligence and reinforcement learning research. To address the problem of Large Language Model(LLM) learning in complex environments through self-reflection, we propose a Reflection of Episodes(ROE) framework based on expert experience and self-experience. This framework first obtains key information in the game through a keyframe selection method, then makes decisions based on expert experience and self-experience. After a game is completed, it reflects on the previous experience to obtain new self-experience. Finally, in the experiment, our method beat the robot under the Very Hard difficulty in TextStarCraft II. We analyze the data of the LLM in the process of the game in detail, verified its effectiveness.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

ROE gives a clean high-level loop for LLM reflection in StarCraft but the abstract supplies no numbers, ablations, or retrieval details to back the Very Hard win claim.

The main takeaway is that the authors describe a Reflection of Episodes framework that selects keyframes from games, feeds expert plus self-experience to an LLM for decisions, then reflects after each game to produce new self-experience. They state that this beats the Very Hard bot in TextStarCraft II. That is the result they want readers to take away. The combination of keyframe summarization with a closed reflection loop on both expert and self data is the concrete new piece; it is a reasonable way to let an LLM accumulate usable experience across episodes in a long-horizon, partially observable game. The high-level architecture is laid out clearly enough that someone could implement the skeleton from the description. The paper also flags that it analyzes the LLM's in-game behavior, which at least shows they looked at the traces rather than treating the agent as a black box.

The central weakness is exactly the one the stress-test note flags: the abstract gives no win rate, no trial count, no variance, no ablation that removes the reflection step, and no account of how self-experience is stored or retrieved at decision time. Without those, it is impossible to know whether the reflection loop actually drove the reported win or whether the result rests on a single lucky game or on the expert experience alone. The lack of any baseline comparison makes the same problem worse.

This is the sort of paper that would interest people already building LLM agents for RTS or other complex games; they could borrow the keyframe-plus-reflection pattern even if they end up rewriting the evaluation. It does not yet give enough evidence for someone outside that niche to treat the result as established.

I would send it to peer review. The framework is coherent and the target environment is hard enough that referees can usefully ask for the missing controls and statistics rather than reject outright.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes the Reflection of Episodes (ROE) framework in which an LLM extracts game state via keyframe selection, conditions decisions on expert experience plus accumulated self-experience, and performs post-game reflection to generate additional self-experience. The central empirical claim is that this procedure enabled the agent to defeat the Very Hard difficulty bot in TextStarCraft II.

Significance. A convincingly demonstrated ability of an LLM agent to improve via post-episode reflection in a long-horizon, partially observable RTS setting would be of interest to the community working on LLM-based agents and self-improvement loops. The paper does not yet supply the quantitative controls or ablations needed to establish that the reflection component is responsible for the reported outcome.

major comments (3)

[Abstract] Abstract: the claim that the method 'beat the robot under the Very Hard difficulty' is presented without any win rate, number of independent trials, or baseline comparison, rendering it impossible to determine whether the result supports the effectiveness of the reflection mechanism.
[Method] Method section (description of decision-making and experience retrieval): the paper does not specify how self-experience generated by reflection is indexed, stored, or retrieved at decision time, which is load-bearing for the assumption that post-game reflection produces usable new experience.
[Experiments] Experiments: no ablation that removes the reflection step (or the self-experience component) is reported, so it is impossible to isolate whether the reported win depends on the ROE loop rather than expert experience alone or on a single lucky episode.

minor comments (2)

[Abstract] Abstract: 'Large Language Model(LLM)' and 'Reflection of Episodes(ROE)' are written without a space before the opening parenthesis.
[Experiments] The manuscript would benefit from a clear statement of the total number of games played and whether the reflection loop was active across the entire evaluation set.
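The request for a win rate and a trial count can be made concrete with a standard binomial confidence interval. The sketch below uses the Wilson score interval, which is a common choice for small samples; it is offered as one reasonable way to report such results, not as something the paper or the referee prescribes, and the function name is illustrative.

```python
import math
from typing import Tuple


def wilson_interval(wins: int, games: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a win rate (z = 1.96 gives roughly 95% coverage)."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z**2 / games
    centre = (p + z**2 / (2 * games)) / denom
    half_width = z * math.sqrt(p * (1 - p) / games + z**2 / (4 * games**2)) / denom
    return centre - half_width, centre + half_width


# A single win out of a single game leaves the true win rate very uncertain.
low, high = wilson_interval(wins=1, games=1)
print(f"1/1 wins: [{low:.2f}, {high:.2f}]")
```

With one win in one game the lower bound is only about 0.21, which is the quantitative form of the "single lucky episode" concern raised in the major comments.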

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments, which highlight important areas for improving the clarity and rigor of our presentation of the ROE framework. We address each major comment below and commit to revisions that directly respond to the concerns raised.

Referee: [Abstract] Abstract: the claim that the method 'beat the robot under the Very Hard difficulty' is presented without any win rate, number of independent trials, or baseline comparison, rendering it impossible to determine whether the result supports the effectiveness of the reflection mechanism.

Authors: We agree that the abstract as currently written does not provide sufficient quantitative context for the reported outcome. The full manuscript contains experimental details on the win against the Very Hard bot, but these are not summarized in the abstract. In the revision we will expand the abstract to report the win rate, the number of independent trials performed, and explicit baseline comparisons (including expert-experience-only runs) so that readers can immediately assess the strength of the result. revision: yes
Referee: [Method] Method section (description of decision-making and experience retrieval): the paper does not specify how self-experience generated by reflection is indexed, stored, or retrieved at decision time, which is load-bearing for the assumption that post-game reflection produces usable new experience.

Authors: This observation is correct; the current method description focuses on the overall pipeline and does not detail the storage and retrieval mechanics for the self-experience generated by reflection. We will revise the method section to specify the indexing scheme (embedding-based), storage format, and retrieval procedure (similarity search over accumulated self-experience at decision time) so that the mechanism by which reflection contributes usable experience is fully explicit. revision: yes
Referee: [Experiments] Experiments: no ablation that removes the reflection step (or the self-experience component) is reported, so it is impossible to isolate whether the reported win depends on the ROE loop rather than expert experience alone or on a single lucky episode.

Authors: We acknowledge that the absence of an ablation isolating the reflection/self-experience component limits the ability to attribute success specifically to the ROE loop. The current experiments demonstrate that the full framework succeeds, but do not include the requested controls. We will add ablation experiments that compare performance with and without the reflection-generated self-experience (and with expert experience alone) in the revised manuscript. revision: yes
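The second response names embedding-based indexing with similarity search but gives no further detail. As a hedged illustration of what such retrieval could look like, the sketch below ranks stored reflections by cosine similarity to a query. Bag-of-words counts stand in for learned embeddings here, and the names `embed`, `cosine`, and `retrieve` are hypothetical; nothing in it is taken from the paper.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Stand-in embedding: lowercase bag-of-words counts (not a learned model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, memory: List[str], k: int = 1) -> List[str]:
    """Return the k stored reflections most similar to the query."""
    q = embed(query)
    ranked = sorted(memory, key=lambda m: cosine(q, embed(m)), reverse=True)
    return ranked[:k]


# Hypothetical self-experience store built from earlier reflections.
memory = [
    "expand to a second nexus around three minutes",
    "research blink for stalkers before attacking",
    "scout the enemy base with a probe",
]
print(retrieve("when should I expand to a new nexus", memory))
```

Swapping `embed` for a sentence-embedding model would leave `retrieve` unchanged, which is why the storage and retrieval details asked for in the report matter for reproducing the method.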

Circularity Check

0 steps flagged

No circularity; empirical method description with no derivation chain or equations

The paper presents an empirical framework (ROE) for LLM-based gameplay in TextStarCraft II using keyframe selection, expert/self-experience, and post-game reflection. No equations, parameters, or mathematical derivations appear in the abstract or described method. The central claim is a reported experimental outcome (beating Very Hard bot) rather than a derived prediction from fitted inputs or self-referential definitions. No self-citation load-bearing steps, uniqueness theorems, or ansatz smuggling are present. The method is self-contained as a procedural description without reducing to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no specific parameters, axioms, or invented entities are detailed.

pith-pipeline@v0.9.0 · 5700 in / 1038 out tokens · 31231 ms · 2026-05-23T03:07:25.955798+00:00 · methodology

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Towards Generalist Game Players: An Investigation of Foundation Models in the Game Multiverse
cs.CV 2026-05 unverdicted novelty 5.0

The paper organizes research on generalist game AI into Dataset, Model, Harness, and Benchmark pillars and charts a five-level progression from single-game mastery to agents that create and live inside game multiverses.
Towards Generalist Game Players: An Investigation of Foundation Models in the Game Multiverse
cs.CV 2026-05 unverdicted novelty 3.0

This work traces four eras of generalist game players across dataset, model, harness, and benchmark pillars and charts a five-level roadmap ending in agents that create and evolve within game multiverses.

Reference graph

Works this paper leans on

61 extracted references · 61 canonical work pages · cited by 1 Pith paper · 4 internal anchors

[1] Vinyals, O., et al. StarCraft II: A new challenge for reinforcement learning. arXiv preprint arXiv:1708.04782 (2017)
[2] Vinyals, O., Babuschkin, I., et al. AlphaStar: Mastering the real-time strategy game StarCraft II. DeepMind blog, 2, 20 (2019)
[3] OpenAI ChatGPT team, https://openai.com/chatgpt/ (2022)
[4] OpenAI GPT-4 team, https://openai.com/index/gpt-4/ (2023)
[5] Ma, Weiyu, et al. LLMs play StarCraft II: Benchmarks and a chain of summarization approach. arXiv preprint arXiv:2312.11865 (2023)
[6] Sunehag, Peter, et al. Value-decomposition networks for cooperative multi-agent learning. arXiv preprint arXiv:1706.05296 (2017)
[7] Rashid, Tabish, et al. Monotonic value function factorisation for deep multi-agent reinforcement learning. Journal of Machine Learning Research 21.178: 1-51 (2020)
[8] Rashid, Tabish, et al. Weighted QMIX: Expanding monotonic value function factorisation for deep multi-agent reinforcement learning. Advances in Neural Information Processing Systems 33: 10199-10210 (2020)
[9] Yu, Chao, et al. The surprising effectiveness of PPO in cooperative multi-agent games. Advances in Neural Information Processing Systems 35: 24611-24624 (2022)
[10] Lowe, Ryan, et al. Multi-agent actor-critic for mixed cooperative-competitive environments. Advances in Neural Information Processing Systems 30 (2017)
[11] Liu, Ruo-Ze, et al. On efficient reinforcement learning for full-length game of StarCraft II. Journal of Artificial Intelligence Research 75: 213-260 (2022)
[12] Anthropic Claude-2 team, https://www.anthropic.com/news/claude-2 (2023)
[13] Meta Llama team, https://github.com/meta-llama/llama3 (2024)
[14] Anil, Rohan, et al. PaLM 2 technical report. arXiv preprint arXiv:2305.10403 (2023)
[15] Shao, X., Jiang, W., et al. SwarmBrain: Embodied agent for real-time strategy game StarCraft II via LLMs. arXiv preprint arXiv:2401.17749 (2024)
[16] Shinn, Noah, et al. Reflexion: Language agents with verbal reinforcement learning. arXiv preprint arXiv:2303.11366 (2023)
[17] Zhang, Wenqi, et al. Agent-Pro: Learning to evolve via policy-level reflection and optimization. arXiv preprint arXiv:2402.17574 (2024)

Appendix A. All Prompts. In this appendix, we show all the prompts used during the experiment, including system prompts, reflection prompts, etc. A.1 System Prompt: You are an AI train...
[18] Game Overview: Provide a brief overview of the current situation based on all the rounds
[19] Current Game Stage: Determine the stage of the game based on the information of all rounds. Is it the early game, mid-game, or late game?
[20] Our Situation: Describe our current status in terms of: 3.1 Units and Buildings: Analyze the state of our units and buildings. 3.2 Economy: Evaluate our economic condition, including resource collection and usage. 3.3 Technology: Describe the status of our technological research and what technologies we have unlocked so far. Analyze our technology tree, i...
[21] Enemy's Strategy: Infer the enemy's potential strategy, based on the available information
[22] Key Information: Highlight the most important aspects from all rounds that have significantly influenced the game. {self.race_specific_prompt.get(self.race)} These are the lessons given by experts based on previous matches to help you play the game: {self.last_reflection} Here are some tips to help you analyze the game stage. In subsequent analysis, you ne...
[23] **Opening Build Order**: - Ensure you followed a standard and efficient build order. - Check the timing of your first Pylon, Gateway, Assimilator, and Cybernetics Core
[24] **Economy Management**: - Monitor your worker production. Aim for constant Probes production. - Check your expansion timing. A typical timing for your natural expansion is around 2:30 to 3:00 minutes
[25] **Scouting**: - Review your scouting efforts. Did you scout the enemy base early to see their build order? - Did you send a Probe or use an Observer to gather information about the enemy’s tech and army composition?
[26] **Macro Management**: - Check your resources. Avoid floating too many minerals and gas. - Ensure you're continuously producing units and expanding your infrastructure (Gateways, Robotics Facilities, etc.)
[27] **Micro Management**: - Watch your army engagements. Did you control your units effectively during battles? - Pay attention to spell usage, positioning, and focus fire
[28] **Army Composition**: - Evaluate your unit composition relative to the enemy’s. Did you have the right counters? - Ensure you tech up appropriately and adjust your unit mix based on what the opponent is building
[29] **Upgrades**: - Check your upgrades timing. Upgrades can significantly affect the outcome of battles. - Ensure you research crucial upgrades like Warp Gate, Blink, and attack/armor upgrades
[30] **Decision Making**: - Review your decisions throughout the game. Did you expand at the right times? - Did you make effective use of harassment (e.g., Warp Prism drops) to disrupt the opponent’s economy? Then, after reviewing these aspects, make a list of key mistakes and areas for improvement. Here are some common points to look for: - Delayed expansion ...
[31] **Opening Build Order**:
[32] **Economy Management**:
[33] **Macro Management**:
[34] **Micro Management**:
[35] **Army Composition**:
[36] **Decision Making**:
[37] **Key time point and recommendation**(At least five, specific time point from time 0:00(important) to finish): """ Reflection Prompt Fig.A3. Our Reflection prompt. Appendix B. Reflection Iterations In this appendix, we will show the reflections and changes generated during the experiment of our method under Very Hard built-in AI. The marked part is the pa...
[38] **Opening Build Order**: - Start the game by immediately building a Probe. - Construct a Pylon near your mineral line to avoid supply blockages. - Establish an early gateway to initiate unit production
[39] **Economy Management**: - Focus on continuous Probe production to maximize mineral and gas income. - Expand your economy by building additional Nexuses at optimal expansion timings. - Allocate resources efficiently between worker production and infrastructure development
[40] **Scouting**: - Send out early scouting Probes to gather information about the enemy's base and tech choices. - Utilize Observers to scout enemy movements and unit compositions. - Maintain map control with Zealot or Stalker scouts to anticipate enemy strategies
[41] **Macro Management**: - Prioritize expanding infrastructure by adding more Gateways and tech structures like Robotics Facilities. - Ensure consistent unit production from all structures to maintain a strong army presence. - Use Chrono Boost effectively on key structures such as the Cybernetics Core and Forges for faster upgrades
[42] **Micro Management**: - Improve unit control during engagements by focusing on proper positioning and target prioritization. - Utilize Blink effectively with Stalkers for tactical advantages in battles. - Practice splitting your army and managing spellcaster units like High Templars efficiently
[43] **Army Composition**: - Maintain a balanced unit composition with Zealots for frontline tanking and Stalkers for ranged damage. - Consider incorporating Immortals to counter enemy armored units effectively. - Adapt your unit mix based on enemy unit compositions and tech choices
[44] **Upgrades**: - Research crucial upgrades like Ground Weapons and Armor to enhance your army's combat effectiveness. - Prioritize upgrades that align with your chosen army composition for maximum efficiency. - Continuously upgrade tech structures to unlock advanced units and abilities
[45] **Decision Making**: - Make timely decisions to expand strategically at appropriate timings to boost your economy. - Assess enemy scouting information to adapt your army composition and defensive structures. - Use harassment tactics like Warp Prism drops to disrupt the opponent's economy and keep them on the back foot
[46] Expert Reflection (Fig. B1: Expert Reflection with our method against Very Hard built-in AI). **Key time points and recommendations**: - 00:30: Start the game by training a Probe and immediately building a Pylon to avoid supply blockages. - 03:45: Expand your economy by constructing additional Nexuses and assimilators to boost your income. - 07:00: Focus on researching essential upgrades like Warpgate technology and Protoss weapons/armor for stron...
[47] **Economy Management**: - Focus on continuous Probe production to maximize resource collection and support overall economy growth. - Expand to new bases and construct additional Pylons to increase supply and resource gathering capacity. - Efficiently manage resources to avoid floating minerals and gas while maintaining a steady production of units and inf...
[48] **Scouting**: - Implement a proactive scouting strategy by sending Probes or units like Zealots to gather information about the enemy's tech and army composition. - Consider using observers or other scouting units to scout the enemy base and key locations for potential threats. - Adapt your strategy based on scouting information to make informed decisions...
[49] **Macro Management**: - Improve infrastructure development by constructing essential structures like Pylons, Gateways, and tech buildings in a timely manner. - Prioritize tech advancements and unit production by researching critical upgrades such as Warp Gate technology and other unit-specific enhancements. - Ensure a balanced distribution of resources be...
[50] **Micro Management**: - Enhance unit control and micro techniques during engagements to optimize positioning, focus fire, and utilize unit abilities effectively. - Practice effective spell usage, proper unit positioning, and target prioritization to gain an advantage in battles. - Pay attention to unit formations, flanking maneuvers, and retreat strategie...
[52] **Upgrades**: - Prioritize essential upgrades such as Warp Gate technology, attack, and armor upgrades to enhance the effectiveness of your units in combat. - Research upgrades like Blink for Stalkers and charge for Zealots to improve their combat capabilities. - Scout enemy upgrades and adjust your upgrade timings to remain competitive in battles
[53] **Decision Making**: - Make effective decisions throughout the game, including timely expansions, tech advancements, and unit compositions based on scouting information. - Utilize harassment tactics like Warp Prism drops to disrupt the enemy's economy and gain a strategic advantage. - Adapt your strategy dynamically based on the evolving game state and en...
[54] Self Reflection-1 (Fig. B2: Self Reflection 1 with our method against Very Hard built-in AI). **Key Time Points and Recommendations**: - **00:30** - Start with a standard opening build order, focus on Probe production, and begin scouting with a Probe to gather information about the enemy's strategy. - **03:00** - Prioritize expanding to additional Nexuses for increased resource income and build more Pylons to support supply cap and unit production...
[55]

- Expand to new bases and construct additional Pylons to increase supply cap and facilitate unit production

**Economy Management**: - Focus on continuous Probe production to maximize resource collection and support overall economy growth. - Expand to new bases and construct additional Pylons to increase supply cap and facilitate unit production. - Manage resources efficiently to avoid resource floating while maintaining a steady balance between mineral and gas income

work page
[56]

- Consider using observers or other scouting units to gather critical intel on the enemy's base and key locations

**Scouting**: - Implement a proactive scouting strategy by sending Probes or units like Zealots to gather information about the enemy's tech and army composition. - Consider using observers or other scouting units to gather critical intel on the enemy's base and key locations. - Adapt your strategy based on scouting information to make informed decisions ...

[57]

**Macro Management**: - Improve infrastructure development by constructing essential structures like Pylons, Gateways, and tech buildings in a timely manner. - Prioritize tech advancements and unit production by researching essential upgrades like Warp Gate technology and other unit-specific enhancements. - Maintain a balanced distribution of resources be...

[58]

**Micro Management**: - Enhance unit control and micro techniques during engagements to optimize positioning, focus fire, and utilize unit abilities effectively. - Practice effective spell usage, proper unit positioning, and target prioritization in battles to gain an advantage. - Pay attention to unit formations, flanking maneuvers, and retreat strategie...

[59]

**Army Composition**: - Evaluate your unit composition relative to the enemy's and adjust accordingly to have the right counters. - Tech up appropriately and diversify your unit mix based on the opponent's army composition. - Consider incorporating a mix of Zealots, Stalkers, and other unit types to create a well-rounded army capable of handling various threats

[60]

**Upgrades**: - Prioritize essential upgrades such as Warp Gate technology, attack, and armor upgrades to enhance unit effectiveness in combat. - Research upgrades like Blink for Stalkers and charge for Zealots to improve their combat capabilities. - Scout enemy upgrades and adjust your own upgrade timings to stay competitive in engagements

[61]

**Decision Making**: - Make effective decisions throughout the game, including timely expansions, tech advancements, and unit compositions based on scouting information. - Utilize harassment tactics like Warp Prism drops to disrupt the enemy's economy and gain a strategic advantage. - Adapt your strategy dynamically based on the evolving game state and en...

[62]

**Key Time Points and Recommendations**: - **00:00**: Start with a standard opening build order, prioritize Probe production, and begin scouting with a Probe for crucial information. - **03:00**: Expand to additional Nexuses for increased resource income and build more Pylons to support supply cap and unit production. - **06:00**: Complete Warpgate resear...
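The "Key Time Points and Recommendations" entries follow a regular `**MM:SS**` pattern, so they could be turned into structured data before being fed back to the model. The sketch below is only an illustration of that idea; the function name `parse_time_points` and the regular expression are assumptions and are not taken from the paper's code.

```python
import re

# Matches "**MM:SS**" followed by ":" or "-" and the recommendation text,
# stopping just before the next "- **MM:SS**" marker or the end of the string.
_TIME_POINT = re.compile(
    r"\*\*(\d{1,2}):(\d{2})\*\*\s*[:\-]\s*(.*?)(?=\s*-\s*\*\*\d{1,2}:\d{2}\*\*|$)",
    re.S,
)


def parse_time_points(text):
    """Return a list of (seconds_into_game, recommendation) tuples."""
    points = []
    for match in _TIME_POINT.finditer(text):
        seconds = int(match.group(1)) * 60 + int(match.group(2))
        points.append((seconds, match.group(3).strip()))
    return points


sample = (
    "- **00:00**: Start with a standard opening build order. "
    "- **03:00**: Expand to additional Nexuses for increased resource income."
)
points = parse_time_points(sample)
for seconds, advice in points:
    print(seconds, advice)
```

The same pattern also accepts the `**00:30** - ...` variant used in Self Reflection-1, since the separator may be either a colon or a hyphen.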

[1]

Vinyals, O., et al. StarCraft II: A new challenge for reinforcement learning. arXiv preprint, arXiv:1708.04782 (2017)

[2]

Vinyals, O., Babuschkin, I., et al. AlphaStar: Mastering the real-time strategy game StarCraft II. DeepMind blog, 2, 20 (2019)

[3]

OpenAI ChatGPT team, https://openai.com/chatgpt/ (2022)

[4]

OpenAI GPT-4 team, https://openai.com/index/gpt-4/ (2023)

[5]

Ma, Weiyu, et al. LLMs play StarCraft II: Benchmarks and a chain of summarization approach. arXiv preprint, arXiv:2312.11865 (2023)

[6]

Sunehag, Peter, et al. Value-decomposition networks for cooperative multi-agent learning. arXiv preprint, arXiv:1706.05296 (2017)

[7]

Rashid, Tabish, et al. Monotonic value function factorisation for deep multi-agent reinforcement learning. Journal of Machine Learning Research 21.178: 1-51 (2020)

[8]

Rashid, Tabish, et al. Weighted QMIX: Expanding monotonic value function factorisation for deep multi-agent reinforcement learning. Advances in Neural Information Processing Systems 33: 10199-10210 (2020)

[9]

Yu, Chao, et al. The surprising effectiveness of PPO in cooperative multi-agent games. Advances in Neural Information Processing Systems 35: 24611-24624 (2022)

[10]

Lowe, Ryan, et al. Multi-agent actor-critic for mixed cooperative-competitive environments. Advances in Neural Information Processing Systems 30 (2017)

[11]

Liu, Ruo-Ze, et al. On efficient reinforcement learning for full-length game of StarCraft II. Journal of Artificial Intelligence Research 75: 213-260 (2022)

[12]

Anthropic Claude-2 team, https://www.anthropic.com/news/claude-2 (2023)

[13]

Meta Llama team, https://github.com/meta-llama/llama3 (2024)

[14]

Anil, Rohan, et al. PaLM 2 technical report. arXiv preprint, arXiv:2305.10403 (2023)

[15]

Shao, X., Jiang, W., et al. SwarmBrain: Embodied agent for real-time strategy game StarCraft II via LLMs. arXiv preprint, arXiv:2401.17749 (2024)

[16]

Shinn, Noah, et al. Reflexion: Language agents with verbal reinforcement learning. arXiv preprint, arXiv:2303.11366 (2023)

[17]

Zhang, Wenqi, et al. Agent-pro: Learning to evolve via policy-level reflection and optimization. arXiv preprint, arXiv:2402.17574 (2024).

Appendix A. All Prompts

In this appendix, we show all the prompts used during the experiment, including system prompts, reflection prompts, etc.

A.1 System Prompt

You are an AI train...

[18]

Game Overview: Provide a brief overview of the current situation based on all the rounds

[19]

Current Game Stage: Determine the stage of the game based on the information of all rounds. Is it the early game, mid-game, or late game?

[20]

Our Situation: Describe our current status in terms of: 3.1 Units and Buildings: Analyze the state of our units and buildings. 3.2 Economy: Evaluate our economic condition, including resource collection and usage. 3.3 Technology: Describe the status of our technological research and what technologies we have unlocked so far. Analyze our technology tree, i...

[21]

Enemy's Strategy: Infer the enemy's potential strategy, based on the available information

[22]

Key Information: Highlight the most important aspects from all rounds that have significantly influenced the game. {self.race_specific_prompt.get(self.race)} These are the lessons given by experts based on previous matches to help you play the game: {self.last_reflection} Here are some tips to help you analyze the game stage.In subsequent analysis, you ne...
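The placeholders in the prompt above, `{self.race_specific_prompt.get(self.race)}` and `{self.last_reflection}`, suggest that the prompt is assembled as a Python f-string inside a class. A minimal sketch of how such a template might be filled is given here; the class name `PromptBuilder`, the method `build`, and the sample strings are hypothetical and are not taken from the paper's code.

```python
class PromptBuilder:
    """Hypothetical helper that fills the two placeholders seen in the prompt."""

    def __init__(self, race, race_specific_prompt, last_reflection=""):
        self.race = race
        # Mapping from race name to race-specific guidance text.
        self.race_specific_prompt = race_specific_prompt
        # Reflection text produced after the previous game (empty for game one).
        self.last_reflection = last_reflection

    def build(self):
        # Assumption: fall back to an empty string so a missing race
        # does not render as the literal text "None".
        race_hint = self.race_specific_prompt.get(self.race, "")
        return (
            f"{race_hint}\n"
            "These are the lessons given by experts based on previous matches "
            "to help you play the game:\n"
            f"{self.last_reflection}"
        )


builder = PromptBuilder(
    race="Protoss",
    race_specific_prompt={"Protoss": "(hypothetical Protoss-specific guidance)"},
    last_reflection="- 03:00: Expand to additional Nexuses",
)
prompt = builder.build()
print(prompt)
```

Swapping `last_reflection` between games is the only change needed to carry a new reflection into the next match, which is why the sketch keeps it as a plain attribute.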

[23]

**Opening Build Order**: - Ensure you followed a standard and efficient build order. - Check the timing of your first Pylon, Gateway, Assimilator, and Cybernetics Core

[24]

**Economy Management**: - Monitor your worker production. Aim for constant Probes production. - Check your expansion timing. A typical timing for your natural expansion is around 2:30 to 3:00 minutes

[25]

**Scouting**: - Review your scouting efforts. Did you scout the enemy base early to see their build order? - Did you send a Probe or use an Observer to gather information about the enemy’s tech and army composition?

[26]

**Macro Management**: - Check your resources. Avoid floating too many minerals and gas. - Ensure you're continuously producing units and expanding your infrastructure (Gateways, Robotics Facilities, etc.)

[27]

**Micro Management**: - Watch your army engagements. Did you control your units effectively during battles? - Pay attention to spell usage, positioning, and focus fire

[28]

**Army Composition**: - Evaluate your unit composition relative to the enemy’s. Did you have the right counters? - Ensure you tech up appropriately and adjust your unit mix based on what the opponent is building

[29]

**Upgrades**: - Check your upgrades timing. Upgrades can significantly affect the outcome of battles. - Ensure you research crucial upgrades like Warp Gate, Blink, and attack/armor upgrades

[30]

**Decision Making**: - Review your decisions throughout the game. Did you expand at the right times? - Did you make effective use of harassment (e.g., Warp Prism drops) to disrupt the opponent’s economy? Then, After reviewing these aspects, make a list of key mistakes and areas for improvement. Here are some common points to look for: - Delayed expansion ...

[31]

**Opening Build Order**:

[32]

**Economy Management**:

[33]

**Macro Management**:

[34]

**Micro Management**:

[35]

**Army Composition**:

[36]

**Decision Making**:

[37]

**Key time point and recommendation**(At least five, specific time point from time 0:00(important) to finish): """

Fig.A3. Our Reflection prompt.

Appendix B. Reflection Iterations

In this appendix, we will show the reflections and changes generated during the experiment of our method under Very Hard built-in AI. The marked part is the pa...
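One way to surface what changes between two successive reflections is a line-level diff over their bullets. The sketch below uses Python's standard `difflib` for this; it is an illustration only and not the procedure used in the paper. The function name `changed_bullets` is hypothetical, and the sample bullets are the Macro Management items from Self Reflection-1 and from the reflection that follows it.

```python
import difflib


def changed_bullets(previous, current):
    """Return (removed, added) bullets between two reflection versions."""
    diff = list(difflib.ndiff(previous, current))
    # ndiff prefixes each line with "- " (only in previous), "+ " (only in
    # current), "  " (unchanged) or "? " (intraline hints, ignored here).
    removed = [line[2:] for line in diff if line.startswith("- ")]
    added = [line[2:] for line in diff if line.startswith("+ ")]
    return removed, added


previous = [
    "Improve infrastructure development by constructing essential structures "
    "like Pylons, Gateways, and tech buildings in a timely manner.",
    "Prioritize tech advancements and unit production by researching critical "
    "upgrades such as Warp Gate technology and other unit-specific enhancements.",
]
current = [
    "Improve infrastructure development by constructing essential structures "
    "like Pylons, Gateways, and tech buildings in a timely manner.",
    "Prioritize tech advancements and unit production by researching essential "
    "upgrades like Warp Gate technology and other unit-specific enhancements.",
]

removed, added = changed_bullets(previous, current)
print("removed:", removed)
print("added:", added)
```

In this sample only the second bullet differs, so it is reported once as removed (old wording) and once as added (new wording), while the unchanged first bullet is skipped.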

[38]

**Opening Build Order**: - Start the game by immediately building a Probe. - Construct a Pylon near your mineral line to avoid supply blockages. - Establish an early gateway to initiate unit production

[39]

**Economy Management**: - Focus on continuous Probe production to maximize mineral and gas income. - Expand your economy by building additional Nexuses at optimal expansion timings. - Allocate resources efficiently between worker production and infrastructure development

[40]

**Scouting**: - Send out early scouting Probes to gather information about the enemy's base and tech choices. - Utilize Observers to scout enemy movements and unit compositions. - Maintain map control with Zealot or Stalker scouts to anticipate enemy strategies

[41]

**Macro Management**: - Prioritize expanding infrastructure by adding more Gateways and tech structures like Robotics Facilities. - Ensure consistent unit production from all structures to maintain a strong army presence. - Use Chrono Boost effectively on key structures such as the Cybernetics Core and Forges for faster upgrades

[42]

**Micro Management**: - Improve unit control during engagements by focusing on proper positioning and target prioritization. - Utilize Blink effectively with Stalkers for tactical advantages in battles. - Practice splitting your army and managing spellcaster units like High Templars efficiently

[43]

**Army Composition**: - Maintain a balanced unit composition with Zealots for frontline tanking and Stalkers for ranged damage. - Consider incorporating Immortals to counter enemy armored units effectively. - Adapt your unit mix based on enemy unit compositions and tech choices

[44]

**Upgrades**: - Research crucial upgrades like Ground Weapons and Armor to enhance your army's combat effectiveness. - Prioritize upgrades that align with your chosen army composition for maximum efficiency. - Continuously upgrade tech structures to unlock advanced units and abilities

[45]

**Decision Making**: - Make timely decisions to expand strategically at appropriate timings to boost your economy. - Assess enemy scouting information to adapt your army composition and defensive structures. - Use harassment tactics like Warp Prism drops to disrupt the opponent's economy and keep them on the back foot

[46]

Fig.B1. Expert Reflection with our method against Very Hard built-in AI

**Key time points and recommendations**: - 00:30: Start the game by training a Probe and immediately building a Pylon to avoid supply blockages. - 03:45: Expand your economy by constructing additional Nexuses and assimilators to boost your income. - 07:00: Focus on researching essential upgrades like Warpgate technology and Protoss weapons/armor for stron...

[47]

**Economy Management**: - Focus on continuous Probe production to maximize resource collection and support overall economy growth. - Expand to new bases and construct additional Pylons to increase supply and resource gathering capacity. - Efficiently manage resources to avoid floating minerals and gas while maintaining a steady production of units and inf...

[48]

**Scouting**: - Implement a proactive scouting strategy by sending Probes or units like Zealots to gather information about the enemy's tech and army composition. - Consider using observers or other scouting units to scout the enemy base and key locations for potential threats. - Adapt your strategy based on scouting information to make informed decisions...

[49]

**Macro Management**: - Improve infrastructure development by constructing essential structures like Pylons, Gateways, and tech buildings in a timely manner. - Prioritize tech advancements and unit production by researching critical upgrades such as Warp Gate technology and other unit-specific enhancements. - Ensure a balanced distribution of resources be...

[50]

**Micro Management**: - Enhance unit control and micro techniques during engagements to optimize positioning, focus fire, and utilize unit abilities effectively. - Practice effective spell usage, proper unit positioning, and target prioritization to gain an advantage in battles. - Pay attention to unit formations, flanking maneuvers, and retreat strategie...
