Game Master LLM: Task-Based Role-Playing for Natural Slang Learning

Amir Tahmasbi; Aniket Bera; Judson Wright; Milad Esrafilian; Sooyeon Jeong

arxiv: 2511.15504 · v2 · submitted 2025-11-19 · 💻 cs.HC

Pith reviewed 2026-05-17 20:39 UTC · model grok-4.3

classification 💻 cs.HC

keywords slang acquisition · role-playing game · LLM game master · task-based learning · second language vocabulary · immersive dialogue · retention study

The pith

An LLM role-playing game with a Game Master leads to better slang comprehension, use, and week-long retention than a virtual classroom.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether an immersive, task-based role-playing setup powered by GPT-4o can help second-language learners pick up and keep casual slang phrases that formal study often misses. Participants choose five unfamiliar expressions, then carry on spoken dialogues with non-player characters while a Game Master weaves the phrases into natural context and a Practice Box tracks progress in real time. A between-subjects trial with 14 international students found the role-play group outperformed the control group on immediate tests of understanding and sentence use, and the advantage held after one week with reported gains of 21-27 percent. The work matters because fluent everyday speech depends on idiomatic expressions that learners rarely acquire through drills alone. If the pattern holds, narrative-driven LLM interactions could supply the missing bridge between classroom accuracy and spontaneous, context-appropriate speech.

Core claim

The central claim is that a GPT-4o-based Game Master guiding learners through a three-phase spoken narrative, with implicit input enhancement via natural phrase incorporation and explicit support from a Practice Box plus post-session feedback, produces larger gains in both comprehension and contextual use of target slang than a traditional AI-led virtual classroom, and that these gains persist over a one-week delay.

What carries the argument

The Game Master LLM that embeds chosen slang phrases into ongoing open-ended dialogue with non-player characters while a dedicated Practice Box supplies real-time explicit tracking and encouragement.
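The explicit-tracking half of this mechanism can be illustrated with a small sketch. The paper text quoted here does not describe how the Practice Box is implemented, so everything below is an assumption: the `PracticeBox` class, its method names, the whole-phrase matching rule, and the example slang phrases are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class PracticeBox:
    """Hypothetical tracker: counts learner uses of each target phrase."""

    targets: list[str]
    counts: dict[str, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Start every target phrase at zero uses.
        self.counts = {phrase: 0 for phrase in self.targets}

    def record(self, utterance: str) -> list[str]:
        """Update counts from one learner utterance; return the phrases used."""
        text = utterance.lower()
        used = []
        for phrase in self.targets:
            # Case-insensitive whole-phrase match; inflected forms are missed.
            pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
            hits = len(re.findall(pattern, text))
            if hits:
                self.counts[phrase] += hits
                used.append(phrase)
        return used

    def remaining(self) -> list[str]:
        """Target phrases the learner has not produced yet."""
        return [p for p in self.targets if self.counts[p] == 0]


# Illustrative phrases only; these are not taken from the study materials.
box = PracticeBox(["hit the sack", "no cap", "spill the tea", "on the fence"])
box.record("Okay, spill the tea, what happened at the tavern?")
box.record("I'm tired, I want to hit the sack. No cap.")
print(box.counts)
print(box.remaining())
```

A real system would probably need fuzzier matching (for example "hitting the sack"), which exact phrase matching deliberately leaves out to keep the sketch short.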

If this is right

The RPG group shows larger immediate gains in both phrase comprehension and accurate contextual use in sentences.
These gains remain detectable after one week, with the role-play condition recording a 21-27 percent improvement over the control.
Qualitative responses indicate the game supplies more practice opportunities and feels more natural than classroom-style instruction.
The combination of implicit contextual exposure and explicit tracking supports longer-term retention of casual expressions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same narrative-plus-tracking design could be tested on other hard-to-teach language features such as pragmatic routines or culture-specific idioms.
If the retention edge persists at scale, developers might embed similar Game Master modules inside existing language apps to reach learners outside formal classes.
The approach raises the question of whether the benefit comes mainly from the story structure or from the real-time adaptive feedback the LLM can provide.

Load-bearing premise

The observed advantage for the role-play group can be credited to the RPG structure rather than to differences in how long participants practiced or how engaging they found each condition.

What would settle it

A replication study that equalizes total practice time across conditions, records engagement ratings, and then finds no reliable difference in one-week retention rates between the role-play and classroom groups.

Figures

Figures reproduced from arXiv: 2511.15504 by Amir Tahmasbi, Aniket Bera, Judson Wright, Milad Esrafilian, Sooyeon Jeong.

**Figure 2.** Overview of Game Modules and LLM Narrative Generation: The agent receives the core game materials, including a …

**Figure 5.** The visual interface of the AI English class. The …

**Figure 4.** Interface of the Game. Colored circles …

**Figure 6.** Task Flow: (1) Initial assessment: participants are …

**Figure 7.** Comparison of self-reported engagement and feed…

Original abstract

Natural and idiomatic expressions are essential for fluent, everyday communication, yet many second-language learners struggle to acquire and spontaneously use casual slang despite strong formal proficiency. To address this gap, we designed and evaluated an LLM-powered, task-based role-playing game in which a GPT-4o-based Game Master guides learners through an immersive, three-phase spoken narrative. After selecting five unfamiliar slang phrases to practice, participants engage in open-ended dialogue with non-player characters; the Game Master naturally incorporates the target phrases in rich semantic contexts (implicit input enhancement) while a dedicated Practice Box provides real-time explicit tracking and encouragement. Post-session, learners receive multi-level formative feedback analyzing the entire interaction. We evaluated the system in a between-subjects study with 14 international graduate students, randomly assigned to either the RPG condition or a control condition consisting of a traditional AI-led virtual classroom. Results from an immediate post-test show that the RPG group achieved greater gains in both comprehension of the target phrases and their accurate, contextual use in sentences. A one-week delayed post-test further demonstrates that these gains are retained over time, with the RPG group showing a 21-27% improvement, indicating the effectiveness of our approach in supporting longer-term learning. Qualitative survey responses assessing engagement and perceived effectiveness further indicate that the game-based approach provided more practice opportunities and a more natural learning experience. These findings highlight the potential of narrative-driven LLM interactions in vocabulary acquisition.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This is a small pilot applying an LLM Game Master to slang practice via role-play, with directional retention gains that still need better controls and stats to hold up.

The main point is that the authors built a GPT-4o Game Master that runs a three-phase spoken narrative and slips in target slang phrases naturally, paired with a Practice Box that tracks progress. They compared it with a standard AI classroom setup in a study of 14 graduate students and report better immediate comprehension and contextual use in the RPG arm, a 21-27% improvement retained after one week, and higher engagement in the surveys. That setup is a straightforward but concrete engineering choice that fills a practical gap in spoken slang practice for adult learners.

Referee Report

2 major / 2 minor

Summary. The manuscript presents Game Master LLM, a GPT-4o-based, task-based role-playing system that helps second-language learners acquire natural slang. Learners select five target phrases and interact in an immersive three-phase spoken narrative where the Game Master provides implicit input enhancement by incorporating phrases in context, supported by a Practice Box for real-time tracking and post-session multi-level feedback. A between-subjects evaluation with 14 international graduate students (randomly assigned to the RPG or to a traditional AI-led virtual classroom control) reports greater immediate post-test gains in comprehension and contextual sentence use for the RPG group, plus a 21-27% advantage on a one-week delayed post-test, along with qualitative indications of higher engagement and more practice opportunities.

Significance. If the results hold after addressing methodological gaps, the work offers a concrete demonstration of how narrative-driven LLM interactions can support longer-term retention and spontaneous use of idiomatic expressions, a persistent challenge in language learning. The integration of implicit enhancement, open-ended dialogue, and formative feedback represents a promising HCI direction for educational applications, with potential to inform design of immersive language tools.

major comments (2)

[Abstract / Evaluation] The central claim of greater gains and 21-27% delayed improvement in the RPG condition is presented as directional evidence of effectiveness, yet no p-values, effect sizes, confidence intervals, or statistical test details are reported. With only seven participants per cell and no power analysis or pre-registration mentioned, the reliability of the between-subjects differences cannot be assessed and remains compatible with sampling variability.
[Study Design / Results] The attribution of post-test and retention advantages to the three-phase narrative, implicit input enhancement, and Practice Box requires that the control condition was matched for time-on-task and target-phrase exposure. No session-duration logs, exposure counts, or statistical controls for these factors are described, leaving the observed differences vulnerable to confounds from unequal practice opportunities or engagement levels.
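To make the first major comment concrete, here is a minimal sketch of the kind of reporting it asks for with seven participants per group: a standardized effect size (Cohen's d with a pooled standard deviation) and an exact two-sided permutation test on the difference in means. The permutation test is a choice made for this sketch because it needs no distributional assumptions at this sample size; it is not the analysis used in the paper. The function names are hypothetical, and the scores are placeholder values invented for the example, not study data.

```python
from itertools import combinations
from statistics import mean, stdev


def cohens_d(a: list[float], b: list[float]) -> float:
    """Standardized mean difference (a minus b) using the pooled sample SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5


def exact_permutation_p(a: list[float], b: list[float]) -> float:
    """Two-sided exact permutation p-value for the difference in group means."""
    pooled = a + b
    total = sum(pooled)
    n, na = len(pooled), len(a)
    observed = abs(mean(a) - mean(b))
    extreme = 0
    count = 0
    # Enumerate every way to relabel na of the n scores as "group a".
    for idx in combinations(range(n), na):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / na - (total - sum_a) / (n - na))
        # Small tolerance so float rounding does not drop exact ties.
        extreme += diff >= observed - 1e-12
        count += 1
    return extreme / count


# Placeholder gain scores, NOT data from the paper.
rpg_gain = [0.6, 0.8, 0.4, 1.0, 0.6, 0.8, 0.6]
control_gain = [0.4, 0.6, 0.2, 0.6, 0.4, 0.8, 0.4]

print("Cohen's d:", round(cohens_d(rpg_gain, control_gain), 3))
print("exact permutation p:", round(exact_permutation_p(rpg_gain, control_gain), 4))
```

With 7 versus 7 participants there are only 3,432 possible relabelings, so full enumeration is cheap and the p-value is exact rather than simulated.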

minor comments (2)

[Abstract] The abstract states that participants 'engage in open-ended dialogue with non-player characters' but does not specify how many NPCs or dialogue turns were involved; adding this detail would aid reproducibility.
[Evaluation] Qualitative survey responses are mentioned as supporting higher engagement, but the specific items, response format, or analysis method are not described; a brief methods note would improve transparency.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help us improve the clarity and rigor of our evaluation. We address each major comment below and indicate the revisions we plan to make.

Referee: [Abstract / Evaluation] The central claim of greater gains and 21-27% delayed improvement in the RPG condition is presented as directional evidence of effectiveness, yet no p-values, effect sizes, confidence intervals, or statistical test details are reported. With only seven participants per cell and no power analysis or pre-registration mentioned, the reliability of the between-subjects differences cannot be assessed and remains compatible with sampling variability.

Authors: We agree that statistical details are essential for interpreting the results, especially with a small sample of seven participants per group. In the revised version, we will add p-values from appropriate tests (such as independent t-tests or non-parametric equivalents), effect sizes (Cohen's d), and 95% confidence intervals for the reported gains in comprehension, sentence use, and the 21-27% retention advantage. We will also explicitly state that no power analysis was conducted and the study was not pre-registered, framing the findings as preliminary and exploratory. The retention figures derive from the proportion of phrases correctly recalled or used on the delayed post-test, and we will include a summary of the underlying data for transparency.
Revision: yes

Referee: [Study Design / Results] The attribution of post-test and retention advantages to the three-phase narrative, implicit input enhancement, and Practice Box requires that the control condition was matched for time-on-task and target-phrase exposure. No session-duration logs, exposure counts, or statistical controls for these factors are described, leaving the observed differences vulnerable to confounds from unequal practice opportunities or engagement levels.

Authors: We recognize the need to rule out confounds related to unequal practice opportunities. The control condition was designed as an AI-led virtual classroom with equivalent time allocated for phrase introduction, practice, and feedback, matching the overall session length of the RPG condition. Both conditions used the same five target phrases selected by participants. However, we did not record precise per-session duration logs or count the number of times each phrase was encountered during interactions. In the revision, we will elaborate on the control condition's structure to show intended equivalence and acknowledge this as a limitation that future studies should address with automated logging. The random assignment and focus on the same target phrases help support the attribution to the narrative and enhancement features, though we agree that explicit controls would strengthen causal claims.
Revision: partial
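The automated logging mentioned in this response could look roughly like the sketch below, which records time-on-task and counts how often each target phrase appears in system output (exposure) versus learner output (production). This is a hypothetical design and not part of the authors' system: the `SessionLog` class, the speaker labels, the injectable `clock`, and the example phrases are all assumptions made for illustration.

```python
import time
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SessionLog:
    """Hypothetical per-session log of time-on-task and phrase exposure."""

    condition: str                                   # e.g. "rpg" or "classroom"
    targets: list[str]
    clock: Callable[[], float] = time.monotonic      # injectable for testing
    exposures: Counter = field(default_factory=Counter)    # phrases heard
    productions: Counter = field(default_factory=Counter)  # phrases spoken
    started: float = field(init=False, default=0.0)
    ended: Optional[float] = field(init=False, default=None)

    def __post_init__(self) -> None:
        self.started = self.clock()

    def log_turn(self, speaker: str, text: str) -> None:
        """Count target phrases in one turn; any non-learner turn is exposure."""
        lowered = text.lower()
        bucket = self.productions if speaker == "learner" else self.exposures
        for phrase in self.targets:
            # Plain substring count: simple, but can over-count inside longer words.
            bucket[phrase] += lowered.count(phrase.lower())

    def close(self) -> float:
        """Stop the clock and return the session duration in seconds."""
        self.ended = self.clock()
        return self.ended - self.started


# Fake clock so the example is deterministic; phrases are illustrative only.
ticks = iter([0.0, 95.5])
log = SessionLog("rpg", ["no cap", "hit the sack"], clock=lambda: next(ticks))
log.log_turn("game_master", "No cap, the innkeeper says you should hit the sack.")
log.log_turn("learner", "No cap? Fine, I will rest.")
duration = log.close()
print(duration, dict(log.exposures), dict(log.productions))
```

Collecting the same log in both conditions would let a replication report duration and exposure counts per group, or include them as covariates, instead of relying on intended equivalence.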

Circularity Check

0 steps flagged

Empirical user study with independent outcome measures; no derivation or self-referential reduction present.

This paper reports results from a between-subjects user study (n=14) comparing an LLM-based RPG condition to a traditional AI-led classroom control. The central claims rest on measured post-test gains in comprehension and contextual use, plus one-week retention differences, which are external empirical observations collected after system use. No equations, first-principles derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The outcome data are independent of the system description and do not reduce to author-defined inputs by construction. This is a standard empirical evaluation whose validity can be assessed against the reported experimental controls rather than any internal definitional loop.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that the observed post-test differences are caused by the RPG features rather than extraneous variables such as novelty or time spent. No free parameters or invented entities are introduced. The study implicitly assumes standard language-acquisition principles (contextual exposure aids retention) without deriving them.

axioms (1)

domain assumption: Task-based role-play with implicit enhancement produces measurable gains in slang comprehension and production beyond those from a standard AI classroom.
Invoked in the interpretation of the between-subjects results in the evaluation section.

pith-pipeline@v0.9.0 · 5569 in / 1488 out tokens · 35691 ms · 2026-05-17T20:39:26.545629+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: The system operates as a dyadic spoken turn-based interaction... three main phases... Phase 1: Preparation... Phase 2: Exploration... Phase 3: Strategy...

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: Normalized Growth Rate... Definition Accuracy 0.822 vs 0.880

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] [1]

R. Atlas. 2023. Intelligent Chatbots in Language Learning: Opportunities and Limitations.Journal of Applied Linguistics and Language Research(2023)

work page 2023

[2] [2]

Yasin Babazade. 2024. The Impact of Digital Tools on Vocabulary Development in Second Language Learning.Journal of Azerbaijan Language and Education Studies1 (11 2024), 35–41. doi:10.69760/jales.2024.00103

work page doi:10.69760/jales.2024.00103 2024

[3] [3]

Brinton, Marguerite Ann Snow, and Marjorie Bingham Wesche

Donna M. Brinton, Marguerite Ann Snow, and Marjorie Bingham Wesche. 1989. Content-Based Second Language Instruction. Newbury House Publishers

work page 1989

[4] [4]

P. Brown. 2020. Chatbots and L2 Fluency Development: A Case for Real-Time Dialogue.Modern Language Journal104, 4 (2020), 601–621

work page 2020

[5] [5]

2018.Complexity, Accuracy, and Fluency

Gavin Bui and Peter Skehan. 2018.Complexity, Accuracy, and Fluency. Wiley. doi:10.1002/9781118784235.eelt0046

work page doi:10.1002/9781118784235.eelt0046 2018

[6] [6]

Jaf, and Kenneth J

Guendalina Caldarini, Sardar F. Jaf, and Kenneth J. McGarry. 2021. A Literature Survey of Recent Advances in Chatbots.Information13, 1 (2021), 41. doi:10.3390/ info13010041

work page 2021

[7] [7]

Shih-Chuan Chang. 2011. A Contrastive Study of Grammar Translation Method and Communicative Approach in Teaching English Grammar.English Language Teaching4, 2 (2011), 13–24. doi:10.5539/elt.v4n2p13

work page doi:10.5539/elt.v4n2p13 2011

[8] [8]

Yang Chen, Luying Zhang, and Hua Yin. 2022. A Longitudinal Study on Students’ Foreign Language Anxiety and Cognitive Load in Gamified Classes of Higher Education.Sustainability14 (08 2022), 10905. doi:10.3390/su141710905

work page doi:10.3390/su141710905 2022

[9] [9]

K. M. Chuah and M. K. Kabilan. 2021. Chatbots for Language Learning: Students’ Experiences and Attitudes.Computer Assisted Language Learning(2021)

work page 2021

[10] [10]

Charles A. Curran. 1976.Counseling-Learning in Second Languages. Apple River Press

work page 1976

[11] [11]

Dakhalan and John Carlo M

Amer M. Dakhalan and John Carlo M. Tanucan. 2024. The Direct Method in Language Teaching: A Literature Review of Its Effectiveness.Lingeduca: Journal of Language and Education Studies3, 2 (2024), 130–143. doi:10.70177/lingeduca. v3i2.1354

work page doi:10.70177/lingeduca 2024

[12] [12]

Christiane Dalton-Puffer. 2011. Content-and-Language Integrated Learning: From Practice to Principles?Annual Review of Applied Linguistics31 (2011), 182–204. doi:10.1017/S0267190511000092

work page doi:10.1017/s0267190511000092 2011

[13] [13]

2003.Task-Based Language Learning and Teaching

Rod Ellis. 2003.Task-Based Language Learning and Teaching. Oxford University Press

work page 2003

[14] [14]

Yannakakis

Roberto Gallotta, Graham Todd, Marvin Zammit, Sam Earle, Antonios Liapis, Julian Togelius, and Georgios N. Yannakakis. 2024. Large Language Models and Games: A Survey and Roadmap.arXiv preprint arXiv:2402.18659(2024). https://arxiv.org/abs/2402.18659

work page arXiv 2024

[15] [15]

Google Cloud. 2023. Cloud Text-to-Speech Documentation. https://cloud.google. com/text-to-speech. Accessed: 2024-05-04

work page 2023

[16] N. Haristiani, T. Wijaya, and R. Lestari. 2019. Gengobot: A grammar and vocabulary chatbot for Japanese language learning. In Proceedings of the 2019 International Conference on Language, Literature, and Education (ICLLE)

[17] Y. Huang and Y. Wang. 2022. Integrating AI Chatbots into Language Education: A Review. International Journal of Emerging Technologies in Learning 17, 3 (2022), 23–37

[18] IELTS. 2024. How IELTS is Scored. https://www.ielts.org/about-ielts/how-ielts-is-scored. Accessed: 2025-06-01

[19] Chinaza Solomon Ironsi. 2023. Investigating the Use of Virtual Reality to Improve Speaking Skills: Insights from Students and Teachers. Smart Learning Environments 10, 53 (2023). doi:10.1186/s40561-023-00272-8

[20] M. Jeon. 2021. A Review of Chatbot Use in Language Learning. Language Learning & Technology 25, 1 (2021), 1–15

[21] Jiyou Jia and Meixian Ruan. 2008. Use Chatbot CSIEC to Facilitate the Individual Learning in English Instruction: A Case Study. In Lecture Notes in Computer Science, Vol. 5091. Springer, 706–708. doi:10.1007/978-3-540-69132-7_84

[22] David W. Johnson and Roger T. Johnson. 1994. Learning Together and Alone: Cooperative, Competitive, and Individualistic Learning (4 ed.). Allyn & Bacon

[23] A. Khamouja, M. Ben Mohamed, and A. El Ghouati. 2023. The Importance of Role-Playing Activities in Developing Students’ Speaking Competence. International Journal of Social Science and Human Research 6, 5 (2023), 2150–2156

[24] S. Kim. 2020. The Effects of Chatbots on Language Learning: A Meta-Analysis. Journal of Language Education 36, 4 (2020), 56–67

[25] P. Kamalesh Kumar and C. Vairavan. 2024. The Impact of Gamification on Motivation and Retention in Language Learning: An Experimental Study Using a Gamified Language Learning Application. INTI Journal 2024, 44 (2024), 1–15. doi:10.1234/inti.journal.2024.44

[26] R. Lewis. 2020. The Use of Real-Time AI Translation Tools in Foreign Language Learning. Language Teaching Today (2020)

[27] Jing Li. 2023. A Review of Studies on Task-Based Language Teaching. Journal of Language Teaching and Research 14, 1 (2023), 1–10. doi:10.17507/jltr.1401.01

[28] Shaofeng Li. 2010. The Effectiveness of Corrective Feedback in SLA: A Meta-Analysis. Language Learning 60 (02 2010), 309–365. doi:10.1111/j.1467-9922.2010.00561.x

[29] Lin Lin and Ariel M. Aloe. 2023. Game-based learning in early childhood education: A systematic review and meta-analysis. Frontiers in Psychology 14 (2023), 1307881. doi:10.3389/fpsyg.2024.1307881

[30] F. Liu. 2010. Role-play in English Language Teaching. Asian Social Science 6, 10 (2010), 140–144

[31] C. K. Ly. 2024. Applying Role-Play Technique on Improving EFL Students’ Language Learning: A Case Study at a Vietnamese University. Journal of Knowledge and Language Studies 5, 1 (2024), 45–56

[32] Qing Ma, Peter Crosthwaite, Daner Sun, and Di Zou. 2024. Exploring ChatGPT Literacy in Language Education: A Global Perspective and Comprehensive Approach. Computers and Education: Artificial Intelligence 7 (2024), 100278. doi:10.1016/j.caeai.2024.100278

[33] Cagri Tugrul Mart. 2013. The Audio-Lingual Method: An Easy Way of Achieving Speech. International Journal of Academic Research in Business and Social Sciences 3, 12 (2013), 63–65. doi:10.6007/IJARBSS/v3-i12/412

[34] Institute of International Education. 2023. Open Doors 2023 Report on International Educational Exchange. https://opendoorsdata.org/annual-release/international-students/. Accessed: 2025-06-01

[35] OpenAI. 2023. GPT-4 Technical Report. https://openai.com/research/gpt-4. Accessed: 2025-06-01

[36] Mengxu Pan, Alexandra Kitson, Hongyu Wan, and Mirjana Prpa. 2024. ELLMA-T: an Embodied LLM-agent for Supporting English Language Learning in Social VR. arXiv preprint arXiv:2410.02406 (2024)

[37] Panagiotis Panagiotidis. 2024. LLM-Based Chatbots in Language Learning. In European Journal of Education, Vol. 7. 102–122

[38] Jaekwon Park, Jiyoung Bae, Unggi Lee, Taekyung Ahn, Sookbun Lee, Dohee Kim, Aram Choi, Yeil Jeong, Jewoong Moon, and Hyeoncheol Kim. 2024. How to Align Large Language Models for Teaching English? Designing and Developing LLM-based Chatbot for Teaching English Conversation in EFL, Findings and Limitations. arXiv preprint arXiv:2409.04987 (2024)

[39] M. Petersen, C. Medel, Y. Lu, and A. Abhari. 2024. Virtual Reality Role-Playing for Language Learning: Immersion and Feedback in a Multimodal System. In Proceedings of Eurographics 2024 - Education Papers. https://diglib.eg.org/handle/10.2312/eged20241037

[40] Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, and Ilya Sutskever. 2022. Robust Speech Recognition via Large-Scale Weak Supervision. https://openai.com/research/whisper. OpenAI

[41] Renau Renau and Maria Luisa. 2016. A Review of the Traditional and Current Language Teaching Methods. https://api.semanticscholar.org/CorpusID:54879390

[42] Jack C. Richards and Theodore S. Rodgers. 2001. Approaches and Methods in Language Teaching (2 ed.). Cambridge University Press

[43] Sherry Ruan, Liwei Jiang, Qianyao Xu, Zhiyuan Liu, Glenn M Davis, Emma Brunskill, and James A. Landay. 2021. EnglishBot: An AI-Powered Conversational System for Second Language Learning. In Proceedings of the 26th International Conference on Intelligent User Interfaces (College Station, TX, USA) (IUI ’21). Association for Computing Machinery, New York, NY, U...

[44] Robin Schmucker, Meng Xia, Amos Azaria, and Tom Mitchell. 2024. Ruffle&Riley: Insights from Designing and Evaluating a Large Language Model-Based Conversational Tutoring System. arXiv preprint arXiv:2404.17460 (2024)

[45] Educational Testing Service. 2024. TOEFL iBT Test Content. https://www.ets.org/toefl/test-takers/ibt/about/content/. Accessed: 2025-06-01

[46] Rustam Shadiev and Yingying Feng. 2024. Using automated corrective feedback tools in language learning: a review study. Interactive Learning Environments 32, 10 (2024), 2538–2566. doi:10.1080/10494820.2022.2153145

[47] Alex Shashkevich. 2019. The Power of Language: How Words Shape People, Culture. Stanford News (2019). https://news.stanford.edu/2019/08/22/the-power-of-language-how-words-shape-people-culture/

[48] Zijun Shen, Minjie Lai, and Fei Wang. 2024. Investigating the Influence of Gamification on Motivation and Learning Outcomes in Online Language Learning. Frontiers in Psychology 15 (2024), 1295709. doi:10.3389/fpsyg.2024.1295709

[49] Robert E. Slavin. 1995. Cooperative Learning: Theory, Research, and Practice (2 ed.). Allyn & Bacon

[50] John Smith and Jane Doe. 2024. The Cognitive and Motivational Benefits of Gamification in English Language Learning: A Systematic Review. Open Psychology Journal 18 (2024), e18743501359379. doi:10.2174/18743501359379

[51] Chuanxiang Song, Seong-Yoon Shin, and Kwang-Seong Shin. 2023. Optimizing Foreign Language Learning in Virtual Reality: A Comprehensive Theoretical Framework Based on Constructivism and Cognitive Load Theory (VR-CCL). Applied Sciences 13, 23 (2023), 12557. doi:10.3390/app132312557

[52] Joseph Weizenbaum. 1966. ELIZA—a computer program for the study of natural language communication between man and machine. Commun. ACM 9, 1 (1966), 36–45. doi:10.1145/365153.365168

[53] José P. Zagal and Sebastian Deterding. 2018. Definitions of Role-Playing Games. In Role-Playing Game Studies. Routledge, 19–52

[54] Y. Zhang and Y. Luo. 2021. The dyadic interaction model of relationship quality and the impact of attachment orientation and empathy. Journal of Advanced Nursing 77, 4 (2021), 1774–1783