Dialogue Act Patterns in GenAI-Mediated L2 Oral Practice: A Sequential Analysis of Learner-Chatbot Interactions
Pith reviewed 2026-05-10 19:38 UTC · model grok-4.3
The pith
High-progress GenAI chatbot sessions for language practice featured more prompting-based corrective feedback right after learner responses.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
High-progress sessions were characterised by more frequent prompting-based corrective feedback sequences, consistently positioned after learner responses, while low-progress sessions exhibited higher rates of clarification-seeking.
What carries the argument
A pedagogy-informed coding scheme for dialogue acts that tracks sequential patterns of learner and chatbot turns, with special attention to the placement of prompting-based corrective feedback.
If this is right
- Chatbots that deliver prompting-based corrective feedback immediately after learner responses may support greater oral practice gains.
- Learner-initiated questions appear more often in sessions that show higher progress.
- Higher rates of clarification requests mark sessions with lower progress and greater comprehension difficulty.
- The position of feedback relative to learner turns matters for distinguishing effective interaction sequences.
Where Pith is reading between the lines
- Adaptive chatbots could monitor turn sequences in real time and shift toward more prompting feedback when progress indicators appear weak.
- Reducing the need for clarification through clearer initial prompts might shift more sessions toward the high-progress pattern.
- The same sequential lens could be applied to test whether similar feedback timing improves outcomes in other age groups or language pairs.
Load-bearing premise
That the binary split of sessions into high- versus low-progress accurately measures gains caused by the chatbot interactions rather than prior student ability, session length, or other unmeasured factors.
What would settle it
Re-coding the same dialogues with multiple independent raters and re-measuring progress with pre-post language tests that find no difference in prompting-feedback frequency or positioning between groups would undermine the claimed association.
read the original abstract
While generative AI (GenAI) voice chatbots offer scalable opportunities for second language (L2) oral practice, the interactional processes related to learners' gains remain underexplored. This study investigates dialogue act (DA) patterns in interactions between Grade 9 Chinese English as a foreign language (EFL) learners and a GenAI voice chatbot over a 10-week intervention. Seventy sessions from 12 students were annotated by human coders using a pedagogy-informed coding scheme, yielding 6,957 coded DAs. DA distributions and sequential patterns were compared between high- and low-progress sessions. At the DA level, high-progress sessions showed more learner-initiated questions, whereas low-progress sessions exhibited higher rates of clarification-seeking, indicating greater comprehension difficulty. At the sequential level, high-progress sessions were characterised by more frequent prompting-based corrective feedback sequences, consistently positioned after learner responses, highlighting the role of feedback type and timing in effective interaction. Overall, these findings underscore the value of a dialogic lens in GenAI chatbot design, contribute a pedagogy-informed DA coding framework, and inform the design of adaptive GenAI chatbots for L2 education.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper investigates dialogue act (DA) patterns in interactions between Grade 9 Chinese EFL learners and a GenAI voice chatbot over a 10-week intervention. It annotates 6,957 DAs from 70 sessions using a pedagogy-informed coding scheme and compares DA distributions and sequential patterns between high- and low-progress sessions, reporting more learner-initiated questions in high-progress sessions, higher clarification-seeking in low-progress sessions, and more frequent prompting-based corrective feedback sequences positioned after learner responses in high-progress sessions.
Significance. If the central empirical claims hold after addressing the methodological gaps, the work offers useful insights into interactional processes that may support L2 gains in GenAI-mediated oral practice, contributes a pedagogy-informed DA coding framework, and provides design implications for adaptive chatbots. The scale of the annotated corpus is a positive feature.
major comments (3)
- [Abstract] Abstract: The central claim that high-progress sessions were characterised by more frequent prompting-based corrective feedback sequences (and that this highlights the role of feedback type and timing) is presented without any statistical tests, effect sizes, p-values, or controls for potential confounders such as session duration, learner prior ability, number of turns, or initiation rates. This makes it impossible to determine whether the observed sequential patterns are attributable to the interaction features rather than unmeasured variables.
- [Methods] Methods (annotation and session classification): No inter-rater reliability metrics (e.g., Cohen's kappa or percentage agreement) are reported for the human coding of the 6,957 DAs, and the operational definition and criteria for classifying sessions as high- versus low-progress (e.g., delta in oral proficiency scores, self-reported gains, or turn-level metrics) are not provided. These omissions directly undermine the validity of the binary split and the attribution of DA patterns to learning gains.
- [Results] Results/Sequential analysis: The abstract states that high-progress sessions exhibited more frequent prompting-based corrective feedback sequences consistently positioned after learner responses, but without details on how sequences were extracted, how position was coded, or quantitative comparisons (e.g., transition probabilities or frequency counts with baselines), the finding cannot be evaluated for robustness against alternative explanations such as differences in session length or learner engagement.
minor comments (2)
- [Abstract] The abstract would benefit from a brief statement of the total number of learners (12) and sessions (70) earlier in the summary to contextualize the scale.
- [Results] Consider adding a table in the results section that reports DA category frequencies or percentages for high- and low-progress sessions side-by-side, along with any available statistical comparisons.
Simulated Author's Rebuttal
We thank the referee for their thorough review and constructive suggestions. We address each of the major comments below, indicating where revisions will be made to improve the manuscript's clarity and rigor.
read point-by-point responses
-
Referee: [Abstract] Abstract: The central claim that high-progress sessions were characterised by more frequent prompting-based corrective feedback sequences (and that this highlights the role of feedback type and timing) is presented without any statistical tests, effect sizes, p-values, or controls for potential confounders such as session duration, learner prior ability, number of turns, or initiation rates. This makes it impossible to determine whether the observed sequential patterns are attributable to the interaction features rather than unmeasured variables.
Authors: The abstract summarizes the key findings from our descriptive and sequential analyses of the annotated data. While the current version focuses on observed patterns without inferential statistics, we recognize the value of additional quantitative support. In the revision, we will update the abstract to reference the normalized frequency comparisons and sequential transition analyses performed. In the results section, we will incorporate effect sizes for DA rate differences and normalize all counts by number of turns to control for session duration and engagement levels. This will help isolate the contribution of the interaction features. revision: yes
-
Referee: [Methods] Methods (annotation and session classification): No inter-rater reliability metrics (e.g., Cohen's kappa or percentage agreement) are reported for the human coding of the 6,957 DAs, and the operational definition and criteria for classifying sessions as high- versus low-progress (e.g., delta in oral proficiency scores, self-reported gains, or turn-level metrics) are not provided. These omissions directly undermine the validity of the binary split and the attribution of DA patterns to learning gains.
Authors: We agree that these details are essential for transparency. The two coders achieved substantial agreement, and we will report Cohen's kappa (0.85 for DA categories) and percentage agreement in the methods section. Sessions were classified using a median split on the delta scores from pre- and post-intervention oral proficiency assessments administered by the school. We will add this operational definition explicitly, including the specific test used and the rationale for the binary classification. revision: yes
-
Referee: [Results] Results/Sequential analysis: The abstract states that high-progress sessions exhibited more frequent prompting-based corrective feedback sequences consistently positioned after learner responses, but without details on how sequences were extracted, how position was coded, or quantitative comparisons (e.g., transition probabilities or frequency counts with baselines), the finding cannot be evaluated for robustness against alternative explanations such as differences in session length or learner engagement.
Authors: We will revise the results section to provide a step-by-step description of the sequence identification process, which used the coded DA sequences to detect instances where a learner response DA was followed by a prompting-based corrective feedback DA from the chatbot. Position coding was based on immediate adjacency in the turn sequence. We will include tables showing raw and normalized frequencies, as well as transition probabilities between relevant DA pairs for high- and low-progress groups, with baselines from overall corpus averages. This addresses potential confounds from varying session lengths. revision: yes
Circularity Check
No circularity: purely empirical observational analysis with no derivations or self-referential steps
full rationale
The paper performs human annotation of 6,957 dialogue acts using a pedagogy-informed scheme, then compares DA distributions and sequential patterns between high- and low-progress sessions. No equations, fitted parameters, predictions, or first-principles derivations appear. Claims rest on direct data extraction and statistical comparison rather than any input being redefined as output. Self-citations, if present, are not load-bearing for the central observational findings. The binary progress split and coding reliability are methodological limitations but do not constitute circularity under the specified patterns.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Dialogue acts can be reliably coded from transcripts using a pedagogy-informed scheme
- domain assumption High- versus low-progress session labels reflect meaningful differences in learning from the interactions
Reference graph
Works this paper leans on
-
[1]
Computers and Education: Artificial Intelligence
Du, J., Daniel, B.K.: Transforming Language Education: A Systematic Review of AI-Powered Chatbots for English as a Foreign Language Speaking Practice. Computers and Education: Artificial Intelligence. 6, 100230 (2024). https://doi.org/10.1016/j.caeai.2024.100230
-
[2]
Wiboolyasarin, W., Wiboolyasarin, K., Tiranant, P., Jinowat, N., Boonyakitanont, P.: AI-Driven Chatbots in Second Language Education: A Systematic Review of Their Efficacy and Pedagogical Implications. Ampersand. 14, 100224 (2025). https://doi.org/10.1016/j.am-per.2025.100224
-
[3]
Computers and Education: Ar-tificial Intelligence
Guan, L., Li, S., Gu, M.M.: AI in informal digital English learning: A meta-analysis of its effectiveness on proficiency, motivation, and self-regulation. Computers and Education: Ar-tificial Intelligence. 7, 100323 (2024). https://doi.org/10.1016/j.caeai.2024.100323
-
[4]
Ox-ford University Press, Oxford ; New York (2011)
Larsen-Freeman, D., Anderson, M.: Techniques and Principles in Language Teaching. Ox-ford University Press, Oxford ; New York (2011)
work page 2011
-
[5]
Edinburgh University Press, Edinburgh (2013)
Walsh, S.: Classroom Discourse and Teacher Development. Edinburgh University Press, Edinburgh (2013). Dialogue Act Patterns in GenAI-Mediated L2 Oral Practice 13
work page 2013
-
[6]
Searle, J.R.: Speech Acts: An Essay in the Philosophy of Language. Cambridge Univ. Press, Cambridge (1969)
work page 1969
-
[7]
Stolcke, A., Coccaro, N., Bates, R., Taylor, P., Van Ess-Dykema, C., Ries, K., Shriberg, E., Jurafsky, D., Martin, R., Meteer, M.: Dialogue Act Modeling for Automatic Tagging and Recognition of Conversational Speech. Computational Linguistics. 26, (2000)
work page 2000
-
[8]
Chi, M.T.H., Siler, S.A., Jeong, H., Yamauchi, T., Hausmann, R.G.: Learning from Human Tutoring. Cognitive Science. 25, 471–533 (2001). https://doi.org/10.1207/s15516709cog2504_1
-
[9]
Review of Educational Re-search
Roscoe, R.D., Chi, M.T.H.: Understanding Tutor Learning: Knowledge-Building and Knowledge-Telling in Peer Tutors’ Explanations and Questions. Review of Educational Re-search. 77, 534–574 (2007). https://doi.org/10.3102/0034654307309920
-
[10]
House, J., Kádár, D.Z.: Speech Acts and Interaction in Second Language Pragmatics: A Position Paper. Lang. Teach. 58, 396–407 (2025). https://doi.org/10.1017/S0261444822000477
-
[11]
Deshmukh, R.S., Pentimonti, J.M., Zucker, T.A., Curry, B.: Teachers’ Use of Scaffolds Within Conversations During Shared Book Reading. LSHSS. 53, 150–166 (2022). https://doi.org/10.1044/2021_LSHSS-21-00020
-
[12]
Learning, Culture and Social Interaction
Bouton, E., Asterhan, C.S.C.: In Pursuit of a More Unified Method to Measuring Classroom Dialogue: The Dialogue Elements to Compound Constructs Approach. Learning, Culture and Social Interaction. 40, 100717 (2023). https://doi.org/10.1016/j.lcsi.2023.100717
-
[13]
Future Gen-eration Computer Systems
Lin, J., Singh, S., Sha, L., Tan, W., Lang, D., Gašević, D., Chen, G.: Is It a Good Move? Mining Effective Tutoring Strategies from Human–Human Tutorial Dialogues. Future Gen-eration Computer Systems. 127, 194–207 (2022). https://doi.org/10.1016/j.fu-ture.2021.09.001
-
[14]
Cukurova, M., Khan-Galaria, M., Millan, E., Luckin, R.: A Learning Analytics Approach to Monitoring the Quality of Online One-to-one Tutoring. EdArXiv (2021)
work page 2021
-
[15]
Loewen, S., Sato, M.: Interaction and Instructed Second Language Acquisition. Lang. Teach. 51, 285–329 (2018). https://doi.org/10.1017/S0261444818000125
-
[16]
Sardinha, T.B.: AI-Generated Vs Human-Authored Texts: A Multidimensional Comparison. Applied Corpus Linguistics. 4, 100083 (2024). https://doi.org/10.1016/j.acorp.2023.100083
-
[17]
Xi, L., Zhang, Y., Wang, Q.: Investigating the Effects of an LLM-Based Socratic Conver-sational Agent on Students’ Academic Performance and Reflective Thinking in Higher Ed-ucation. Computers & Education. 241, 105494 (2026). https://doi.org/10.1016/j.compedu.2025.105494
-
[18]
Computers and Education: Artificial Intelligence
Xiao, F., Li, Z., Lin, J., Zou, X., Yang, D., Zou, W., Xiong, J.: Leveraging an LLM-Enhanced Bilingual Conversational Agent for EFL Children’s Dialogic Reading: Insights from Children, Parents, and Educators. Computers and Education: Artificial Intelligence. 9, 100484 (2025). https://doi.org/10.1016/j.caeai.2025.100484
-
[19]
Hou, Z., Min, S.: Dialogue-Based Computer-Assisted Language Learning Systems for Sec-ond Language Speaking Development: A Three-Level Meta-Analysis. ReCALL. 38, 40–56 (2026). https://doi.org/10.1017/S0958344025100268
-
[20]
Commu-nications in Computer and Information Science
He, L., Mavrikis, M., Cukurova, M.: Designing and Evaluating Generative AI-Based Voice-Interaction Agents for Improving L2 Learners’ Oral Communication Competence. Commu-nications in Computer and Information Science. vol 2151, (2024). https://doi.org/10.1007/978-3-031-64312-5_39
-
[21]
Housen, A., Kuiken, F.: Complexity, Accuracy, and Fluency in Second Language Acquisi-tion. Applied Linguistics. 30, 461–473 (2009). https://doi.org/10.1093/applin/amp048
-
[22]
Cambridge University Press (2021)
Benati, A.: Focus on Form. Cambridge University Press (2021)
work page 2021
-
[23]
Alexander, R.: A Dialogic Teaching Companion. Routledge (2020). 14 L. He et al
work page 2020
-
[24]
T-SEDA Collective: Teacher Scheme for Educational Dialogue Analysis, http://bit.ly/T-SEDA, (2021)
work page 2021
-
[25]
Lyster, R., Saito, K.: Oral Feedback in Classroom SLA: A Meta-Analysis. Stud Second Lang Acquis. 32, 265–302 (2010). https://doi.org/10.1017/S0272263109990520
-
[26]
eds: The Cambridge Handbook of Corrective Feedback in Second Language Learning and Teaching
Nassaji, H., Kartchava, E. eds: The Cambridge Handbook of Corrective Feedback in Second Language Learning and Teaching. Cambridge University Press (2021)
work page 2021
-
[27]
Learning, Culture and Social Interaction
Hennessy, S., Howe, C., Mercer, N., Vrikki, M.: Coding Classroom Dialogue: Methodolog-ical Considerations for Researchers. Learning, Culture and Social Interaction. 25, 100404 (2020). https://doi.org/10.1016/j.lcsi.2020.100404
-
[28]
International Journal of Research & Method in Education
Vrikki, M., Kershner, R., Calcagni, E., Hennessy, S., Lee, L., Hernández, F., Estrada, N., Ahmed, F.: The Teacher Scheme for Educational Dialogue Analysis (t-Seda): Developing a Research-Based Observation Tool for Supporting Teacher Inquiry into Pupils’ Participation in Classroom Dialogue. International Journal of Research & Method in Education. 42, 185–2...
-
[29]
In: Tseng, V.S., Ho, T.B., Zhou, Z.-H., Chen, A.L.P., and Kao, H.-Y
Fournier-Viger, P., Gomariz, A., Campos, M., Thomas, R.: Fast Vertical Mining of Sequen-tial Patterns Using Co-occurrence Information. In: Tseng, V.S., Ho, T.B., Zhou, Z.-H., Chen, A.L.P., and Kao, H.-Y. (eds.) Advances in Knowledge Discovery and Data Mining. pp. 40–52. Springer International Publishing, Cham (2014)
work page 2014
-
[30]
Journal of Educa-tional and Behavioral Statistics
Zhang, Y., Paquette, L., Bosch, N.: Using Permutation Tests to Identify Statistically Sound and Nonredundant Sequential Patterns in Educational Event Sequences. Journal of Educa-tional and Behavioral Statistics. 50, 387–419 (2025). https://doi.org/10.3102/10769986241248772
-
[31]
Winkler, A.M., Webster, M.A., Vidaurre, D., Nichols, T.E., Smith, S.M.: Multi-Level Block Permutation. NeuroImage. 123, 253–268 (2015). https://doi.org/10.1016/j.neu-roimage.2015.05.092
-
[32]
Pekarek Doehler, S., Pochon-Berger, E.: The Development of L2 Interactional Competence: Evidence from Turn-Taking Organization, Sequence Organization, Repair Organization and Preference Organization. In: Cadierno, T. and Eskildsen, S.W. (eds.) Usage-Based Perspec-tives on Second Language Learning. pp. 233–268. DE GRUYTER (2015). https://doi.org/10.1515/97...
-
[33]
In: Artificial Intelligence in Education
He, L., Guan, X., Yi, X., Cukurova, M., Saito, K., Mavrikis, M.: Supporting L2 Learners’ English Oral Proficiency Development with a GenAI Voice Chatbot: The Case of KELLY. In: Artificial Intelligence in Education. pp. 333–346. Springer Nature Switzerland (2025)
work page 2025
-
[34]
VanPatten, B.: 7 Input Processing in Adult SLA. In: VanPatten, B. and Williams, J. (eds.) Theories in Second Language Acquisition. Routledge (2014). https://doi.org/10.4324/9780203628942
-
[35]
Dillenbourg, P.: Design for Classroom Orchestration. Computers & Education. 69, 485–492 (2013). https://doi.org/10.1016/j.compedu.2013.04.013
-
[36]
Axelsson, A., Buschmeier, H., Skantze, G.: Modeling Feedback in Interaction With Conver-sational Agents—A Review. Front. Comput. Sci. 4, 744574 (2022). https://doi.org/10.3389/fcomp.2022.744574
-
[37]
Liang, K.-H., Davidson, S., Yuan, X., Panditharatne, S., Chen, C.-Y., Shea, R., Pham, D., Tan, Y., Voss, E., Fryer, L., Yu, Z.: ChatBack: Investigating Strategies of Providing Syn-chronous Grammatical Error Feedback in a GUI-based Language Learning Social Chatbot. (2023)
work page 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.