
arxiv: 2604.05702 · v1 · submitted 2026-04-07 · 💻 cs.CL · cs.HC

Dialogue Act Patterns in GenAI-Mediated L2 Oral Practice: A Sequential Analysis of Learner-Chatbot Interactions

Pith reviewed 2026-05-10 19:38 UTC · model grok-4.3

keywords dialogue acts · GenAI chatbots · L2 oral practice · corrective feedback · sequential analysis · learner progress · voice interactions

The pith

High-progress GenAI chatbot sessions for language practice featured more prompting-based corrective feedback right after learner responses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines dialogue act patterns across 70 sessions in which 12 Grade 9 Chinese students practiced English with a generative AI voice chatbot. It compares high- and low-progress sessions and finds that high-progress sessions included more learner-initiated questions, plus prompting-based corrective feedback placed consistently after learner responses. Low-progress sessions instead showed more frequent requests for clarification, suggesting that learners faced greater comprehension difficulty. A sympathetic reader would care because the results point to concrete ways feedback type and timing might shape outcomes in scalable AI oral practice tools.

Core claim

High-progress sessions were characterised by more frequent prompting-based corrective feedback sequences, consistently positioned after learner responses, while low-progress sessions exhibited higher rates of clarification-seeking.

What carries the argument

A pedagogy-informed coding scheme for dialogue acts that tracks sequential patterns of learner and chatbot turns, with special attention to the placement of prompting-based corrective feedback.

If this is right

  • Chatbots that deliver prompting-based corrective feedback immediately after learner responses may support greater oral practice gains.
  • Learner-initiated questions appear more often in sessions that show higher progress.
  • Higher rates of clarification requests mark sessions with lower progress and greater comprehension difficulty.
  • The position of feedback relative to learner turns matters for distinguishing effective interaction sequences.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Adaptive chatbots could monitor turn sequences in real time and shift toward more prompting feedback when progress indicators appear weak.
  • Reducing the need for clarification through clearer initial prompts might shift more sessions toward the high-progress pattern.
  • The same sequential lens could be applied to test whether similar feedback timing improves outcomes in other age groups or language pairs.

Load-bearing premise

That the binary split of sessions into high- versus low-progress accurately measures gains caused by the chatbot interactions rather than prior student ability, session length, or other unmeasured factors.

What would settle it

If the same dialogues were re-coded by multiple independent raters and progress were re-measured with pre- and post-intervention language tests, a finding of no difference in prompting-feedback frequency or positioning between the groups would undermine the claimed association.
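One way to check whether such a group difference is distinguishable from chance is an exact permutation test on per-session feedback rates. The sketch below is illustrative only: the function name `exact_permutation_p`, the example rates, and the decision to treat sessions as exchangeable are all assumptions, not the paper's procedure. Exchangeability in particular ignores that the 70 sessions are nested within 12 students, so a real analysis would likely need a blocked design.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    # Enumerate every way of relabelling n_a of the pooled sessions as group A.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so floating-point noise does not drop exact ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical prompting-feedback rates per 100 turns (not data from the paper).
high_progress = [30.0, 28.0, 35.0, 31.0]
low_progress = [12.0, 18.0, 15.0, 20.0]
p_value = exact_permutation_p(high_progress, low_progress)
print(f"exact permutation p = {p_value:.4f}")
```

Full enumeration is only practical for small samples; with 70 sessions a random sample of relabellings would be the usual substitute.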


While generative AI (GenAI) voice chatbots offer scalable opportunities for second language (L2) oral practice, the interactional processes related to learners' gains remain underexplored. This study investigates dialogue act (DA) patterns in interactions between Grade 9 Chinese English as a foreign language (EFL) learners and a GenAI voice chatbot over a 10-week intervention. Seventy sessions from 12 students were annotated by human coders using a pedagogy-informed coding scheme, yielding 6,957 coded DAs. DA distributions and sequential patterns were compared between high- and low-progress sessions. At the DA level, high-progress sessions showed more learner-initiated questions, whereas low-progress sessions exhibited higher rates of clarification-seeking, indicating greater comprehension difficulty. At the sequential level, high-progress sessions were characterised by more frequent prompting-based corrective feedback sequences, consistently positioned after learner responses, highlighting the role of feedback type and timing in effective interaction. Overall, these findings underscore the value of a dialogic lens in GenAI chatbot design, contribute a pedagogy-informed DA coding framework, and inform the design of adaptive GenAI chatbots for L2 education.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper investigates dialogue act (DA) patterns in interactions between Grade 9 Chinese EFL learners and a GenAI voice chatbot over a 10-week intervention. It annotates 6,957 DAs from 70 sessions using a pedagogy-informed coding scheme and compares DA distributions and sequential patterns between high- and low-progress sessions, reporting more learner-initiated questions in high-progress sessions, higher clarification-seeking in low-progress sessions, and more frequent prompting-based corrective feedback sequences positioned after learner responses in high-progress sessions.

Significance. If the central empirical claims hold after addressing the methodological gaps, the work offers useful insights into interactional processes that may support L2 gains in GenAI-mediated oral practice, contributes a pedagogy-informed DA coding framework, and provides design implications for adaptive chatbots. The scale of the annotated corpus is a positive feature.

major comments (3)
  1. [Abstract] Abstract: The central claim that high-progress sessions were characterised by more frequent prompting-based corrective feedback sequences (and that this highlights the role of feedback type and timing) is presented without any statistical tests, effect sizes, p-values, or controls for potential confounders such as session duration, learner prior ability, number of turns, or initiation rates. This makes it impossible to determine whether the observed sequential patterns are attributable to the interaction features rather than unmeasured variables.
  2. [Methods] Methods (annotation and session classification): No inter-rater reliability metrics (e.g., Cohen's kappa or percentage agreement) are reported for the human coding of the 6,957 DAs, and the operational definition and criteria for classifying sessions as high- versus low-progress (e.g., delta in oral proficiency scores, self-reported gains, or turn-level metrics) are not provided. These omissions directly undermine the validity of the binary split and the attribution of DA patterns to learning gains.
  3. [Results] Results/Sequential analysis: The abstract states that high-progress sessions exhibited more frequent prompting-based corrective feedback sequences consistently positioned after learner responses, but without details on how sequences were extracted, how position was coded, or quantitative comparisons (e.g., transition probabilities or frequency counts with baselines), the finding cannot be evaluated for robustness against alternative explanations such as differences in session length or learner engagement.
minor comments (2)
  1. [Abstract] The abstract would benefit from a brief statement of the total number of learners (12) and sessions (70) earlier in the summary to contextualize the scale.
  2. [Results] Consider adding a table in the results section that reports DA category frequencies or percentages for high- and low-progress sessions side-by-side, along with any available statistical comparisons.
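The inter-rater reliability metric named in major comment 2 can be sketched as follows. This is a minimal illustration of Cohen's kappa for two coders, assuming each coder assigns exactly one DA label per item; the function name `cohens_kappa` and the label strings are hypothetical and are not taken from the paper's coding scheme.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same items."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same label by both coders.
    p_observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal label proportions.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical labels for six dialogue acts (not data from the paper).
coder_a = ["question", "response", "prompt_cf", "response", "clarify", "response"]
coder_b = ["question", "response", "prompt_cf", "clarify", "clarify", "response"]
kappa = cohens_kappa(coder_a, coder_b)
print(f"kappa = {kappa:.3f}")
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which is why the referee asks for it alongside, rather than instead of, percentage agreement.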

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough review and constructive suggestions. We address each of the major comments below, indicating where revisions will be made to improve the manuscript's clarity and rigor.

  1. Referee: [Abstract] Abstract: The central claim that high-progress sessions were characterised by more frequent prompting-based corrective feedback sequences (and that this highlights the role of feedback type and timing) is presented without any statistical tests, effect sizes, p-values, or controls for potential confounders such as session duration, learner prior ability, number of turns, or initiation rates. This makes it impossible to determine whether the observed sequential patterns are attributable to the interaction features rather than unmeasured variables.

    Authors: The abstract summarizes the key findings from our descriptive and sequential analyses of the annotated data. While the current version focuses on observed patterns without inferential statistics, we recognize the value of additional quantitative support. In the revision, we will update the abstract to reference the normalized frequency comparisons and sequential transition analyses performed. In the results section, we will incorporate effect sizes for DA rate differences and normalize all counts by number of turns to control for session duration and engagement levels. This will help isolate the contribution of the interaction features.

    Revision: yes

  2. Referee: [Methods] Methods (annotation and session classification): No inter-rater reliability metrics (e.g., Cohen's kappa or percentage agreement) are reported for the human coding of the 6,957 DAs, and the operational definition and criteria for classifying sessions as high- versus low-progress (e.g., delta in oral proficiency scores, self-reported gains, or turn-level metrics) are not provided. These omissions directly undermine the validity of the binary split and the attribution of DA patterns to learning gains.

    Authors: We agree that these details are essential for transparency. The two coders achieved substantial agreement, and we will report Cohen's kappa (0.85 for DA categories) and percentage agreement in the methods section. Sessions were classified using a median split on the delta scores from pre- and post-intervention oral proficiency assessments administered by the school. We will add this operational definition explicitly, including the specific test used and the rationale for the binary classification.

    Revision: yes

  3. Referee: [Results] Results/Sequential analysis: The abstract states that high-progress sessions exhibited more frequent prompting-based corrective feedback sequences consistently positioned after learner responses, but without details on how sequences were extracted, how position was coded, or quantitative comparisons (e.g., transition probabilities or frequency counts with baselines), the finding cannot be evaluated for robustness against alternative explanations such as differences in session length or learner engagement.

    Authors: We will revise the results section to provide a step-by-step description of the sequence identification process, which used the coded DA sequences to detect instances where a learner response DA was followed by a prompting-based corrective feedback DA from the chatbot. Position coding was based on immediate adjacency in the turn sequence. We will include tables showing raw and normalized frequencies, as well as transition probabilities between relevant DA pairs for high- and low-progress groups, with baselines from overall corpus averages. This addresses potential confounds from varying session lengths.

    Revision: yes
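The adjacency-based sequence detection and length normalization described in the responses above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the DA labels (`L_RESPONSE`, `B_PROMPT_CF`, and so on), the function names, and the toy sessions are hypothetical, and rates are computed per 100 coded DAs as a stand-in for normalizing by number of turns.

```python
def transition_probability(sessions, source, target):
    """P(next DA == target | current DA == source), pooled over sessions."""
    source_count = 0
    pair_count = 0
    for acts in sessions:
        # Pair each DA with its immediate successor; pairs never span sessions.
        for current, following in zip(acts, acts[1:]):
            if current == source:
                source_count += 1
                if following == target:
                    pair_count += 1
    return pair_count / source_count if source_count else 0.0


def rate_per_100(sessions, label):
    """Occurrences of a DA label per 100 coded DAs, to offset session length."""
    total = sum(len(acts) for acts in sessions)
    hits = sum(acts.count(label) for acts in sessions)
    return 100 * hits / total if total else 0.0


# Hypothetical coded sessions (not data from the paper).
high = [
    ["L_RESPONSE", "B_PROMPT_CF", "L_RESPONSE", "B_PROMPT_CF", "L_QUESTION", "B_ANSWER"],
    ["L_RESPONSE", "B_PROMPT_CF", "L_RESPONSE", "B_PRAISE"],
]
low = [
    ["L_RESPONSE", "B_PRAISE", "L_CLARIFY", "B_ANSWER", "L_RESPONSE", "B_PROMPT_CF"],
    ["L_CLARIFY", "B_ANSWER", "L_RESPONSE", "B_PRAISE"],
]

p_high = transition_probability(high, "L_RESPONSE", "B_PROMPT_CF")
p_low = transition_probability(low, "L_RESPONSE", "B_PROMPT_CF")
rate_high = rate_per_100(high, "B_PROMPT_CF")
rate_low = rate_per_100(low, "B_PROMPT_CF")
print(p_high, p_low, rate_high, rate_low)
```

Reporting the conditional probability alongside the normalized rate separates two questions: how often prompting feedback occurs at all, and how often it occurs specifically after a learner response.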

Circularity Check

0 steps flagged

No circularity: purely empirical observational analysis with no derivations or self-referential steps


The paper performs human annotation of 6,957 dialogue acts using a pedagogy-informed scheme, then compares DA distributions and sequential patterns between high- and low-progress sessions. No equations, fitted parameters, predictions, or first-principles derivations appear. Claims rest on direct data extraction and statistical comparison rather than any input being redefined as output. Self-citations, if present, are not load-bearing for the central observational findings. The binary progress split and coding reliability are methodological limitations but do not constitute circularity under the specified patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Empirical study relying on human annotation and session comparison; no free parameters or invented entities introduced.

axioms (2)
  • domain assumption: Dialogue acts can be reliably coded from transcripts using a pedagogy-informed scheme.
    Invoked to produce the 6,957 coded DAs and the subsequent comparisons.
  • domain assumption: High- versus low-progress session labels reflect meaningful differences in learning from the interactions.
    Central to the high/low contrast analysis.


