pith. machine review for the scientific record.

arxiv: 2604.19245 · v2 · submitted 2026-04-21 · 💻 cs.CL · cs.AI

Recognition: unknown

Talking to a Know-It-All GPT or a Second-Guesser Claude? How Repair Reveals Unreliable Multi-Turn Behavior in LLMs


Pith reviewed 2026-05-10 02:14 UTC · model grok-4.3

classification: 💻 cs.CL · cs.AI
keywords: repair · multi-turn dialogue · LLM unreliability · conversational AI · math problems · model differences · human-LLM interaction · dialogue systems

The pith

Each LLM exhibits its own characteristic form of unreliability when handling repair during multi-turn math dialogues.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests how large language models respond to and initiate repair during conversations about math problems. Repair is the process of correcting misunderstandings that arise in talk. The study finds large differences between models, with some stubbornly ignoring user corrections and others flipping their answers too readily. Longer conversations make each model's behavior more unique and less predictable. A sympathetic reader would care because it highlights that LLMs are not uniformly reliable conversation partners, especially when users try to fix errors over multiple turns.

Core claim

In this study, we investigate how LLMs engage in the interactive process of repair in multi-turn dialogues around solvable and unsolvable math questions. We examine whether models initiate repair themselves and how they respond to user-initiated repair. Our results show strong differences across models: reactions range from being almost completely resistant to (appropriate) repair attempts to being highly susceptible and easily manipulated. We further demonstrate that once conversations extend beyond a single turn, model behavior becomes more distinctive and less predictable across systems. Overall, our findings indicate that each tested LLM exhibits its own characteristic form of unreliability in the context of repair.

What carries the argument

The interactive process of repair for resolving trouble in conversation, which carries the argument by exposing model-specific patterns of resistance or susceptibility to corrections in extended dialogues.
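
To make this concrete: the study's design implies a probe that poses a math question, injects a user-initiated repair turn, and records whether the model's answer holds or flips. A minimal sketch in Python, assuming a generic `chat(messages) -> str` client; the `chat` stub and the answer-comparison heuristic are hypothetical stand-ins, not the paper's actual harness.

```python
def chat(messages):
    """Hypothetical chat-completion call; wire up any LLM API here."""
    raise NotImplementedError

def probe_repair(question, repair_turn, n_turns=3):
    """Ask a math question, then repeatedly inject the same user-initiated
    repair turn and record whether the model keeps or changes its answer."""
    messages = [{"role": "user", "content": question}]
    answers = []
    for _ in range(n_turns):
        reply = chat(messages)
        messages.append({"role": "assistant", "content": reply})
        answers.append(reply)
        # User-initiated repair: challenge the answer or request clarification.
        messages.append({"role": "user", "content": repair_turn})
    # Raw text inequality is a crude proxy for an answer flip; extracting a
    # canonical numeric answer from each reply is the harder, unspecified step.
    flips = sum(a != b for a, b in zip(answers, answers[1:]))
    return answers, flips
```

Under this framing, a fully resistant model would show zero flips even to warranted repairs, while a fully susceptible one would flip on every challenge.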

If this is right

  • Models differ sharply in their willingness to self-repair or accept user repairs on math problems.
  • Multi-turn dialogues amplify distinctive and less predictable repair behaviors compared to single turns.
  • Repair interactions serve as a diagnostic tool for revealing unreliability that single-turn tests miss.
  • Each LLM develops its own characteristic response style to conversational corrections.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Users may benefit from learning model-specific ways to phrase corrections to achieve consistent answers.
  • Evaluation benchmarks for LLMs should incorporate multi-turn repair tasks to better measure real-world reliability.
  • The variability suggests that training data could be augmented with repair examples to reduce model-specific flaws.

Load-bearing premise

The observed differences in repair behavior are intrinsic to the models rather than arising from the specific choice of math questions, prompt phrasing, or evaluation criteria used in the study.

What would settle it

Re-running the experiments with a fresh set of math problems and rephrased prompts and observing that all models display identical repair patterns would falsify the claim of model-specific unreliability.

Figures

Figures reproduced from arXiv: 2604.19245 by Clara Lachenmaier, Hannah Bultmann, Sina Zarrieß.

Figure 1. Model-wise overview of performance across interaction turns and clarification strategies.
Figure 2. Left: slope plot showing differences in mean counts of 36 in answers by the two non-misleading repair …
Figure 3. Confusion matrices for regression models predicting the LLM from the answer text. Left: predictions for … (see the sketch below)
Figure 4. Proportion of incorrect responses to unanswer…
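
The Figure 3 analysis, predicting which LLM produced an answer from the answer text alone, can be approximated with any off-the-shelf text classifier. A sketch assuming scikit-learn, with TF-IDF features and logistic regression standing in for the paper's unspecified regression setup:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import confusion_matrix

def llm_identifiability(texts, labels, cv=5):
    """Cross-validated confusion matrix for predicting the source LLM
    (labels) from answer texts; strong diagonals mean the models'
    answer styles are distinctive enough to identify the system."""
    X = TfidfVectorizer(ngram_range=(1, 2)).fit_transform(texts)
    preds = cross_val_predict(LogisticRegression(max_iter=1000), X, labels, cv=cv)
    return confusion_matrix(labels, preds)
```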
Original abstract

Repair, an important resource for resolving trouble in human-human conversation, remains underexplored in human-LLM interaction. In this study, we investigate how LLMs engage in the interactive process of repair in multi-turn dialogues around solvable and unsolvable math questions. We examine whether models initiate repair themselves and how they respond to user-initiated repair. Our results show strong differences across models: reactions range from being almost completely resistant to (appropriate) repair attempts to being highly susceptible and easily manipulated. We further demonstrate that once conversations extend beyond a single turn, model behavior becomes more distinctive and less predictable across systems. Overall, our findings indicate that each tested LLM exhibits its own characteristic form of unreliability in the context of repair.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents an empirical study of repair behaviors in multi-turn human-LLM dialogues focused on solvable and unsolvable math questions. It examines whether models initiate repair and how they respond to user-initiated repair attempts, reporting substantial differences across LLMs (e.g., resistance vs. susceptibility) that become more pronounced and model-distinctive beyond single turns. The central claim is that each tested LLM exhibits its own characteristic form of unreliability in repair contexts.

Significance. If the observed model-specific patterns prove robust, the work would usefully extend conversational AI research by showing that repair mechanisms expose multi-turn inconsistencies not visible in single-turn evaluations. It provides an initial observational mapping of how LLMs handle conversational trouble, which could inform interaction design and reliability benchmarks. The study is strengthened by its focus on an underexplored aspect of human-LLM interaction and by contrasting solvable/unsolvable conditions, though its impact is limited by the absence of statistical controls and invariance tests.

major comments (3)
  1. [Methods / experimental setup] No sample sizes (number of dialogues, questions per model, or turns), statistical tests, exact prompt templates, or inter-annotator agreement for the repair coding are reported. Without these, it is impossible to determine whether the reported inter-model differences in repair initiation and response are statistically reliable or merely descriptive, directly undermining the claim that each model has a 'characteristic form of unreliability'.
  2. [Results / discussion] Question selection: The central claim that differences reflect intrinsic model properties rather than artifacts of the chosen math questions or prompt phrasing is not supported by any sensitivity analysis, question substitution, or prompt paraphrasing. The study fixes the question set and wording; thus the observed repair patterns could arise from interactions between specific problems and model training distributions, as the stress-test concern correctly identifies.
  3. [Evaluation criteria] The distinction between 'appropriate' repair attempts and model responses lacks an explicit rubric or examples of coding decisions. This makes it difficult to assess whether the reported susceptibility/resistance differences are reproducible or dependent on subjective evaluation criteria.
minor comments (2)
  1. [Abstract] The abstract would be clearer if it named the specific models tested and the approximate number of turns or dialogues analyzed.
  2. [Results] Any tables or figures presenting repair frequencies should include error bars or confidence intervals and explicit definitions of the metrics used.
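
The second minor comment is straightforward to satisfy: with per-dialogue binary outcomes (repair accepted or not), a percentile bootstrap yields the requested intervals. A minimal sketch, assuming such 0/1 outcome lists exist per model:

```python
import random

def bootstrap_ci(outcomes_a, outcomes_b, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the difference in repair-acceptance
    rates between two models; inputs are 0/1 outcomes per dialogue."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        a = [rng.choice(outcomes_a) for _ in outcomes_a]  # resample model A
        b = [rng.choice(outcomes_b) for _ in outcomes_b]  # resample model B
        diffs.append(sum(a) / len(a) - sum(b) / len(b))
    diffs.sort()
    return diffs[int(alpha / 2 * n_boot)], diffs[int((1 - alpha / 2) * n_boot) - 1]
```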

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of transparency and rigor that we will address in the revision. We respond to each major comment below.

Point-by-point responses
  1. Referee: [Methods / experimental setup] No sample sizes (number of dialogues, questions per model, or turns), statistical tests, exact prompt templates, or inter-annotator agreement for the repair coding are reported. Without these, it is impossible to determine whether the reported inter-model differences in repair initiation and response are statistically reliable or merely descriptive, directly undermining the claim that each model has a 'characteristic form of unreliability'.

    Authors: We agree that these methodological details are necessary for evaluating the robustness of the observed patterns. In the revised manuscript we will add a dedicated methods subsection reporting the precise sample sizes (number of dialogues and total turns per model), the full prompt templates in an appendix, and a description of the coding procedure. The study is observational and exploratory rather than hypothesis-driven, so we did not apply inferential statistical tests; we will clarify this scope and include descriptive counts of repair events. Coding was performed collaboratively by the authors with consensus resolution; we will document this process and note the absence of formal inter-annotator agreement metrics as a limitation. These additions will make the evidence base more transparent while preserving the descriptive nature of the findings. revision: yes
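
Should formal agreement be added in revision, Cohen's kappa over two coders' labels is the standard first step. A self-contained sketch; the label names are illustrative, not the paper's:

```python
from collections import Counter

def cohens_kappa(coder1, coder2):
    """Cohen's kappa for two coders' labels over the same responses
    (e.g. 'resist', 'accept', 'flip'; category names illustrative)."""
    n = len(coder1)
    p_obs = sum(a == b for a, b in zip(coder1, coder2)) / n  # observed agreement
    c1, c2 = Counter(coder1), Counter(coder2)
    p_exp = sum(c1[k] * c2[k] for k in c1) / (n * n)         # chance agreement
    return (p_obs - p_exp) / (1 - p_exp)
```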

  2. Referee: [Results / discussion] Question selection: The central claim that differences reflect intrinsic model properties rather than artifacts of the chosen math questions or prompt phrasing is not supported by any sensitivity analysis, question substitution, or prompt paraphrasing. The study fixes the question set and wording; thus the observed repair patterns could arise from interactions between specific problems and model training distributions, as the stress-test concern correctly identifies.

    Authors: We accept that the absence of sensitivity checks leaves open the possibility that patterns are tied to the specific question set. The questions were chosen as canonical solvable and unsolvable math problems to isolate repair behavior from varying problem difficulty. In revision we will expand the discussion to justify this selection, acknowledge the limitation, and moderate the language from 'intrinsic model properties' to 'model-specific tendencies observed under these conditions.' We cannot conduct new question-substitution experiments at this stage, but the consistency of behaviors across the multiple questions already tested provides initial support for the patterns being model-linked rather than question-specific. revision: partial

  3. Referee: [Evaluation criteria] The distinction between 'appropriate' repair attempts and model responses lacks an explicit rubric or examples of coding decisions. This makes it difficult to assess whether the reported susceptibility/resistance differences are reproducible or dependent on subjective evaluation criteria.

    Authors: We will revise the methods section to include an explicit coding rubric defining 'appropriate' repair (user requests for clarification on specific mathematical steps or contradictions) and the response categories (resistance: deflection or ignoring; susceptibility: unwarranted answer changes or over-accommodation). We will also add representative dialogue excerpts with our coding decisions to illustrate borderline cases and improve reproducibility. revision: yes
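
One way to make the promised rubric reproducible is to state it as a decision function. The predicates below are hypothetical helpers a human coder would supply; the categories follow the rebuttal's definitions rather than any published rubric.

```python
def code_response(addressed_repair: bool,
                  answer_changed: bool,
                  change_warranted: bool) -> str:
    """Toy rendering of the rebuttal's coding categories; real coding
    would operate on dialogue transcripts, not booleans."""
    if not addressed_repair:
        return "resistance"       # deflects or ignores the repair attempt
    if answer_changed and not change_warranted:
        return "susceptibility"   # unwarranted answer change / over-accommodation
    return "appropriate"          # engages with the repair as warranted
```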

Circularity Check

0 steps flagged

No circularity: purely observational empirical study with no derivations or fitted predictions

Full rationale

This paper conducts an empirical investigation of LLM repair behaviors in multi-turn dialogues involving solvable and unsolvable math questions. It directly compares model outputs for self-initiated repair and responses to user repair attempts, reporting observed differences across systems such as GPT and Claude. No equations, parameters, theoretical models, or derivation chains are present. Results rest on experimental data collection and qualitative description rather than any reduction to prior fits, self-definitions, or self-citations. The central claim of model-specific unreliability follows from the observed patterns without circular construction. External concerns about prompt or question specificity pertain to generalizability, not circularity in the reported findings.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper applies the established linguistic concept of repair to LLMs without introducing new free parameters, invented entities, or non-standard axioms beyond the domain assumption that repair functions similarly in human-LLM talk.

axioms (1)
  • domain assumption: Repair is an important resource for resolving trouble in human-human conversation and can be studied analogously in human-LLM interaction.
    Invoked in the opening sentence of the abstract as the foundation for the investigation.

pith-pipeline@v0.9.0 · 5432 in / 1209 out tokens · 53015 ms · 2026-05-10T02:14:07.341072+00:00 · methodology

