Talking to a Know-It-All GPT or a Second-Guesser Claude? How Repair Reveals Unreliable Multi-Turn Behavior in LLMs
Pith reviewed 2026-05-10 02:14 UTC · model grok-4.3
The pith
Each LLM exhibits its own characteristic form of unreliability when handling repair during multi-turn math dialogues.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
In this study, we investigate how LLMs engage in the interactive process of repair in multi-turn dialogues around solvable and unsolvable math questions. We examine whether models initiate repair themselves and how they respond to user-initiated repair. Our results show strong differences across models: reactions range from being almost completely resistant to (appropriate) repair attempts to being highly susceptible and easily manipulated. We further demonstrate that once conversations extend beyond a single turn, model behavior becomes more distinctive and less predictable across systems. Overall, our findings indicate that each tested LLM exhibits its own characteristic form of unreliability in the context of repair.
What carries the argument
The interactive process of repair for resolving trouble in conversation carries the argument by exposing model-specific patterns of resistance or susceptibility to corrections in extended dialogues.
If this is right
- Models differ sharply in their willingness to self-repair or accept user repairs on math problems.
- Multi-turn dialogues amplify distinctive and less predictable repair behaviors compared to single turns.
- Repair interactions serve as a diagnostic tool for revealing unreliability that single-turn tests miss.
- Each LLM develops its own characteristic response style to conversational corrections.
Where Pith is reading between the lines
- Users may benefit from learning model-specific ways to phrase corrections to achieve consistent answers.
- Evaluation benchmarks for LLMs should incorporate multi-turn repair tasks to better measure real-world reliability.
- The variability suggests that training data could be augmented with repair examples to reduce model-specific flaws.
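The benchmark idea above can be sketched as a minimal two-turn probe. This is an illustrative sketch, not the paper's protocol: the `ask` chat interface, the repair phrasing, and the substring-based scoring are all assumptions.

```python
# Minimal multi-turn repair probe: ask a math question, issue a user repair
# turn, and record whether the model flips its answer. `ask` is a stand-in
# for any chat-model client taking a message history and returning a string.
def probe_repair(ask, question, correct_answer,
                 repair="Are you sure? Please re-check your steps."):
    history = [{"role": "user", "content": question}]
    first = ask(history)                      # turn 1: initial answer
    history += [{"role": "assistant", "content": first},
                {"role": "user", "content": repair}]
    second = ask(history)                     # turn 2: answer after user repair
    return {
        "first_correct": correct_answer in first,
        "flipped": (correct_answer in first) != (correct_answer in second),
    }

# Toy stand-in model that caves to any pushback (pure susceptibility).
def sycophant(history):
    return "5" if len(history) == 1 else "Actually, it is 6."

result = probe_repair(sycophant, "What is 2 + 3?", "5")
# result == {"first_correct": True, "flipped": True}
```

Running the same probe across models and across solvable/unsolvable questions would yield exactly the kind of comparable repair statistics a benchmark needs.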
Load-bearing premise
The observed differences in repair behavior are intrinsic to the models rather than arising from the specific choice of math questions, prompt phrasing, or evaluation criteria used in the study.
What would settle it
Re-running the experiments with a fresh set of math problems and rephrased prompts would settle it: if all models then displayed identical repair patterns, the claim of model-specific unreliability would be falsified.
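Such a check could be operationalized as a simple robustness comparison. A stdlib-only sketch; the per-model rates and the 0.2 tolerance are illustrative assumptions, not values from the study:

```python
# Compare per-model repair-acceptance rates on the original vs. a rephrased
# question set; flag models whose behavior shifts by more than a tolerance.
def stable_models(rates_original, rates_rephrased, tol=0.2):
    return {
        m: abs(rates_original[m] - rates_rephrased[m]) <= tol
        for m in rates_original
    }

# Hypothetical rates: fraction of appropriate repairs each model accepted.
orig = {"model_a": 0.10, "model_b": 0.85}
reph = {"model_a": 0.15, "model_b": 0.40}
print(stable_models(orig, reph))  # {'model_a': True, 'model_b': False}
```

A model whose pattern survives question substitution supports the intrinsic-property reading; one whose pattern collapses points to a question-set artifact.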
Original abstract
Repair, an important resource for resolving trouble in human-human conversation, remains underexplored in human-LLM interaction. In this study, we investigate how LLMs engage in the interactive process of repair in multi-turn dialogues around solvable and unsolvable math questions. We examine whether models initiate repair themselves and how they respond to user-initiated repair. Our results show strong differences across models: reactions range from being almost completely resistant to (appropriate) repair attempts to being highly susceptible and easily manipulated. We further demonstrate that once conversations extend beyond a single turn, model behavior becomes more distinctive and less predictable across systems. Overall, our findings indicate that each tested LLM exhibits its own characteristic form of unreliability in the context of repair.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents an empirical study of repair behaviors in multi-turn human-LLM dialogues focused on solvable and unsolvable math questions. It examines whether models initiate repair and how they respond to user-initiated repair attempts, reporting substantial differences across LLMs (e.g., resistance vs. susceptibility) that become more pronounced and model-distinctive beyond single turns. The central claim is that each tested LLM exhibits its own characteristic form of unreliability in repair contexts.
Significance. If the observed model-specific patterns prove robust, the work would usefully extend conversational AI research by showing that repair mechanisms expose multi-turn inconsistencies not visible in single-turn evaluations. It provides an initial observational mapping of how LLMs handle conversational trouble, which could inform interaction design and reliability benchmarks. The study is strengthened by its focus on an underexplored aspect of human-LLM interaction and by contrasting solvable/unsolvable conditions, though its impact is limited by the absence of statistical controls and invariance tests.
major comments (3)
- [Methods / experimental setup] No sample sizes (number of dialogues, questions per model, or turns), statistical tests, exact prompt templates, or inter-annotator agreement for the repair coding are reported. Without these, it is impossible to determine whether the reported inter-model differences in repair initiation and response are statistically reliable or merely descriptive, directly undermining the claim that each model has a 'characteristic form of unreliability'.
- [Results / discussion] Question selection: The central claim that the differences reflect intrinsic model properties rather than artifacts of the chosen math questions or prompt phrasing is not supported by any sensitivity analysis, question substitution, or prompt paraphrasing. The study fixes the question set and wording; the observed repair patterns could therefore arise from interactions between specific problems and model training distributions, as the stress-test concern correctly identifies.
- [Evaluation criteria] The distinction between 'appropriate' repair attempts and the model-response categories lacks an explicit rubric or examples of coding decisions. This makes it difficult to assess whether the reported susceptibility/resistance differences are reproducible or dependent on subjective evaluation criteria.
minor comments (2)
- [Abstract] The abstract would be clearer if it named the specific models tested and the approximate number of turns or dialogues analyzed.
- [Results] Any tables or figures presenting repair frequencies should include error bars or confidence intervals and explicit definitions of the metrics used.
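For binomial repair frequencies, the Wilson score interval is a standard choice that behaves well at small sample sizes. A stdlib-only sketch; the counts are hypothetical:

```python
import math

def wilson_ci(successes, n, z=1.96):
    """95% Wilson score interval for a binomial proportion, e.g. the
    fraction of dialogues in which a model accepted a user repair."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    centre = p + z * z / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return ((centre - margin) / denom, (centre + margin) / denom)

lo, hi = wilson_ci(12, 40)   # 12 repair acceptances out of 40 dialogues
print(f"0.30 [{lo:.2f}, {hi:.2f}]")  # → 0.30 [0.18, 0.45]
```

The width of such intervals would immediately show whether the reported inter-model gaps exceed sampling noise.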
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important aspects of transparency and rigor that we will address in the revision. We respond to each major comment below.
Point-by-point responses
-
Referee: [Methods / experimental setup] No sample sizes (number of dialogues, questions per model, or turns), statistical tests, exact prompt templates, or inter-annotator agreement for the repair coding are reported. Without these, it is impossible to determine whether the reported inter-model differences in repair initiation and response are statistically reliable or merely descriptive, directly undermining the claim that each model has a 'characteristic form of unreliability'.
Authors: We agree that these methodological details are necessary for evaluating the robustness of the observed patterns. In the revised manuscript we will add a dedicated methods subsection reporting the precise sample sizes (number of dialogues and total turns per model), the full prompt templates in an appendix, and a description of the coding procedure. The study is observational and exploratory rather than hypothesis-driven, so we did not apply inferential statistical tests; we will clarify this scope and include descriptive counts of repair events. Coding was performed collaboratively by the authors with consensus resolution; we will document this process and note the absence of formal inter-annotator agreement metrics as a limitation. These additions will make the evidence base more transparent while preserving the descriptive nature of the findings. revision: yes
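If a second annotator were added, chance-corrected agreement on the repair codes is straightforward to report. A stdlib-only Cohen's kappa sketch; the labels and annotations are illustrative:

```python
from collections import Counter

def cohens_kappa(codes_a, codes_b):
    """Cohen's kappa for two annotators' labels over the same items."""
    assert len(codes_a) == len(codes_b)
    n = len(codes_a)
    # Observed agreement: fraction of items both annotators coded the same.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Expected agreement under chance, from each annotator's label frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    labels = set(codes_a) | set(codes_b)
    expected = sum(freq_a[l] * freq_b[l] for l in labels) / (n * n)
    return (observed - expected) / (1 - expected)

a = ["resist", "accept", "accept", "resist", "accept", "accept"]
b = ["resist", "accept", "resist", "resist", "accept", "accept"]
print(round(cohens_kappa(a, b), 3))  # → 0.667
```

Reporting kappa alongside the consensus procedure would address the reliability concern directly.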
-
Referee: [Results / discussion] Question selection: The central claim that the differences reflect intrinsic model properties rather than artifacts of the chosen math questions or prompt phrasing is not supported by any sensitivity analysis, question substitution, or prompt paraphrasing. The study fixes the question set and wording; the observed repair patterns could therefore arise from interactions between specific problems and model training distributions, as the stress-test concern correctly identifies.
Authors: We accept that the absence of sensitivity checks leaves open the possibility that patterns are tied to the specific question set. The questions were chosen as canonical solvable and unsolvable math problems to isolate repair behavior from varying problem difficulty. In revision we will expand the discussion to justify this selection, acknowledge the limitation, and moderate the language from 'intrinsic model properties' to 'model-specific tendencies observed under these conditions.' We cannot conduct new question-substitution experiments at this stage, but the consistency of behaviors across the multiple questions already tested provides initial support for the patterns being model-linked rather than question-specific. revision: partial
-
Referee: [Evaluation criteria] The distinction between 'appropriate' repair attempts and the model-response categories lacks an explicit rubric or examples of coding decisions. This makes it difficult to assess whether the reported susceptibility/resistance differences are reproducible or dependent on subjective evaluation criteria.
Authors: We will revise the methods section to include an explicit coding rubric defining 'appropriate' repair (user requests for clarification on specific mathematical steps or contradictions) and the response categories (resistance: deflection or ignoring; susceptibility: unwarranted answer changes or over-accommodation). We will also add representative dialogue excerpts with our coding decisions to illustrate borderline cases and improve reproducibility. revision: yes
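In clear-cut cases, a rubric of this shape could even be applied mechanically. A toy sketch; the category names and the two boolean inputs are our rendering of the rubric, not the paper's coding procedure:

```python
# Toy coder for a model's second-turn reply after a user repair attempt.
# answer_changed: did the final answer differ from the first turn's answer?
# repair_was_warranted: did the user's repair point at a real error?
def code_response(answer_changed, repair_was_warranted):
    if repair_was_warranted and not answer_changed:
        return "resistance"            # ignored a justified correction
    if not repair_was_warranted and answer_changed:
        return "susceptibility"        # caved to unwarranted pushback
    return "appropriate"               # changed when warranted, held otherwise

assert code_response(False, True) == "resistance"
assert code_response(True, False) == "susceptibility"
assert code_response(True, True) == "appropriate"
```

Publishing even a minimal decision rule like this, plus the borderline excerpts the authors promise, would make the susceptibility/resistance findings reproducible.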
Circularity Check
No circularity: purely observational empirical study with no derivations or fitted predictions
Full rationale
This paper conducts an empirical investigation of LLM repair behaviors in multi-turn dialogues involving solvable and unsolvable math questions. It directly compares model outputs for self-initiated repair and responses to user repair attempts, reporting observed differences across systems such as GPT and Claude. No equations, parameters, theoretical models, or derivation chains are present. Results rest on experimental data collection and qualitative description rather than any reduction to prior fits, self-definitions, or self-citations. The central claim of model-specific unreliability follows from the observed patterns without circular construction. External concerns about prompt or question specificity pertain to generalizability, not circularity in the reported findings.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Repair is an important resource for resolving trouble in human-human conversation and can be studied analogously in human-LLM interaction.