Temperature and Persona Shape LLM Agent Consensus With Minimal Accuracy Gains in Qualitative Coding
Pith reviewed 2026-05-19 03:58 UTC · model grok-4.3
The pith
Temperature and persona settings shape when LLM multi-agent systems reach consensus but produce little accuracy gain over single agents in qualitative coding.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Temperature significantly impacted whether and when consensus was reached across all six LLMs, multiple personas delayed consensus in four models, and higher temperatures diminished those persona effects in three models; however, neither temperature nor persona pairing produced robust improvements in coding accuracy, with single agents matching or outperforming MAS consensus in most conditions.
What carries the argument
The open-source multi-agent system that emulates deductive human coding through structured agent discussion and consensus arbitration.
If this is right
- Consensus timing varies with temperature in every tested LLM.
- Multiple personas delay consensus relative to uniform personas in four LLMs.
- Higher temperatures reduce the delaying effect of multiple personas in three LLMs.
- Single agents match or exceed MAS accuracy in most tested conditions.
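The "whether and when" framing in these bullets can be made concrete with a per-condition summary. The sketch below is hypothetical: the record fields (`temperature`, `personas`, `consensus_round`) and the toy data are illustrative assumptions, not the paper's schema or results. For each temperature/persona cell it reports the share of segments that reached consensus and the median round at which they did.

```python
from collections import defaultdict
from statistics import median

# Hypothetical per-segment records; consensus_round is None when the agents
# never agreed within the allowed rounds.
runs = [
    {"temperature": 0.0, "personas": "uniform", "consensus_round": 1},
    {"temperature": 0.0, "personas": "uniform", "consensus_round": 1},
    {"temperature": 0.0, "personas": "multiple", "consensus_round": 3},
    {"temperature": 0.0, "personas": "multiple", "consensus_round": None},
    {"temperature": 1.0, "personas": "multiple", "consensus_round": 2},
    {"temperature": 1.0, "personas": "multiple", "consensus_round": 1},
]

def summarize_consensus(records):
    """Return {(temperature, personas): (consensus_rate, median_round)}."""
    cells = defaultdict(list)
    for r in records:
        cells[(r["temperature"], r["personas"])].append(r["consensus_round"])
    summary = {}
    for cell, rounds in cells.items():
        reached = [x for x in rounds if x is not None]
        rate = len(reached) / len(rounds)
        # The median round is only defined over segments that reached consensus.
        med = median(reached) if reached else None
        summary[cell] = (rate, med)
    return summary

for cell, (rate, med) in sorted(summarize_consensus(runs).items()):
    print(cell, f"rate={rate:.2f}", f"median_round={med}")
```

Separating the two quantities matters because a condition can delay consensus (higher median round) without preventing it, or prevent it outright (lower rate).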
Where Pith is reading between the lines
- Analysis of coding disagreements within the MAS could guide refinements to codebook design.
- The pattern of minimal accuracy gains may hold for other deductive annotation tasks beyond educational dialogues.
- Researchers might default to single-agent prompting unless the consensus process itself is the object of study.
Load-bearing premise
The structured agent discussion and consensus arbitration in the multi-agent system accurately emulates the deductive human coding process captured by the gold-standard annotations.
What would settle it
A replication in which MAS consensus accuracy exceeds single-agent accuracy by a statistically significant margin across a majority of models and experimental conditions.
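One way to operationalize this criterion is to test MAS against single-agent accuracy separately in each model-by-condition cell, control the false discovery rate across cells with the Benjamini-Hochberg procedure, and then ask whether a majority of cells show a significant MAS advantage. The sketch below assumes one-sided p-values have already been computed per cell (for example with a paired test); the cell labels, p-values, and the 0.05 level are illustrative placeholders.

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Return booleans marking which hypotheses the BH step-up procedure rejects."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Largest 1-based rank k with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected

def mas_advantage_in_majority(cell_pvals, alpha=0.05):
    """cell_pvals maps a cell label to a one-sided p-value for 'MAS > single agent'.

    Returns (majority_significant, number_of_significant_cells).
    """
    labels = list(cell_pvals)
    rejected = benjamini_hochberg([cell_pvals[c] for c in labels], alpha)
    n_sig = sum(rejected)
    return n_sig > len(labels) / 2, n_sig

# Hypothetical p-values for four model x condition cells.
example = {"model_a/t0": 0.001, "model_a/t1": 0.20, "model_b/t0": 0.03, "model_b/t1": 0.60}
print(mas_advantage_in_majority(example))  # (False, 1)
```

Only the smallest p-value survives correction here, so a single significant cell out of four would not meet the "majority" bar.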
Original abstract
Large Language Models (LLMs) enable new possibilities for qualitative research at scale, including annotation and qualitative coding of educational data. While LLM-based multi-agent systems (MAS) can emulate human coding workflows, their benefits over single LLM agents for coding remain poorly understood. To that end, we conducted an experimental study of how the persona and temperature of a MAS's component agents shape consensus-building and coding accuracy for dialog segments. LLMs were prompted to code these segments deductively using a mature codebook with 8 codes and high inter-rater reliability derived from prior research. Our open-source MAS mirrors deductive human coding through structured agent discussion and consensus arbitration. Using six open-source LLMs (with 3 to 32 billion parameters) and 18 experimental configurations, we analyze over 77,000 coding decisions against a gold-standard dataset of human-annotated transcripts from online math tutoring sessions facilitated by educational software. Temperature significantly impacted whether and when consensus was reached across all six LLMs. MAS with multiple personas (including neutral, assertive, or empathetic) significantly delayed consensus in four out of six LLMs compared to uniform personas. In three of those LLMs, higher temperatures significantly diminished the effects of multiple personas on consensus. However, neither temperature nor persona pairing led to robust improvements in coding accuracy. Single agents matched or outperformed MAS consensus in most conditions. Qualitative analysis of MAS collaboration and coding disagreement may, however, improve codebook design and human-AI coding.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports an empirical study of multi-agent LLM systems (MAS) for deductive qualitative coding of educational dialog segments against a human gold-standard dataset. Using six open-source LLMs (3B–32B parameters) and 18 configurations, the authors examine how temperature and persona diversity affect consensus timing and coding accuracy across >77,000 decisions. Key claims are that temperature modulates consensus, multiple personas delay consensus in four of six models, and neither factor produces robust accuracy gains; single agents match or exceed MAS performance in most conditions.
Significance. If the central empirical comparison holds after methodological clarification, the work provides useful evidence that the added complexity of structured MAS discussion and arbitration may not improve accuracy over single-agent prompting for deductive coding with a mature, high-reliability codebook. The scale of the experiment and the open-source MAS implementation are strengths that could inform efficient LLM deployment in qualitative research pipelines.
major comments (2)
- [Methods] Methods (MAS description): The consensus arbitration procedure is described only at a high level as 'structured agent discussion and consensus arbitration.' It is not specified whether final codes are produced by majority vote, moderator override, iterative refinement until agreement, or another rule. This detail is load-bearing for the accuracy comparison because any aggregation rule that systematically favors high-frequency codes or penalizes rare ones could artifactually lower MAS accuracy relative to single agents and the gold-standard distribution.
- [Results] Results (accuracy claims): The assertion that 'single agents matched or outperformed MAS consensus in most conditions' is not supported by per-configuration accuracy numbers, confidence intervals, or statistical tests. Without these, it is impossible to evaluate whether the observed pattern is robust across the 18 experimental cells or driven by a subset of LLMs or code frequencies.
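Because single agents and the MAS code the same segments, a paired test is a natural way to back the "matched or outperformed" claim within each configuration. A minimal sketch of an exact McNemar test follows; it uses only the discordant pairs (segments one system coded correctly and the other did not). The correctness vectors are hypothetical, and this is one reasonable choice of test, not necessarily the analysis the authors ran.

```python
from math import comb

def mcnemar_exact(single_correct, mas_correct):
    """Two-sided exact McNemar test on paired correctness vectors.

    Returns (b, c, p): b = single right and MAS wrong,
    c = single wrong and MAS right.
    """
    if len(single_correct) != len(mas_correct):
        raise ValueError("vectors must be paired segment by segment")
    b = sum(s and not m for s, m in zip(single_correct, mas_correct))
    c = sum(m and not s for s, m in zip(single_correct, mas_correct))
    n = b + c
    if n == 0:
        return b, c, 1.0
    # Under H0 each discordant pair favors either system with probability 0.5,
    # so min(b, c) ~ Binomial(n, 0.5); double the lower tail for a two-sided p.
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)

# Hypothetical correctness of each system on the same eight segments.
single = [True, True, True, False, True, True, False, True]
mas = [True, False, True, False, False, True, False, False]
print(mcnemar_exact(single, mas))  # (3, 0, 0.25)
```

Segments both systems got right or wrong carry no information about which is better, which is why the test ignores them.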
minor comments (2)
- [Abstract] Abstract: The final sentence states that 'Qualitative analysis of MAS collaboration and coding disagreement may, however, improve codebook design,' yet no such qualitative analysis is referenced in the provided text; either include a brief summary or revise the claim.
- [Methods] Notation: 'MAS' and 'single agents' are used throughout; a short table or paragraph defining the exact prompting templates and output formats for each would improve reproducibility.
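The requested definition of output formats could take the form of a shared, machine-checkable schema used by both single agents and MAS agents. The sketch below assumes a JSON reply with `code` and `rationale` fields and uses placeholder labels `CODE_1` to `CODE_8`; neither the format nor the labels come from the paper.

```python
import json

# Hypothetical placeholder labels standing in for the paper's 8 codes.
CODEBOOK = {f"CODE_{i}" for i in range(1, 9)}

def parse_agent_output(raw: str) -> dict:
    """Validate one reply in an assumed JSON format:
    {"code": "<codebook label>", "rationale": "<free text>"}."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on malformed text
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    code = data.get("code")
    if code not in CODEBOOK:
        raise ValueError(f"code {code!r} is not in the codebook")
    return {"code": code, "rationale": str(data.get("rationale", ""))}

print(parse_agent_output('{"code": "CODE_3", "rationale": "matches the code definition"}'))
```

Rejecting out-of-codebook labels at parse time keeps invalid outputs from silently counting as disagreements or errors in the accuracy comparison.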
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.
Referee: [Methods] Methods (MAS description): The consensus arbitration procedure is described only at a high level as 'structured agent discussion and consensus arbitration.' It is not specified whether final codes are produced by majority vote, moderator override, iterative refinement until agreement, or another rule. This detail is load-bearing for the accuracy comparison because any aggregation rule that systematically favors high-frequency codes or penalizes rare ones could artifactually lower MAS accuracy relative to single agents and the gold-standard distribution.
Authors: We agree that the arbitration procedure requires greater specificity. In the revised manuscript we will expand the Methods section to describe the exact consensus rules, including the voting mechanism (majority vote with moderator tie-breaker), the number of discussion rounds permitted, and how the final code is selected when agents disagree. revision: yes
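The rule described in this response (majority vote with a moderator tie-breaker after a bounded number of discussion rounds) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name, the default of three rounds, the `revise` callback standing in for a discussion round, and the `moderator` callback are all hypothetical.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

def arbitrate(
    initial_codes: List[str],
    revise: Callable[[List[str]], List[str]],
    moderator: Callable[[List[str]], str],
    max_rounds: int = 3,
) -> Tuple[str, Optional[int]]:
    """Return (final_code, consensus_round).

    consensus_round is the round at which all agents agreed, or None if the
    code was settled by majority vote or the moderator after max_rounds.
    """
    codes = list(initial_codes)
    for round_no in range(1, max_rounds + 1):
        if len(set(codes)) == 1:
            return codes[0], round_no
        if round_no < max_rounds:
            # Hypothetical discussion step: agents see each other's codes and may revise.
            codes = revise(codes)
    counts = Counter(codes).most_common()
    top = counts[0][1]
    tied = sorted(code for code, n in counts if n == top)
    if len(tied) == 1:
        return tied[0], None  # clear majority
    return moderator(tied), None  # tie broken by the moderator

# Two agents that never converge; the moderator picks the first tied code.
print(arbitrate(["CODE_2", "CODE_5"], revise=lambda c: c, moderator=lambda t: t[0]))
# ('CODE_2', None)
```

With only two agents, "majority" reduces to unanimity, so every unresolved disagreement falls to the moderator; that is exactly where the referee's concern about systematic bias toward frequent codes would bite.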
Referee: [Results] Results (accuracy claims): The assertion that 'single agents matched or outperformed MAS consensus in most conditions' is not supported by per-configuration accuracy numbers, confidence intervals, or statistical tests. Without these, it is impossible to evaluate whether the observed pattern is robust across the 18 experimental cells or driven by a subset of LLMs or code frequencies.
Authors: We accept that the accuracy comparison would be more convincing with disaggregated results. We will add a table (or supplementary table) reporting accuracy for each of the 18 configurations together with 95% confidence intervals and note any pairwise statistical comparisons between single-agent and MAS conditions. revision: yes
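Per-configuration accuracies with 95% confidence intervals could be computed with the Wilson score interval, which behaves better than the normal approximation when accuracy is near 0 or 1. The sketch below is generic; the configuration names and counts are hypothetical placeholders, not results from the paper.

```python
from math import sqrt

def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for accuracy = correct / total (z = 1.96 gives about 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical (correct, total) counts for two configurations.
counts = {
    "model_a | single agent": (412, 500),
    "model_a | MAS, uniform personas": (398, 500),
}
for name, (k, n) in counts.items():
    lo, hi = wilson_interval(k, n)
    print(f"{name}: acc={k / n:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

Overlapping intervals do not by themselves show equivalence; since both systems code the same segments, a paired test is the more direct comparison.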
Circularity Check
Empirical comparison to human gold standard; no derivations or fitted predictions
The paper reports direct experimental measurements of coding accuracy and consensus rates for single LLM agents versus MAS configurations across 18 configurations and more than 77,000 coding decisions, benchmarked against an external human-annotated gold standard. No equations, parameter fits, or first-principles derivations are present that could reduce reported outcomes to inputs by construction. All load-bearing claims rest on observable experimental contrasts rather than self-referential definitions or self-citation chains.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: the codebook's 8 codes have high inter-rater reliability, established in prior research.
- Domain assumption: structured agent discussion and consensus arbitration in the MAS emulate human coding workflows.
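The first assumption rests on inter-rater reliability, which for two coders assigning one code per segment is commonly summarized with Cohen's kappa, κ = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and p_e is the agreement expected by chance from each coder's marginal code frequencies. The sketch is generic; the labels are hypothetical, and the reliability statistic used in the prior research is not specified in this summary.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters independently pick the same label.
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / n**2
    if p_e == 1:
        # Degenerate case: both raters used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)

a = ["X", "X", "Y", "Y"]
b = ["X", "Y", "Y", "Y"]
print(cohens_kappa(a, b))  # 0.5 (p_o = 0.75, p_e = 0.5)
```

Kappa discounts the agreement two coders would reach by chance, which is why it is preferred over raw percent agreement when some codes are much more frequent than others.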