EduCoder: An Open-Source Annotation System for Education Transcript Data

Dorottya Demszky; Guanzhong Pan; Helen Higgins; Hyunji Nam; James Malamut; Liliana Deonizio; Lucía Langlois; Mei Tan; Saad Ashraf; Vishal Kumar

arXiv: 2507.05385 · v5 · submitted 2025-07-07 · 💻 cs.CL

Saad Ashraf, James Malamut, Vishal Kumar, Guanzhong Pan, Hyunji Nam, Mei Tan, Lucía Langlois, Liliana Deonizio, Helen Higgins, Dorottya Demszky

Pith reviewed 2026-05-19 05:32 UTC · model grok-4.3

classification 💻 cs.CL

keywords educational dialogue · annotation tool · codebook development · utterance-level coding · collaborative annotation · open-source system · pedagogical features · transcript analysis

The pith

EduCoder provides a platform for researchers to collaboratively build codebooks and calibrate annotations when coding educational dialogue transcripts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces EduCoder to handle the specific demands of annotating teacher-student and peer interactions in education transcripts. It focuses on enabling teams to define complex codebooks drawn from actual data, while supporting both categorical labels and open-ended responses plus lesson context. Side-by-side views of different annotators' work are included to support calibration and improve consistency. A reader would care because accurate, reliable coding is essential for studying teaching practices, yet general annotation tools often lack the needed education-specific features for codebook creation and comparison.

Core claim

EduCoder is designed to address these challenges by providing a platform for researchers and domain experts to collaboratively define complex codebooks based on observed data. It incorporates both categorical and open-ended annotation types along with contextual materials. Additionally, it offers a side-by-side comparison of multiple annotators' responses, allowing comparison and calibration of annotations with others to improve data reliability. The system is open-source.

What carries the argument

EduCoder, the annotation system that combines collaborative codebook definition from observed transcripts, mixed categorical and open-ended coding, contextual lesson materials, and side-by-side annotator comparison views.
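
As a rough illustration of the side-by-side comparison idea, the Python sketch below lines up the labels that several annotators gave to the same utterances and flags the rows where they differ. The function name `side_by_side`, the dictionary layout, and the example labels are assumptions made for this illustration; none of them is taken from EduCoder's code.

```python
def side_by_side(annotations):
    """Align labels per utterance across annotators.

    `annotations` maps annotator name -> {utterance_id: label}.
    This layout is hypothetical, chosen only for the sketch.
    """
    annotators = sorted(annotations)
    utterance_ids = sorted({u for labels in annotations.values() for u in labels})
    rows = []
    for utterance_id in utterance_ids:
        # A missing annotation shows up as None and counts as a disagreement.
        labels = {name: annotations[name].get(utterance_id) for name in annotators}
        rows.append({
            "utterance_id": utterance_id,
            "labels": labels,
            "agree": len(set(labels.values())) == 1,
        })
    return rows


# Illustrative data only.
example = {
    "ann_a": {1: "question", 2: "praise"},
    "ann_b": {1: "question", 2: "revoicing"},
}
rows = side_by_side(example)
to_discuss = [row["utterance_id"] for row in rows if not row["agree"]]
```

In a calibration meeting, the `to_discuss` list would be the natural agenda: only the utterances where coders diverged need a second look.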

If this is right

Teams can iteratively refine codebooks for pedagogical features directly from the transcripts being studied.
Annotators gain flexibility to apply both fixed categories and free-text descriptions to individual utterances.
Direct comparison of responses from multiple coders supports calibration and raises overall data reliability.
Open-source release allows education researchers to adopt, modify, and extend the system for their own projects.
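
To make the mix of fixed categories and free-text coding concrete, here is a minimal Python sketch of a codebook that holds both kinds of code and checks annotation values against it. The `Code` and `Codebook` classes, the field names, and the example codes are hypothetical; the paper summary does not describe EduCoder's actual data model, so this should be read as one possible design rather than the system's schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Code:
    """One codebook entry (hypothetical structure, not EduCoder's schema)."""
    name: str
    kind: str            # "categorical" or "open_ended"
    options: tuple = ()  # allowed labels; only used for categorical codes


@dataclass
class Codebook:
    codes: dict = field(default_factory=dict)

    def add(self, code: Code) -> None:
        if code.kind not in ("categorical", "open_ended"):
            raise ValueError(f"unknown code kind: {code.kind}")
        if code.kind == "categorical" and not code.options:
            raise ValueError("categorical codes need at least one option")
        self.codes[code.name] = code

    def is_valid(self, code_name: str, value: str) -> bool:
        """Check one annotation value against the codebook."""
        code = self.codes.get(code_name)
        if code is None:
            return False
        if code.kind == "categorical":
            return value in code.options
        # Open-ended codes accept any non-empty free text.
        return isinstance(value, str) and value.strip() != ""


# Illustrative codes only.
codebook = Codebook()
codebook.add(Code("question_type", "categorical", ("funneling", "focusing", "other")))
codebook.add(Code("rationale", "open_ended"))
```

Because codes are added one at a time, a team could extend such a codebook as new patterns surface in the transcripts, which is the iterative refinement the first bullet describes.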

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Adoption across studies could produce more comparable labeled datasets on classroom interactions.
The collaborative and comparison features might lower variability in qualitative education research more broadly.
The design could serve as a model for building specialized annotation systems in related fields like counseling or legal transcription.

Load-bearing premise

Existing general-purpose annotation tools do not sufficiently support the creation of complex pedagogical codebooks, mixed annotation types, contextualization, and annotator calibration for education dialogue data.

What would settle it

A direct comparison study that measures inter-annotator agreement scores, time per transcript, and reported ease of use when the same education dialogue data is coded with EduCoder versus a general-purpose tool.
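
For the agreement part of such a study, one widely used score for two annotators assigning categorical labels is Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement rate and p_e is the agreement expected by chance from each annotator's label frequencies. The Python sketch below computes it from two label lists. The function name and the example labels are illustrative, and nothing here reflects how EduCoder itself computes reliability.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # p_o: share of items on which both annotators chose the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # p_e: chance agreement from each annotator's marginal label counts.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined (0/0). Returning 1.0 is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Illustrative labels: the annotators agree on 3 of 4 utterances.
annotator_a = ["question", "question", "statement", "statement"]
annotator_b = ["question", "statement", "statement", "statement"]
kappa = cohens_kappa(annotator_a, annotator_b)  # p_o = 0.75, p_e = 0.5, kappa = 0.5
```

Running the same calculation on labels produced in EduCoder and in a general-purpose tool would give the agreement figures the proposed comparison calls for; time per transcript and ease of use would need separate measurement.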

Figures

Figures reproduced from arXiv: 2507.05385 by Dorottya Demszky, Guanzhong Pan, Helen Higgins, Hyunji Nam, James Malamut, Liliana Deonizio, Lucía Langlois, Mei Tan, Saad Ashraf, Vishal Kumar.

**Figure 1.** Overview of EduCoder. EduCoder is designed to facilitate the collaborative annotation of educational dialogue data. It is a comprehensive, end-to-end platform for A. importing and preprocessing conversation transcripts, B. collaborative utterance-level annotation with customizable codebooks, and C. real-time inter-rater reliability (IRR) monitoring and cross-annotator comparison. This system aims to enhanc…

**Figure 2.** Screenshots of example tasks supported by EduCoder. Example 2a shows the codebook definition interface

Original abstract

We introduce EduCoder, a domain-specialized tool designed to support utterance-level annotation of educational dialogue. While general-purpose text annotation tools for NLP and qualitative research abound, few address the complexities of coding education dialogue transcripts -- with diverse teacher-student and peer interactions. Common challenges include defining codebooks for complex pedagogical features, supporting both open-ended and categorical coding, and contextualizing utterances with external features, such as the lesson's purpose and the pedagogical value of the instruction. EduCoder is designed to address these challenges by providing a platform for researchers and domain experts to collaboratively define complex codebooks based on observed data. It incorporates both categorical and open-ended annotation types along with contextual materials. Additionally, it offers a side-by-side comparison of multiple annotators' responses, allowing comparison and calibration of annotations with others to improve data reliability. The system is open-source, with a demo video available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

EduCoder is a clear description of an open-source annotation tool for education dialogues with some domain-specific features, but it rests entirely on that description without any evaluation data.

Hi, the core of this paper is a practical open-source system for annotating educational dialogue transcripts at the utterance level. It targets workflow issues like building complex codebooks collaboratively, mixing categorical and open-ended labels, supplying context such as lesson purpose, and letting annotators compare their work side-by-side for calibration. Those features line up with the challenges the authors list for this subfield. The write-up is direct about what the tool does and they release the code, which is the main concrete output here. That makes the contribution verifiable in a basic way.

What the paper does well is spell out why general tools fall short for pedagogical data and then map specific UI choices to those gaps. The open-source release and demo video are straightforward positives for anyone who might actually use it.

The soft spot is obvious and not minor: there are no user studies, no reliability metrics, no timing comparisons, and no head-to-head tests against existing platforms. We have no evidence that the added features improve agreement, speed, or ease of use in practice. The claims stay at the level of intended design.

This is for education researchers who spend time coding transcripts and want something more tailored than generic NLP annotation software. A reader in that narrow area could get immediate value from trying the released tool. It deserves peer review as a systems or tools paper because the implementation is public and the problem description is honest. A referee could check the code and suggest concrete improvements without needing to evaluate untested performance claims.

Referee Report

0 major / 2 minor

Summary. The paper introduces EduCoder, an open-source annotation system specialized for utterance-level coding of educational dialogue transcripts. It identifies four challenges with general-purpose tools (complex codebook definition, mixed categorical/open-ended types, contextual materials, and annotator calibration) and describes how the system supports collaborative codebook definition based on observed data, both annotation types, contextual support, and side-by-side annotator comparison for calibration.

Significance. If the described features operate as presented, EduCoder offers a domain-specific platform that could streamline annotation workflows for education researchers analyzing teacher-student and peer interactions. The open-source release and demo video constitute clear strengths that support reproducibility and community adoption.

minor comments (2)

The motivation section would benefit from explicit citations or a brief comparison table to 2-3 existing general-purpose tools (e.g., Doccano, Label Studio) to ground the claim that current options fall short on the listed challenges.
Consider adding a short limitations or future-work paragraph discussing scalability for large corpora or integration with existing qualitative-analysis pipelines.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive review and recommendation to accept the manuscript. We appreciate the recognition of EduCoder's domain-specific features for educational dialogue annotation and the value placed on its open-source release.

Circularity Check

0 steps flagged

No significant circularity

This is a descriptive system paper introducing an open-source annotation tool. It states challenges in education dialogue coding and describes how EduCoder's features (collaborative codebook definition, mixed annotation types, contextual support, annotator comparison) address them. There are no derivations, equations, fitted parameters, predictions, or load-bearing self-citations that reduce to internal definitions. The contribution rests on the feature description and open-source release, which is externally verifiable and does not rely on any self-referential construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a software-tool paper. It introduces no free parameters, mathematical axioms, or postulated entities; the only assumptions are standard software-engineering premises such as the existence of a web browser and user willingness to adopt the interface.

pith-pipeline@v0.9.0 · 5714 in / 1060 out tokens · 98553 ms · 2026-05-19T05:32:38.977057+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · tag: unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

EduCoder is designed to address these challenges by providing a platform for researchers and domain experts to collaboratively define complex codebooks based on observed data. It incorporates both categorical and open-ended annotation types along with contextual materials.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages

[3] Sterling Alic, Dorottya Demszky, Zid Mancenido, Jing Liu, Heather Hill, and Dan Jurafsky. 2022. Computationally identifying funneling and focusing questions in classroom discourse. In Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022), pages 224–233
[4] Anthropic. 2025. Claude 4 sonnet model card. System card published May 2025; training data cut-off: March 2025; accessed July 2 2025
[5] ATLAS.ti Scientific Software Development GmbH. 2023. ATLAS.ti Mac (version 23.2.1) [qualitative data analysis software]. Accessed: 2025-07-04
[6] Ljubiša Bojić, Olga Zagovora, Asta Zelenkauskaitė, Vuk Vuković, Milan Čabarkapa, Selma Veseljević Jerković, and Ana Jovančević. 2025. Evaluating large language models against human annotators in latent content analysis: Sentiment, political leaning, emotional intensity, and sarcasm. arXiv preprint arXiv:2501.02532
[7] David Broska, Michael Howes, and Austin van Loon. 2025. The mixed subjects design: Treating large language models as potentially informative observations. Sociological Methods & Research, page 00491241251326865
[8] Nitay Calderon, Roi Reichart, and Rotem Dror. 2025. The alternative annotator test for llm-as-a-judge: How to statistically justify replacing human annotators with llms. arXiv preprint arXiv:2501.10970
[9] John L Campbell, Charles Quincy, Jordan Osserman, and Ove K Pedersen. 2013. Coding in-depth semistructured interviews: Problems of unitization and intercoder reliability and agreement. Sociological methods & research, 42(3):294–320
[10] Jacob Cohen. 1960. A coefficient of agreement for nominal scales. Educational and psychological measurement, 20(1):37–46
[11] Rosanna Cole. 2024. Inter-rater reliability methods in qualitative case study research. Sociological Methods & Research, 53(4):1944–1975
[12] Tobias Daudert. 2020. A web-based collaborative annotation and consolidation tool. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 7053–7059
[13] Dorottya Demszky and Heather Hill. 2023. The ncte transcripts: A dataset of elementary math classroom transcripts. In Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023), pages 528–538
[14] Dorottya Demszky, Jing Liu, Zid Mancenido, Julie Cohen, Heather Hill, Dan Jurafsky, and Tatsunori B Hashimoto. 2021. Measuring conversational uptake: A case study on student-teacher interactions. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Proces...
[15] Naihao Deng, Yikai Liu, Mingye Chen, Winston Wu, Siyang Liu, Yulong Chen, Yue Zhang, and Rada Mihalcea. 2023. Ease: An easily-customized annotation system powered by efficiency enhancement mechanisms. arXiv preprint arXiv:2305.14169
[16] Kerry Dhakal. 2022. Nvivo. Journal of the Medical Library Association: JMLA, 110(2):270
[17] Sidney K D'Mello and Art Graesser. 2012. Language and discourse are powerful signals of student emotions during tutoring. IEEE Transactions on Learning Technologies, 5(4):304–317
[18] Zackary Okun Dunivin. 2025. Scaling hermeneutics: a guide to qualitative coding with llms for reflexive content analysis. EPJ Data Science, 14(1):28
[19] Kathleen M. Eisenhardt. 1989. Building theories from case study research. Academy of Management Review, 14(4):532–550. Available at JSTOR: stable 258557
[20] David Garlan, Vishal Dwivedi, Ivan Ruchkin, and Bradley Schmerl. 2012. Foundations and tools for end-user architecting. In Large-Scale Complex IT Systems. Development, Operation and Management: 17th Monterey Workshop 2012, Oxford, UK, March 19-21, 2012, Revised Selected Papers 17, pages 157–182. Springer
[21] Kevin A Hallgren. 2012. Computing inter-rater reliability for observational data: an overview and tutorial. Tutorials in quantitative methods for psychology, 8(1):23
[22] Andrew F Hayes and Klaus Krippendorff. 2007. Answering the call for a standard reliability measure for coding data. Communication methods and measures, 1(1):77–89
[23] Xudong Hong, Margarita Ryzhova, Daniel Adrian Biondi, and Vera Demberg. 2023. Do large language models and humans have similar behaviors in causal inference with script knowledge? arXiv preprint arXiv:2311.07311
[24] Matthew Honnibal and Ines Montani. 2017. spacy 2: Natural language understanding with bloom embeddings, convolutional neural networks and incremental parsing. To appear
[25] Nicholas Hunkins, Sean Kelly, and Sidney D'Mello. 2022. “beautiful work, you're rock stars!”: Teacher analytics to uncover discourse that supports or undermines student motivation, identity, and belonging in classrooms. In Lak22: 12th international learning analytics and knowledge conference, pages 230–238
[26] Mete Ismayilzada, Claire Stevenson, and Lonneke van der Plas. 2024. Evaluating creative short story generation in humans and large language models. arXiv preprint arXiv:2411.02316
[27] Emily Jensen, Samuel L. Pugh, and Sidney K. D'Mello. 2021. A deep transfer learning approach to modeling teacher discourse in the classroom. In LAK21: 11th international learning analytics and knowledge conference, pages 302–312
[28] Jan-Christoph Klie, Michael Bugert, Beto Boullosa, Richard Eckart de Castilho, and Iryna Gurevych. 2018. The INCEpTION platform: Machine-assisted and knowledge-oriented interactive annotation. In Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations, pages 5–9, Santa Fe, New Mexico, USA. Association for Comp...
[29] Klaus Krippendorff. 2018. Content Analysis: An Introduction to Its Methodology, 4th edition. SAGE Publications, Inc., Thousand Oaks, CA
[30] Yun Long, Haifeng Luo, and Yu Zhang. 2024. Evaluating large language models in analysing classroom dialogue. npj Science of Learning, 9(1):60
[31] Jakub Macina, Nico Daheim, Sankalan Chowdhury, Tanmay Sinha, Manu Kapur, Iryna Gurevych, and Mrinmaya Sachan. 2023. Mathdial: A dialogue tutoring dataset with rich pedagogical properties grounded in math reasoning problems. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 5602–5621
[32] Neil Mercer and Christine Howe. 2012. Explaining the dialogic processes of teaching and learning: The value and potential of sociocultural theory. Learning, Culture and Social Interaction, 1(1):12–21
[33] Sarah Michaels, Catherine O'Connor, and Lauren B. Resnick. 2008. Deliberative discourse idealized and realized: Accountable talk in the classroom and in civic life. In Studies in Philosophy and Education, volume 27, pages 283–297
[34] Ines Montani and Matthew Honnibal. Prodigy: A modern and scriptable annotation tool for creating training data for machine learning models
[35] Alberto Muñoz-Ortiz, Carlos Gómez-Rodríguez, and David Vilares. 2023. Contrasting linguistic patterns in human and llm-generated news text. arXiv preprint arXiv:2308.09067. Version 3 (Sep 2, 2024)
[36] Hiroki Nakayama, Takahiro Kubo, Junya Kamura, Yasufumi Taniguchi, and Xu Liang. 2018. doccano: Text annotation tool for human. Open-source text annotation software
[37] OpenAI. 2024. Gpt-4o technical report. Accessed: 2025-07-02
[38] Soya Park, April Yi Wang, Ban Kawas, Q Vera Liao, David Piorkowski, and Marina Danilevsky. 2021. Facilitating knowledge sharing from domain experts to data scientists for building nlp models. In Proceedings of the 26th International Conference on Intelligent User Interfaces, pages 585–596
[39] Michael Quinn Patton. 2002. Two decades of developments in qualitative inquiry: A personal, experiential perspective. Qualitative social work, 1(3):261–283
[40] Jiaxin Pei, Aparna Ananthasubramaniam, Xingyao Wang, Naitian Zhou, Apostolos Dedeloudis, Jackson Sargent, and David Jurgens. 2022. Potato: The portable text annotation tool. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 327–337
[41] Tal Perry. 2021. Lighttag: Text annotation platform. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 20–27
[42] Katherine Stasaski, Kimberly Kao, and Marti A Hearst. 2020. Cima: A large open access dialogue dataset for tutoring. In Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 52–64
[43] Abhijit Suresh, Jennifer Jacobs, Charis Harty, Margaret Perkoff, James H Martin, and Tamara Sumner. 2022. The talkmoves dataset: K-12 mathematics lesson transcripts annotated for teacher and student discursive moves. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 4654–4662
[44] Abhijit Suresh, Tamara Sumner, Isabella Huang, Jennifer Jacobs, Bill Foland, and Wayne Ward. 2018. Using deep learning to automatically detect talk moves in teachers' mathematics lessons. In 2018 IEEE International Conference on Big Data (Big Data), pages 5445–5447. IEEE
[45] Maxim Tkachenko, Mikhail Malyuk, Andrey Holmanyuk, and Nikolai Liubimov. 2020-2022. Label Studio: Data labeling software. Open source software
[46] Ludi Wang, Dongze Song, Qiang Cui, Xueqing Chen, Yuanchun Zhou, Wenjuan Cui, and Yi Du. 2025. Autodive+: An adaptive model enhanced multimodal online annotation tool. In Companion Proceedings of the ACM on Web Conference 2025, pages 2919–2922
[47] Rose Wang, Qingyang Zhang, Carly Robinson, Susanna Loeb, and Dorottya Demszky. 2024. Bridging the novice-expert gap via models of decision-making: A case study on remediating math mistakes. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Paper...
[48] Cynthia Weston, Terry Gandell, Jacinthe Beauchamp, Lynn McAlpine, Carol Wiseman, and Cathy Beauchamp. 2001. Analyzing interview data: The development and evolution of a coding system. Qualitative sociology, 24:381–400
[49] Sarah Wiegreffe, Jack Hessel, Swabha Swayamdipta, Mark Riedl, and Yejin Choi. 2021. Reframing human-ai collaboration for generating free-text explanations. arXiv preprint arXiv:2112.08674
[50] Linxuan Zhao, Dragan Gašević, Zachari Swiecki, Yuheng Li, Jionghao Lin, Lele Sha, Lixiang Yan, Riordan Alfredo, Xinyu Li, and Roberto Martinez-Maldonado. 2024. Towards automated transcribing and coding of embodied teamwork communication through multimodal learning analytics. British Journal of Educational Technology, 55(4):1673–1702

[1] [1]

online" 'onlinestring :=

ENTRY address archivePrefix author booktitle chapter edition editor eid eprint eprinttype howpublished institution journal key month note number organization pages publisher school series title type volume year doi pubmed url lastchecked label extra.label sort.label short.list INTEGERS output.state before.all mid.sentence after.sentence after.block STRING...

work page

[2] [2]

write newline

" write newline "" before.all 'output.state := FUNCTION n.dashify 't := "" t empty not t #1 #1 substring "-" = t #1 #2 substring "--" = not "--" * t #2 global.max substring 't := t #1 #1 substring "-" = "-" * t #2 global.max substring 't := while if t #1 #1 substring * t #2 global.max substring 't := if while FUNCTION word.in bbl.in capitalize " " * FUNCT...

work page

[3] [3]

Sterling Alic, Dorottya Demszky, Zid Mancenido, Jing Liu, Heather Hill, and Dan Jurafsky. 2022. Computationally identifying funneling and focusing questions in classroom discourse. In Proceedings of the 17th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2022), pages 224--233

work page 2022

[4] [4]

Anthropic. 2025. Claude 4 sonnet model card. System card published May 2025; training data cut‐off: March 2025; accessed July 2 2025

work page 2025

[5] [5]

ATLAS.ti Scientific Software Development GmbH . 2023. ATLAS.ti Mac (version 23.2.1) [qualitative data analysis software]. Accessed: 2025-07-04

work page 2023

[6] [6]

Ljubi s a Boji\'c, Olga Zagovora, Asta Zelenkauskait\.e, Vuk Vukovi\'c, Milan C abarkapa, Selma Veseljevi\'c Jerkovi\'c, and Ana Jovan c evi\'c. 2025. Evaluating large language models against human annotators in latent content analysis: Sentiment, political leaning, emotional intensity, and sarcasm. arXiv preprint arXiv:2501.02532

work page arXiv 2025

[7] [7]

David Broska, Michael Howes, and Austin van Loon. 2025. The mixed subjects design: Treating large language models as potentially informative observations. Sociological Methods & Research, page 00491241251326865

work page 2025

[8] [8]

Nitay Calderon, Roi Reichart, and Rotem Dror. 2025. The alternative annotator test for llm-as-a-judge: How to statistically justify replacing human annotators with llms. arXiv preprint arXiv:2501.10970

work page arXiv 2025

[9] [9]

John L Campbell, Charles Quincy, Jordan Osserman, and Ove K Pedersen. 2013. Coding in-depth semistructured interviews: Problems of unitization and intercoder reliability and agreement. Sociological methods & research, 42(3):294--320

work page 2013

[10] [10]

Jacob Cohen. 1960. A coefficient of agreement for nominal scales. Educational and psychological measurement, 20(1):37--46

work page 1960

[11] [11]

Rosanna Cole. 2024. Inter-rater reliability methods in qualitative case study research. Sociological Methods & Research, 53(4):1944--1975

work page 2024

[12] [12]

Tobias Daudert. 2020. A web-based collaborative annotation and consolidation tool. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 7053--7059

work page 2020

[13] [13]

Dorottya Demszky and Heather Hill. 2023. The ncte transcripts: A dataset of elementary math classroom transcripts. In Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023), pages 528--538

work page 2023

[14] [14]

Dorottya Demszky, Jing Liu, Zid Mancenido, Julie Cohen, Heather Hill, Dan Jurafsky, and Tatsunori B Hashimoto. 2021. Measuring conversational uptake: A case study on student-teacher interactions. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Proces...

work page 2021

[15] [15]

Naihao Deng, Yikai Liu, Mingye Chen, Winston Wu, Siyang Liu, Yulong Chen, Yue Zhang, and Rada Mihalcea. 2023. Ease: An easily-customized annotation system powered by efficiency enhancement mechanisms. arXiv preprint arXiv:2305.14169

work page arXiv 2023

[16] [16]

Kerry Dhakal. 2022. Nvivo. Journal of the Medical Library Association: JMLA, 110(2):270

work page 2022

[17] [17]

Sidney K D'Mello and Art Graesser. 2012. Language and discourse are powerful signals of student emotions during tutoring. IEEE Transactions on Learning Technologies, 5(4):304--317

work page 2012

[18] [18]

Zackary Okun Dunivin. 2025. Scaling hermeneutics: a guide to qualitative coding with llms for reflexive content analysis. EPJ Data Science, 14(1):28

work page 2025

[19] [19]

Eisenhardt

Kathleen M. Eisenhardt. 1989. Building theories from case study research. Academy of Management Review, 14(4):532--550. Available at JSTOR: stable 258557

work page 1989

[20] [20]

David Garlan, Vishal Dwivedi, Ivan Ruchkin, and Bradley Schmerl. 2012. Foundations and tools for end-user architecting. In Large-Scale Complex IT Systems. Development, Operation and Management: 17th Monterey Workshop 2012, Oxford, UK, March 19-21, 2012, Revised Selected Papers 17, pages 157--182. Springer

work page 2012

[21] [21]

Kevin A Hallgren. 2012. Computing inter-rater reliability for observational data: an overview and tutorial. Tutorials in quantitative methods for psychology, 8(1):23

work page 2012

[22] [22]

Andrew F Hayes and Klaus Krippendorff. 2007. Answering the call for a standard reliability measure for coding data. Communication methods and measures, 1(1):77--89

work page 2007

[23] [23]

Xudong Hong, Margarita Ryzhova, Daniel Adrian Biondi, and Vera Demberg. 2023. Do large language models and humans have similar behaviors in causal inference with script knowledge? arXiv preprint arXiv:2311.07311

work page arXiv 2023

[24] [24]

Matthew Honnibal and Ines Montani. 2017. spacy 2: Natural language understanding with bloom embeddings, convolutional neural networks and incremental parsing. To appear

work page 2017

[25] [25]

beautiful work, you're rock stars!

Nicholas Hunkins, Sean Kelly, and Sidney D'Mello. 2022. “beautiful work, you're rock stars!”: Teacher analytics to uncover discourse that supports or undermines student motivation, identity, and belonging in classrooms. In Lak22: 12th international learning analytics and knowledge conference, pages 230--238

work page 2022

[26] [26]

Mete Ismayilzada, Claire Stevenson, and Lonneke van der Plas. 2024. Evaluating creative short story generation in humans and large language models. arXiv preprint arXiv:2411.02316

work page arXiv 2024

[27] [27]

Pugh, and Sidney K

Emily Jensen, Samuel L. Pugh, and Sidney K. D'Mello. 2021. A deep transfer learning approach to modeling teacher discourse in the classroom. In LAK21: 11th international learning analytics and knowledge conference, pages 302--312

work page 2021

[28] [28]

Jan-Christoph Klie, Michael Bugert, Beto Boullosa, Richard Eckart de Castilho, and Iryna Gurevych. 2018. The INCEpTION platform: Machine-assisted and knowledge-oriented interactive annotation. In Proceedings of the 27th International Conference on Computational Linguistics: System Demonstrations, pages 5--9, Santa Fe, New Mexico, USA. Association for Comp...

work page 2018

[29] [29]

Klaus Krippendorff. 2018. Content Analysis: An Introduction to Its Methodology, 4th edition. SAGE Publications, Inc., Thousand Oaks, CA

work page 2018

[30] [30]

Yun Long, Haifeng Luo, and Yu Zhang. 2024. Evaluating large language models in analysing classroom dialogue. npj Science of Learning, 9(1):60

work page 2024

[31] [31]

Jakub Macina, Nico Daheim, Sankalan Chowdhury, Tanmay Sinha, Manu Kapur, Iryna Gurevych, and Mrinmaya Sachan. 2023. Mathdial: A dialogue tutoring dataset with rich pedagogical properties grounded in math reasoning problems. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 5602--5621

[32]

Neil Mercer and Christine Howe. 2012. Explaining the dialogic processes of teaching and learning: The value and potential of sociocultural theory. Learning, Culture and Social Interaction, 1(1):12--21

[33]

Sarah Michaels, Catherine O'Connor, and Lauren B. Resnick. 2008. Deliberative discourse idealized and realized: Accountable talk in the classroom and in civic life. In Studies in Philosophy and Education, volume 27, pages 283--297

[34]

Ines Montani and Matthew Honnibal. Prodigy: A modern and scriptable annotation tool for creating training data for machine learning models

[35]

Alberto Muñoz-Ortiz, Carlos Gómez-Rodríguez, and David Vilares. 2023. Contrasting linguistic patterns in human and LLM-generated news text. arXiv preprint arXiv:2308.09067. Version 3 (Sep 2, 2024)

[36]

Hiroki Nakayama, Takahiro Kubo, Junya Kamura, Yasufumi Taniguchi, and Xu Liang. 2018. doccano: Text annotation tool for human. Open-source text annotation software

[37]

OpenAI. 2024. GPT-4o technical report. Accessed: 2025-07-02

[38]

Soya Park, April Yi Wang, Ban Kawas, Q Vera Liao, David Piorkowski, and Marina Danilevsky. 2021. Facilitating knowledge sharing from domain experts to data scientists for building NLP models. In Proceedings of the 26th International Conference on Intelligent User Interfaces, pages 585--596

[39]

Michael Quinn Patton. 2002. Two decades of developments in qualitative inquiry: A personal, experiential perspective. Qualitative social work, 1(3):261--283

[40]

Jiaxin Pei, Aparna Ananthasubramaniam, Xingyao Wang, Naitian Zhou, Apostolos Dedeloudis, Jackson Sargent, and David Jurgens. 2022. Potato: The portable text annotation tool. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 327--337

[41]

Tal Perry. 2021. LightTag: Text annotation platform. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 20--27

[42]

Katherine Stasaski, Kimberly Kao, and Marti A Hearst. 2020. CIMA: A large open access dialogue dataset for tutoring. In Proceedings of the Fifteenth Workshop on Innovative Use of NLP for Building Educational Applications, pages 52--64

[43]

Abhijit Suresh, Jennifer Jacobs, Charis Harty, Margaret Perkoff, James H Martin, and Tamara Sumner. 2022. The TalkMoves dataset: K-12 mathematics lesson transcripts annotated for teacher and student discursive moves. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 4654--4662

[44]

Abhijit Suresh, Tamara Sumner, Isabella Huang, Jennifer Jacobs, Bill Foland, and Wayne Ward. 2018. Using deep learning to automatically detect talk moves in teachers' mathematics lessons. In 2018 IEEE International Conference on Big Data (Big Data), pages 5445--5447. IEEE

[45]

Maxim Tkachenko, Mikhail Malyuk, Andrey Holmanyuk, and Nikolai Liubimov. 2020-2022. Label Studio: Data labeling software. Open source software

[46]

Ludi Wang, Dongze Song, Qiang Cui, Xueqing Chen, Yuanchun Zhou, Wenjuan Cui, and Yi Du. 2025. Autodive+: An adaptive model enhanced multimodal online annotation tool. In Companion Proceedings of the ACM on Web Conference 2025, pages 2919--2922

[47]

Rose Wang, Qingyang Zhang, Carly Robinson, Susanna Loeb, and Dorottya Demszky. 2024. Bridging the novice-expert gap via models of decision-making: A case study on remediating math mistakes. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Paper...

[48]

Cynthia Weston, Terry Gandell, Jacinthe Beauchamp, Lynn McAlpine, Carol Wiseman, and Cathy Beauchamp. 2001. Analyzing interview data: The development and evolution of a coding system. Qualitative sociology, 24:381--400

[49]

Sarah Wiegreffe, Jack Hessel, Swabha Swayamdipta, Mark Riedl, and Yejin Choi. 2021. Reframing human-AI collaboration for generating free-text explanations. arXiv preprint arXiv:2112.08674

[50]

Linxuan Zhao, Dragan Gašević, Zachari Swiecki, Yuheng Li, Jionghao Lin, Lele Sha, Lixiang Yan, Riordan Alfredo, Xinyu Li, and Roberto Martinez-Maldonado. 2024. Towards automated transcribing and coding of embodied teamwork communication through multimodal learning analytics. British Journal of Educational Technology, 55(4):1673--1702
