Effects of Collaboration on the Performance of Interactive Theme Discovery Systems

Alexandra Barry; Alvin Po-Chun Chen; Dananjay Srinivas; Maksim Seniw; Maria Leonor Pacheco; Rohan Das

arxiv: 2408.09030 · v6 · submitted 2024-08-16 · 💻 cs.CL · cs.HC

Pith reviewed 2026-05-23 22:09 UTC · model grok-4.3

keywords collaboration · synchronous · asynchronous · qualitative analysis · NLP tools · theme discovery · evaluation framework · consistency

The pith

Synchronous and asynchronous collaboration yield measurably different consistency, cohesiveness, and correctness in themes discovered with interactive NLP-assisted tools.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes a framework for evaluating how collaboration settings shape research outcomes in qualitative data analysis that relies on interactive NLP tools. It applies this framework to compare synchronous and asynchronous modes across three different tools, then measures resulting variations in the consistency, cohesiveness, and correctness of discovered themes. A sympathetic reader cares because the framework supplies a concrete method to test and improve collaborative practices rather than assuming one mode is uniformly better. The work therefore supplies evidence that timing of collaboration is a controllable variable worth tracking in tool-supported qualitative research.

Core claim

We propose a framework to evaluate the way collaboration settings may produce different research outcomes across a variety of interactive systems. Specifically, we study the impact of synchronous versus asynchronous collaboration using three different NLP-assisted qualitative research tools and present a comprehensive analysis of the differences in the consistency, cohesiveness, and correctness of their outcomes.

What carries the argument

An evaluation framework that compares synchronous and asynchronous collaboration across multiple interactive NLP-assisted tools by tracking consistency, cohesiveness, and correctness of theme discovery outcomes.

If this is right

Collaboration mode can be treated as an experimental variable that measurably shifts the quality profile of themes produced by interactive systems.
The proposed framework supplies a repeatable protocol for comparing additional tools or additional collaboration variables.
Researchers can use the three metrics to diagnose whether a given collaboration setting improves or reduces outcome reliability.
Tool interfaces may need to expose or log collaboration timing so that teams can monitor its influence on final theme sets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The framework could be extended to non-NLP qualitative tools to test whether the same collaboration-mode effects appear outside the NLP-assisted setting.
If synchronous and asynchronous modes produce reliably different error patterns, future tool designs might include mode-specific prompts or review steps.
Teams could run small pilot studies with the framework before committing to a collaboration schedule for a large qualitative project.

Load-bearing premise

The three chosen tools together with the three chosen metrics of consistency, cohesiveness, and correctness are representative enough to support general claims about collaboration effects in interactive theme discovery systems.

What would settle it

A replication study that applies the same framework to a fourth independent NLP-assisted tool and finds no measurable differences in consistency, cohesiveness, or correctness between synchronous and asynchronous conditions would falsify the reported effects.
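A replication of this kind needs some test of whether a metric actually differs between the two conditions. The sketch below shows one common option, a two-sided permutation test on the difference in mean scores. It is illustrative only: the function name `permutation_test`, the choice of a permutation test, and the score lists are assumptions introduced here, not the analysis reported in the paper, and the numbers are invented placeholders.

```python
import random
from statistics import mean


def permutation_test(sync_scores, async_scores, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in mean scores.

    Returns (observed_difference, estimated_p_value).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mean(sync_scores) - mean(async_scores)
    pooled = list(sync_scores) + list(async_scores)
    n_sync = len(sync_scores)

    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign scores to the two conditions.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_sync]) - mean(pooled[n_sync:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical per-session correctness scores (placeholders, not paper data).
sync_correctness = [0.82, 0.79, 0.85, 0.80, 0.83]
async_correctness = [0.74, 0.77, 0.71, 0.78, 0.73]

observed_diff, p = permutation_test(sync_correctness, async_correctness)
print(f"difference in means: {observed_diff:.3f}, p = {p:.4f}")
```

A large p-value on every metric for a fourth tool would be the kind of null result described above, although with only a handful of sessions per condition such a test has little power to detect small effects.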

Figures

Figures reproduced from arXiv: 2408.09030 by Alexandra Barry, Alvin Po-Chun Chen, Dananjay Srinivas, Maksim Seniw, Maria Leonor Pacheco, Rohan Das.

**Figure 2.** Two sets of annotators use a particular HiTL …

**Figure 3.** Once an annotator has identified themes and …

**Figure 4.** Correctness results for each quartile sample per experiment. First, we observe that the relational system is not only the most accurate, but it shows negligible correctness differences in synchronous vs. asynchronous configurations. This is an encouraging result, given that this system took the most advantage of synchronous deliberation based on our other metrics of quality. The other two systems …

Original abstract

NLP-assisted solutions to support qualitative data analysis have gained considerable traction. However, no unified evaluation framework exists which can account for the many different settings in which qualitative researchers may employ them. In this paper, we propose a framework to evaluate the way collaboration settings may produce different research outcomes across a variety of interactive systems. Specifically, we study the impact of synchronous vs. asynchronous collaboration using three different NLP-assisted qualitative research tools and present a comprehensive analysis of the differences in the consistency, cohesiveness, and correctness of their outcomes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper runs a user study on sync versus async collaboration across three specific NLP theme discovery tools and measures consistency, cohesiveness, and correctness, but the narrow tool sample limits how far the results can be generalized.

The main takeaway is that this paper sets up an empirical comparison of synchronous and asynchronous collaboration in three NLP-assisted tools for theme discovery, tracking differences in consistency, cohesiveness, and correctness of the resulting themes. The specific combination of those collaboration modes with those three metrics on those tools is not something already in the cited literature, so that part is new. The authors also try to give a structured way to evaluate collaboration effects in interactive qualitative tools, which fills a small gap in the HCI and applied NLP space.

They do a reasonable job of laying out why collaboration settings might matter for research outcomes and of running the comparison across multiple tools rather than just one. The evidence they present is empirical rather than purely conceptual, which is a plus.

The main soft spot is the choice of only three tools with no sampling justification or sensitivity checks shown in the abstract. If the observed differences are driven by idiosyncratic features of those particular tools or task setups, the claims about collaboration effects in general do not hold up. The abstract also leaves out participant counts, exclusion rules, and statistical tests, so it is hard to judge how solid the reported differences actually are. If the full paper has those details and they are sound, the concern shrinks; otherwise it stays central.

This work is aimed at HCI and NLP researchers who build or evaluate interactive qualitative analysis systems. A reader who needs concrete data on collaboration modes for theme discovery could get some practical value, but anyone looking for broader claims about qualitative research or general interactive systems will find the scope too narrow. The paper shows clear thinking on its own terms and engages with the relevant evaluation practices, so it deserves a serious referee to check the methods and the generalizability argument.

I would send it to peer review at a relevant conference or journal but would flag the tool selection and measurement details as items that need strengthening.

Referee Report

2 major / 1 minor

Summary. The paper proposes a framework for evaluating how collaboration settings (synchronous vs. asynchronous) affect research outcomes across interactive NLP-assisted qualitative research tools for theme discovery. It applies the framework to three such tools and reports a comprehensive analysis of differences in consistency, cohesiveness, and correctness of the resulting themes.

Significance. If the empirical findings hold after addressing generalizability concerns, the work could provide a useful starting point for standardized evaluation of collaboration effects in interactive qualitative analysis systems, filling a noted gap in unified frameworks. The explicit focus on measurable outcome dimensions (consistency, cohesiveness, correctness) is a positive step toward falsifiable claims in this domain.

major comments (2)

[Abstract] Abstract, paragraph 3: the central claim that the framework reveals collaboration effects 'across a variety of interactive systems' rests on only three specific NLP-assisted tools; no sampling justification, tool-class taxonomy, or sensitivity checks versus non-NLP baselines are described, so it remains possible that reported differences are idiosyncratic to the chosen implementations rather than attributable to synchronous/asynchronous settings.
[Methods / Experimental Setup] The manuscript provides no visible participant counts, statistical tests, data exclusion rules, or inter-rater reliability measures for the consistency/cohesiveness/correctness metrics; without these, it is impossible to determine whether the reported differences between collaboration conditions are supported by the measurements or could be explained by small sample variance or task-specific confounds.

minor comments (1)

[Abstract] The abstract would benefit from naming the three tools and briefly indicating how the metrics are operationalized, to allow readers to assess scope immediately.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on generalizability and methodological transparency. We address each major comment below, indicating planned revisions where appropriate.

Referee: [Abstract] Abstract, paragraph 3: the central claim that the framework reveals collaboration effects 'across a variety of interactive systems' rests on only three specific NLP-assisted tools; no sampling justification, tool-class taxonomy, or sensitivity checks versus non-NLP baselines are described, so it remains possible that reported differences are idiosyncratic to the chosen implementations rather than attributable to synchronous/asynchronous settings.

Authors: We agree that the selection of three tools requires explicit justification to support the claim of applicability 'across a variety of interactive systems.' The tools were chosen to span distinct interaction styles (e.g., varying degrees of NLP automation and user steering), but we did not include a formal taxonomy or non-NLP baselines because the study scope centers on collaboration settings within existing NLP-assisted tools rather than a comprehensive system-class comparison. In revision we will add a Methods subsection detailing the selection criteria and diversity rationale, temper the abstract claim to reference the three studied systems, and add a limitations paragraph acknowledging the absence of non-NLP baselines and the need for future sensitivity checks. We do not believe a full taxonomy is required for the current contribution.

revision: partial

Referee: [Methods / Experimental Setup] The manuscript provides no visible participant counts, statistical tests, data exclusion rules, or inter-rater reliability measures for the consistency/cohesiveness/correctness metrics; without these, it is impossible to determine whether the reported differences between collaboration conditions are supported by the measurements or could be explained by small sample variance or task-specific confounds.

Authors: We apologize that these details were not sufficiently prominent. The study collected data from a defined number of participants per condition, applied statistical tests to compare metrics across conditions, used predefined exclusion criteria for incomplete sessions, and assessed inter-rater reliability for the correctness metric. In the revised manuscript we will insert a dedicated 'Participants, Procedure, and Analysis' subsection that explicitly reports participant counts, the statistical tests employed (including p-values and effect sizes), exclusion rules, and reliability coefficients. This will make the evidential basis for the reported differences fully transparent.

revision: yes
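To make concrete what a reported reliability coefficient could look like, the sketch below computes Cohen's kappa for two raters judging the same items. This is an illustration under stated assumptions: Cohen's kappa is a commonly used agreement statistic chosen here for simplicity, the text above does not say which coefficient the authors use, and the function name `cohens_kappa` and the rating lists are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(labels_a)

    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement from each rater's marginal label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical correctness judgments from two raters (placeholders).
rater_a = ["correct", "correct", "incorrect", "correct",
           "incorrect", "correct", "correct", "incorrect"]
rater_b = ["correct", "correct", "incorrect", "incorrect",
           "incorrect", "correct", "correct", "correct"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"Cohen's kappa: {kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, so it can be much lower than the simple percentage of matching labels when one label dominates.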

Circularity Check

0 steps flagged

Empirical user study with no derivation chain or self-referential inputs

The paper describes an empirical user study comparing synchronous vs. asynchronous collaboration across three specific NLP-assisted tools, measuring consistency, cohesiveness, and correctness. No equations, fitted parameters, predictions derived from inputs, or self-citation chains appear in the provided text. The central claims rest on experimental observations rather than any quantity defined inside the paper by construction. This matches the default expectation of no significant circularity for non-mathematical empirical work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only paper; no free parameters, axioms, or invented entities are stated. The work rests on standard HCI assumptions about measurable user outcomes and tool representativeness.


[1] [1]

Shai Ben-David and Margareta Ackerman. 2008. https://proceedings.neurips.cc/paper_files/paper/2008/file/beed13602b9b0e6ecb5b568ff5058f07-Paper.pdf Measures of clustering quality: A working set of axioms for clustering . In Advances in Neural Information Processing Systems, volume 21. Curran Associates, Inc

work page 2008

[2] [2]

Henry E. Brady. 2019. https://doi.org/10.1146/annurev-polisci-090216-023229 The challenge of big data and data science . Annual Review of Political Science, 22(1):297--323

work page doi:10.1146/annurev-polisci-090216-023229 2019

[3] [3]

Virginia Braun and Victoria Clarke. 2006. https://doi.org/10.1191/1478088706qp063oa Using thematic analysis in psychology . Qualitative Research in Psychology, 3:77--101

work page doi:10.1191/1478088706qp063oa 2006

[4] [4]

Nan-Chen Chen, Margaret Drouhard, Rafal Kocielnik, Jina Suh, and Cecilia R. Aragon. 2018. https://doi.org/10.1145/3185515 Using machine learning to support qualitative coding in social science: Shifting the focus to ambiguity . ACM Trans. Interact. Intell. Syst., 8(2)

work page doi:10.1145/3185515 2018

[5] [5]

Robert Chew, John Bollenbacher, Michael Wenger, Jessica Speer, and Annice Kim. 2023. https://arxiv.org/abs/2306.14924 Llm-assisted content analysis: Using large language models to support deductive coding . Preprint, arXiv:2306.14924

work page arXiv 2023

[6] [6]

Reddy, and Haesun Park

Jaegul Choo, Changhyun Lee, Chandan K. Reddy, and Haesun Park. 2013. https://doi.org/10.1109/TVCG.2013.212 Utopian: User-driven topic modeling based on interactive nonnegative matrix factorization . IEEE Transactions on Visualization and Computer Graphics, 19(12):1992--2001

work page doi:10.1109/tvcg.2013.212 2013

[7] [7]

McFarland

Jason Chuang and Daniel A. McFarland. 2013. https://api.semanticscholar.org/CorpusID:43940920 Document exploration with topic modeling : Designing interactive visualizations to support effective analysis workflows

work page 2013

[8] [9]

Shih-Chieh Dai, Aiping Xiong, and Lun-Wei Ku. 2023. https://doi.org/10.18653/v1/2023.findings-emnlp.669 LLM -in-the-loop: Leveraging large language model for thematic analysis . In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 9993--10001, Singapore. Association for Computational Linguistics

work page doi:10.18653/v1/2023.findings-emnlp.669 2023

[9] [10]

Margaret Drouhard, Nan-Chen Chen, Jina Suh, Rafal Kocielnik, Vanessa Peña-Araya, Keting Cen, Xiangyi Zheng, and Cecilia R. Aragon. 2017. https://doi.org/10.1109/PACIFICVIS.2017.8031598 Aeonium: Visual analytics to support collaborative qualitative coding . In 2017 IEEE Pacific Visualization Symposium (PacificVis), pages 220--229

work page doi:10.1109/pacificvis.2017.8031598 2017

[10] [11]

Zheng Fang, Lama Alqazlan, Du Liu, Yulan He, and Rob Procter. 2023. https://doi.org/10.18653/v1/2023.eacl-main.37 A user-centered, interactive, human-in-the-loop topic modelling system . In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 505--522, Dubrovnik, Croatia. Association for Comput...

work page doi:10.18653/v1/2023.eacl-main.37 2023

[11] [12]

Zheng Fang, Yulan He, and Rob Procter. 2021. https://doi.org/10.18653/v1/2021.findings-acl.154 A query-driven topic model . In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 1764--1777, Online. Association for Computational Linguistics

work page doi:10.18653/v1/2021.findings-acl.154 2021

[12] [13]

Feuston and Jed R

Jessica L. Feuston and Jed R. Brubaker. 2021. https://doi.org/10.1145/3479856 Putting tools in their place: The role of time and perspective in human-ai collaboration for qualitative analysis . Proc. ACM Hum.-Comput. Interact., 5(CSCW2)

work page doi:10.1145/3479856 2021

[13] [14]

Uwe Flick. 2014. https://doi.org/10.4135/9781446282243 The sage handbook of qualitative data analysis

work page doi:10.4135/9781446282243 2014

[14] [15]

Jie Gao, Kenny Tsu Wei Choo, Junming Cao, Roy Ka-Wei Lee, and Simon Perrault. 2023. https://doi.org/10.1145/3617362 Coaicoder: Examining the effectiveness of ai-assisted human-to-human collaboration in qualitative analysis . ACM Trans. Comput.-Hum. Interact., 31(1)

work page doi:10.1145/3617362 2023

[15] [16]

Jie Gao, Yuchen Guo, Gionnieve Lim, Tianqin Zhang, Zheng Zhang, Toby Jia-Jun Li, and Simon Tangi Perrault. 2024. https://arxiv.org/abs/2304.07366 Collabcoder: A lower-barrier, rigorous workflow for inductive collaborative qualitative analysis with large language models . Preprint, arXiv:2304.07366

work page arXiv 2024

[16] [17]

Barney G Glaser, Anselm L Strauss, and Elizabeth Strutzel. 1968. The discovery of grounded theory; strategies for qualitative research. Nursing research, 17(4):364

work page 1968

[17] [18]

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, and 542 others. 2024. https://arxiv.org/abs/2407.21783 The llama 3...

[18] [19]

Kilem Li Gwet. 2008. Computing inter-rater reliability and its variance in the presence of high agreement. British Journal of Mathematical and Statistical Psychology, 61(1):29--48

[19] [20]

Martin Hilbert, George Barnett, Joshua Blumenstock, Noshir Contractor, Jana Diesner, Seth Frey, Sandra González-Bailón, PJ Lamberson, Jennifer Pan, Tai-Quan Peng, Cuihua (Cindy) Shen, Paul E. Smaldino, Wouter van Atteveldt, Annie Waldherr, Jingwen Zhang, and Jonathan J. H. Zhu. 2019. https://ijoc.org/index.php/ijoc/article/view/10675 Computational communi...

[20] [21]

Enamul Hoque and Giuseppe Carenini. 2016. https://doi.org/10.1145/2854158 Interactive topic modeling for exploring asynchronous online conversations: Design and evaluation of convisit . ACM Trans. Interact. Intell. Syst., 6(1)

[21] [22]

Alexander Hoyle, Pranav Goel, Denis Peskov, Andrew Hian-Cheong, Jordan Boyd-Graber, and Philip Resnik. 2021. Is automated topic model evaluation broken? the incoherence of coherence. In Proceedings of the 35th International Conference on Neural Information Processing Systems, NIPS '21, Red Hook, NY, USA. Curran Associates Inc

[22] [23]

Jialun Aaron Jiang, Kandrea Wade, Casey Fiesler, and Jed R. Brubaker. 2021. https://doi.org/10.1145/3449168 Supporting serendipity: Opportunities and challenges for human-ai collaboration in qualitative analysis . Proc. ACM Hum.-Comput. Interact., 5(CSCW1)

[23] [24]

Xin Jin and Jiawei Han. 2010. https://doi.org/10.1007/978-0-387-30164-8_425 K-Means Clustering , pages 563--564. Springer US, Boston, MA

[24] [25]

Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. 2024. Large language models are zero-shot reasoners. In Proceedings of the 36th International Conference on Neural Information Processing Systems, NIPS '22, Red Hook, NY, USA. Curran Associates Inc

[25] [26]

Julia Mendelsohn, Ceren Budak, and David Jurgens. 2021. https://doi.org/10.18653/v1/2021.naacl-main.179 Modeling framing in immigration discourse on social media . In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2219--2263, Online. Association for Comp...

[26] [27]

Maria Leonor Pacheco, Tunazzina Islam, Monal Mahajan, Andrey Shor, Ming Yin, Lyle Ungar, and Dan Goldwasser. 2022. https://doi.org/10.18653/v1/2022.naacl-main.427 A holistic framework for analyzing the COVID-19 vaccine debate. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Lang...

[27] [28]

Maria Leonor Pacheco, Tunazzina Islam, Lyle Ungar, Ming Yin, and Dan Goldwasser. 2023. https://doi.org/10.18653/v1/2023.findings-acl.313 Interactive concept learning for uncovering latent themes in large text collections . In Findings of the Association for Computational Linguistics: ACL 2023, pages 5059--5080, Toronto, Canada. Association for Computation...

[28] [29]

Nils Reimers and Iryna Gurevych. 2019. https://doi.org/10.18653/v1/D19-1410 Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3982--3992, Hong Kong, Chi...

[29] [30]

Tim Rietz, Peyman Toreini, and Alexander Maedche. 2020. https://doi.org/10.1145/3379350.3416195 Cody: An interactive machine learning system for qualitative coding . In Adjunct Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology, UIST '20 Adjunct, page 90–92, New York, NY, USA. Association for Computing Machinery

[30] [31]

Shamik Roy, Maria Leonor Pacheco, and Dan Goldwasser. 2021. https://doi.org/10.18653/v1/2021.emnlp-main.783 Identifying morality frames in political tweets using relational learning . In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 9939--9958, Online and Punta Cana, Dominican Republic. Association for Compu...

[31] [32]

Alison Smith, Varun Kumar, Jordan Boyd-Graber, Kevin Seppi, and Leah Findlater. 2018. https://doi.org/10.1145/3172944.3172965 Closing the loop: User-centered design and evaluation of a human-in-the-loop topic modeling system . In Proceedings of the 23rd International Conference on Intelligent User Interfaces, IUI '18, page 293–304, New York, NY, USA. Asso...

[32] [33]

Laurens van der Maaten and Geoffrey Hinton. 2008. http://jmlr.org/papers/v9/vandermaaten08a.html Visualizing data using t-SNE. Journal of Machine Learning Research, 9(86):2579--2605

[33] [34]

Ziang Xiao, Xingdi Yuan, Q. Vera Liao, Rania Abdelghani, and Pierre-Yves Oudeyer. 2023. https://doi.org/10.1145/3581754.3584136 Supporting qualitative analysis with large language models: Combining codebook with gpt-3 for deductive coding . In Companion Proceedings of the 28th International Conference on Intelligent User Interfaces, IUI '23 Companion, pag...

[34] [35]

Himanshu Zade, Margaret Drouhard, Bonnie Chinh, Lu Gan, and Cecilia Aragon. 2018. https://doi.org/10.1145/3173574.3173733 Conceptualizing disagreement in qualitative coding . In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI '18, page 1–11, New York, NY, USA. Association for Computing Machinery
