Revisiting Framing Codebooks with AI: Employing Large Language Models as Analytical Collaborators in Deductive Content Analysis
Pith reviewed 2026-05-10 02:30 UTC · model grok-4.3
The pith
Large language models can act as analytic collaborators to externalize decision rules and refine framing codebooks through iterative researcher dialogues.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LLMs can augment the creation and refinement of framing codebooks by combining theoretical frameworks with data-driven exploration. The models serve as analytic collaborators that externalize decision rules, surface latent dimensions, and support iterative revisions through dialogues between researchers and their data. The paper illustrates this with an application to Latin American news coverage, which generated frame distinctions and adapted frameworks to new contexts.
What carries the argument
The LLM-assisted iterative workflow for codebook refinement, in which models participate in researcher dialogues to externalize rules and identify latent patterns while researchers retain interpretive authority.
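The loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `Codebook`, `Proposal`, `refine`, `stub_propose`, and the toy frame rule are hypothetical names invented here, and `stub_propose` stands in for whatever LLM dialogue a researcher would actually run. The sketch only shows the structural point that the model proposes and the researcher decides.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Codebook:
    """Frame name -> decision rule, plus a log of accepted revisions."""
    rules: Dict[str, str]
    history: List[str] = field(default_factory=list)


@dataclass
class Proposal:
    """A suggested rule change; it has no effect until a researcher accepts it."""
    frame: str
    new_rule: str
    rationale: str


def refine(codebook, articles, propose, review, max_rounds=3):
    """Run up to max_rounds of propose -> review -> revise.

    propose(codebook, articles) returns a list of Proposal objects
    (in practice, parsed from an LLM dialogue; an assumption here).
    review(proposal) returns True only if the researcher accepts it.
    """
    for round_no in range(1, max_rounds + 1):
        proposals = propose(codebook, articles)
        accepted = [p for p in proposals if review(p)]
        if not accepted:
            break  # nothing accepted this round, so stop iterating
        for p in accepted:
            codebook.rules[p.frame] = p.new_rule
            codebook.history.append(f"round {round_no}: {p.frame}: {p.rationale}")
    return codebook


def stub_propose(codebook, articles):
    """Hypothetical stand-in for an LLM call; returns one canned suggestion."""
    rule = codebook.rules.get("economic", "")
    if rule and "remittances" not in rule:
        return [Proposal(
            frame="economic",
            new_rule=rule + " Also covers remittances.",
            rationale="toy example of a borderline case surfaced from the data",
        )]
    return []


# Toy usage: the lambda plays the researcher and accepts every proposal.
codebook = Codebook(rules={"economic": "Mentions financial gains or losses."})
articles = ["(article text would go here)"]
result = refine(codebook, articles, stub_propose, review=lambda p: True)
```

Passing a `review` callback that returns `False` leaves the codebook untouched, which is one way to express that interpretive authority stays with the researcher.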
If this is right
- Codebooks can adapt more readily to evolving news corpora and cross-cultural differences without requiring complete redevelopment from theory.
- Latent framing dimensions that theory alone overlooks become visible through data-driven dialogue with the models.
- Researchers maintain full control over final codebook decisions while gaining structured support for exploring ambiguities.
- The workflow extends to other deductive content analysis tasks where initial theoretical criteria require data-grounded sharpening.
Where Pith is reading between the lines
- The method could enable more responsive codebooks that update continuously as new news data arrives.
- It suggests a broader pattern for integrating AI tools into qualitative research while keeping human oversight central.
- Validation experiments comparing LLM-assisted codebooks against purely manual ones on held-out datasets would test the approach's reliability across domains.
- The technique may reduce initial development time for codebooks but shifts effort toward ongoing dialogue and bias-checking steps.
Load-bearing premise
Large language models can reliably externalize decision rules, surface latent patterns, and support valid iterative revisions without introducing systematic biases, hallucinations, or interpretive distortions that undermine the codebooks' theoretical integrity.
What would settle it
A direct comparison on the same news corpus where frames identified by the LLM-refined codebook systematically differ from those identified by an independently developed traditional codebook in ways that match documented model biases or omissions.
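One way such a comparison could be quantified is with Cohen's kappa plus a tally of directional disagreements. The sketch below is an assumption about how the test might be run, not something the paper reports. The label lists are invented toy data, and the frame names are placeholders.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two label sequences over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    # Agreement expected by chance, from each codebook's marginal frequencies.
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use a single identical label
    return (observed - expected) / (1 - expected)


def directional_disagreements(labels_a, labels_b):
    """Count (label_a, label_b) pairs where the two codebooks differ."""
    return Counter((a, b) for a, b in zip(labels_a, labels_b) if a != b)


# Toy data (invented): one frame per article under each codebook.
llm_refined = ["conflict", "economic", "conflict", "morality", "economic", "conflict"]
traditional = ["conflict", "economic", "economic", "morality", "economic", "human_interest"]

kappa = cohens_kappa(llm_refined, traditional)
shifts = directional_disagreements(llm_refined, traditional)
```

A low kappa alone would not settle the question. The `shifts` tally matters because the criterion above concerns disagreements that run systematically in a direction matching known model biases.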
Original abstract
Codebooks are central to framing research, providing theoretically grounded criteria for analyzing news content. While traditionally codebooks are built from theoretical frameworks and researchers' knowledge, applying these codebooks to large news corpora often exposes ambiguities, borderline cases, and underspecified rules that are difficult to resolve through theory alone. Moreover, news corpora evolve over time and differ across cultures, necessitating that researchers revisit the theoretical frameworks underlying these codebooks. In this article, we propose a workflow that uses Large Language Models (LLMs) to augment the creation and refinement of framing codebooks by combining theoretical frameworks with data-driven exploration. Rather than treating LLMs as automated classifiers, this approach positions them as analytic collaborators that help externalize decision rules, surface latent dimensions, and support iterative revisions of codebooks through dialogues between researchers and their data. We illustrate this workflow using a dataset of Latin American news coverage, demonstrating how the application of LLMs' capabilities has led to the surfacing of latent patterns, the generation of frame distinctions, and the adaptation of frameworks to new contexts. This method provides an LLM-assisted strategy that supports methodology creativity while preserving researchers' interpretative authority.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a workflow that uses large language models (LLMs) as analytic collaborators, rather than automated classifiers, to augment the creation and refinement of framing codebooks in deductive content analysis. The workflow combines theoretical frameworks with data-driven exploration through iterative researcher-LLM dialogues that externalize decision rules, surface latent dimensions, and support codebook revisions. It is illustrated with a Latin American news corpus, where the authors report that LLM use surfaced latent patterns, generated frame distinctions, and adapted frameworks to new contexts while preserving researchers' interpretive authority.
Significance. If empirically validated, the proposed workflow would offer a meaningful advance for framing research by providing a structured, human-centered method to resolve ambiguities in codebooks when applied to large, evolving, or cross-cultural news corpora. It correctly positions LLMs as dialogue partners to enhance methodological creativity without ceding control, addressing a practical gap where theory alone proves insufficient. The emphasis on researcher authority is a clear strength. However, the current manuscript presents only a conceptual sketch and qualitative illustration, so its significance is prospective rather than demonstrated.
major comments (1)
- [Illustration section] (Latin American news application): The demonstration reports qualitative outcomes such as surfaced patterns and adapted frameworks but supplies no evaluation metrics, validation against human coders, inter-rater reliability comparisons, error analysis, or discussion of LLM limitations (e.g., hallucinations or biases). This is load-bearing for the central claim that the workflow reliably supports valid iterative revisions without introducing distortions.
minor comments (1)
- [Abstract]: The phrasing 'the application of LLMs' capabilities has led to...' is vague; specifying the exact dialogue prompts or LLM roles used in the illustration would improve clarity and set appropriate expectations.
Simulated Author's Rebuttal
We thank the referee for their constructive and insightful review. We agree that the illustration section would benefit from greater attention to evaluation and limitations, and we have revised the manuscript to incorporate a more explicit discussion of these issues while maintaining the paper's focus as a methodological proposal rather than an empirical validation study.
Point-by-point responses
- Referee: [Illustration section] (Latin American news application): The demonstration reports qualitative outcomes such as surfaced patterns and adapted frameworks but supplies no evaluation metrics, validation against human coders, inter-rater reliability comparisons, error analysis, or discussion of LLM limitations (e.g., hallucinations or biases). This is load-bearing for the central claim that the workflow reliably supports valid iterative revisions without introducing distortions.
Authors: We accept this critique. The illustration is qualitative and designed to show how the workflow operates in practice rather than to empirically prove its reliability across cases. In the revised manuscript we have added a dedicated subsection within the illustration that explicitly discusses LLM limitations (hallucinations, biases, and context drift) and the safeguards the workflow employs through ongoing researcher oversight and iterative prompting. We have also inserted a new Limitations section that outlines the absence of quantitative metrics in the current work and proposes concrete directions for future validation, including comparisons with human coders, inter-rater reliability checks, and error analysis. These additions directly address the concern without converting the paper into an empirical validation study, which would exceed its stated scope as a conceptual workflow contribution.
Revision: partial
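The error analysis mentioned among the proposed validation directions could start from a per-frame breakdown against human codes. The following is a hedged sketch of one possible form. The function name `per_frame_report` and the sample labels are invented for illustration, and it assumes human codes serve as the reference standard, which the paper does not specify.

```python
from collections import defaultdict


def per_frame_report(human, assisted):
    """Per-frame precision and recall, treating human codes as the reference."""
    if len(human) != len(assisted):
        raise ValueError("label lists must be of equal length")
    tp = defaultdict(int)  # both assigned this frame
    fp = defaultdict(int)  # assisted coding assigned it, human did not
    fn = defaultdict(int)  # human assigned it, assisted coding did not
    for h, a in zip(human, assisted):
        if h == a:
            tp[h] += 1
        else:
            fn[h] += 1
            fp[a] += 1
    report = {}
    for frame in sorted(set(human) | set(assisted)):
        p_den = tp[frame] + fp[frame]
        r_den = tp[frame] + fn[frame]
        report[frame] = {
            # None marks a frame that one side never used.
            "precision": tp[frame] / p_den if p_den else None,
            "recall": tp[frame] / r_den if r_den else None,
        }
    return report


# Toy data (invented): one frame per article.
human_codes = ["conflict", "economic", "economic", "morality"]
assisted_codes = ["conflict", "conflict", "economic", "morality"]
report = per_frame_report(human_codes, assisted_codes)
```

Frames with low recall point to rules the revised codebook may have narrowed too far, while low precision suggests a rule that has become too permissive.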
Circularity Check
No significant circularity: methodological workflow proposal
The paper advances a conceptual workflow for LLM-assisted refinement of framing codebooks in deductive content analysis, illustrated on one news corpus. It contains no equations, parameter fitting, statistical predictions, or first-principles derivations. The central claim is a design sketch that explicitly retains final interpretive authority with human researchers and does not claim automation or error-free output. No load-bearing premise reduces to a self-citation chain, fitted input renamed as prediction, or ansatz smuggled via prior work by the same authors. The argument is self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Large language models can effectively externalize decision rules, surface latent dimensions, and support iterative codebook revisions through researcher-model dialogues without compromising validity.