Politics of Questions in News: A Mixed-Methods Study of Interrogative Stances as Markers of Voice and Power
Pith reviewed 2026-05-15 00:49 UTC · model grok-4.3
The pith
Interrogative discourse in news foregrounds prominent actors and places over broad publics.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The study shows that interrogative contexts are densely populated with named individuals, organizations, and places, whereas publics and broad social groups are mentioned much less frequently, indicating that interrogative discourse tends to foreground already prominent actors and places and thus exhibits strong personalization.
What carries the argument
The operationalization of interrogative stance, textual uptake, and voice at corpus scale, combining automatic detection with qualitative annotation grounded in semantic and pragmatic theories.
Load-bearing premise
The automatic detection of interrogative stances and their functional types is accurate enough to support claims about patterns across the corpus, and the qualitatively annotated subcorpus is representative of the larger dataset.
What would settle it
Manual review of a representative sample showing that interrogative contexts mention broad social groups as frequently as named individuals would falsify the personalization claim.
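The falsifying check above could be run mechanically once a manually reviewed sample exists; a minimal sketch in Python, where the toy sample and the tag names ("named", "group") are invented for illustration:

```python
# Hypothetical manually reviewed sample: each interrogative context lists
# its mentions, tagged "named" (person/org/place) or "group" (publics,
# broad social groups). Data and field names are illustrative only.
sample = [
    {"mentions": ["named", "named", "group"]},
    {"mentions": ["named"]},
    {"mentions": ["named", "named", "named", "group"]},
]

named = sum(ctx["mentions"].count("named") for ctx in sample)
group = sum(ctx["mentions"].count("group") for ctx in sample)

# The personalization claim predicts named >> group; rough parity
# would count against it.
rate = named / max(group, 1)
print(f"named per group mention: {rate:.2f}")
```

On a real sample the comparison would of course need confidence intervals, but the counting itself is this simple.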
Original abstract
Interrogatives in news discourse have been examined in linguistics and conversation analysis, but mostly in broadcast interviews and relatively small, often English-language corpora, while large-scale computational studies of news rarely distinguish interrogatives from declaratives or differentiate their functions. This paper brings these strands together through a mixed-methods study of the "Politics of Questions" in contemporary French-language digital news. Using over one million articles published between January 2023 and June 2024, we automatically detect interrogative stances, approximate their functional types, and locate textual answers when present, linking these quantitative measures to a qualitatively annotated subcorpus grounded in semantic and pragmatic theories of questions. Interrogatives are sparse but systematically patterned: they mainly introduce or organize issues, with most remaining cases being information-seeking or echo-like, while explicitly leading or tag questions are rare. Although their density and mix vary across outlets and topics, our heuristic suggests that questions are overwhelmingly taken up within the same article and usually linked to a subsequent answer-like span, most often in the journalist's narrative voice and less often through quoted speech. Interrogative contexts are densely populated with named individuals, organizations, and places, whereas publics and broad social groups are mentioned much less frequently, suggesting that interrogative discourse tends to foreground already prominent actors and places and thus exhibits strong personalization. We show how interrogative stance, textual uptake, and voice can be operationalized at corpus scale, and argue that combining computational methods with pragmatic and sociological perspectives can help account for how questioning practices structure contemporary news discourse.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper conducts a mixed-methods study of interrogative stances in over one million French-language news articles (Jan 2023–Jun 2024). It automatically detects interrogatives, approximates functional types (e.g., issue-introducing, information-seeking, echo), locates textual answers, and grounds findings in a qualitatively annotated subcorpus drawing on semantic/pragmatic theories. Key results: interrogatives are sparse but patterned, mostly taken up within-article in journalistic voice; interrogative contexts show high density of named individuals/organizations/places and low density of publics/broad groups, interpreted as evidence of strong personalization.
Significance. If the detection pipeline is shown to be reliable, the work provides a scalable operationalization of interrogative stance, uptake, and voice that links computational corpus methods to pragmatic and sociological accounts of news discourse. It offers falsifiable, corpus-scale evidence on how questioning practices foreground prominent actors, which could inform future studies of media power and voice.
major comments (2)
- [Methods] Methods section: no precision, recall, confusion matrix, or inter-annotator agreement is reported for the automatic interrogative-stance detector or the entity-typing step. Because the central personalization claim rests on density contrasts between named entities and publics across >1M automatically labeled contexts, absence of these metrics leaves open the possibility that detection biases (e.g., preferential flagging around proper nouns) artifactually produce the reported pattern.
- [Qualitative Analysis] Qualitative subcorpus description: size, sampling frame, and agreement between automatic labels and manual annotations are unspecified. Without these details it is impossible to assess whether the subcorpus reliably grounds the quantitative functional-type and uptake claims.
minor comments (2)
- [Abstract] Abstract: the phrase 'our heuristic suggests' is used without a one-sentence gloss of the heuristic; adding this would improve immediate clarity for readers.
- [Results] Results: statements about variation across outlets and topics would be strengthened by at least one concrete numerical example or statistical test rather than qualitative description alone.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. The comments highlight important gaps in the reporting of evaluation metrics and subcorpus details, which we agree need to be addressed to strengthen the manuscript. We outline our responses below and will incorporate the necessary revisions.
Point-by-point responses
-
Referee: [Methods] Methods section: no precision, recall, confusion matrix, or inter-annotator agreement is reported for the automatic interrogative-stance detector or the entity-typing step. Because the central personalization claim rests on density contrasts between named entities and publics across >1M automatically labeled contexts, absence of these metrics leaves open the possibility that detection biases (e.g., preferential flagging around proper nouns) artifactually produce the reported pattern.
Authors: We agree that the current manuscript lacks these critical evaluation details, which is a genuine limitation for assessing the reliability of the detection pipeline and the robustness of the personalization findings. In the revised version, we will add a dedicated evaluation subsection that reports precision, recall, and F1 scores for the interrogative-stance detector on a manually annotated test set, along with a confusion matrix. For the entity-typing component, we will include inter-annotator agreement metrics (e.g., Cohen's kappa) from our validation process. These additions will directly address potential biases and allow readers to evaluate the density contrasts more confidently. revision: yes
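The promised detector evaluation is straightforward to compute from a held-out annotated test set; a minimal pure-Python sketch, where the gold and predicted stance labels are invented placeholders rather than the paper's data:

```python
from collections import Counter

# Illustrative gold vs. predicted stance labels; values are hypothetical.
gold = ["interrogative", "declarative", "interrogative", "declarative", "interrogative"]
pred = ["interrogative", "interrogative", "interrogative", "declarative", "declarative"]

# (gold, pred) -> count: this Counter is the confusion matrix.
confusion = Counter(zip(gold, pred))

def prf(label):
    """Per-class precision, recall, and F1 from the confusion counts."""
    tp = confusion[(label, label)]
    fp = sum(c for (g, p), c in confusion.items() if p == label and g != label)
    fn = sum(c for (g, p), c in confusion.items() if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

p, r, f1 = prf("interrogative")
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```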
-
Referee: [Qualitative Analysis] Qualitative subcorpus description: size, sampling frame, and agreement between automatic labels and manual annotations are unspecified. Without these details it is impossible to assess whether the subcorpus reliably grounds the quantitative functional-type and uptake claims.
Authors: We acknowledge that the manuscript does not provide sufficient details on the qualitative subcorpus, which limits the ability to evaluate how well it supports the functional-type and uptake analyses. In the revision, we will expand this section to specify the exact size of the subcorpus, the sampling frame (including how articles were selected from the full corpus), and agreement statistics between the automatic labels and manual annotations (such as percentage agreement and Cohen's kappa). This will provide the necessary transparency and strengthen the mixed-methods grounding. revision: yes
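Chance-corrected agreement between automatic and manual labels can be reported with Cohen's kappa; a self-contained sketch over invented label pairs (the functional-type names here are illustrative):

```python
from collections import Counter

# Hypothetical automatic vs. manual labels on the annotated subcorpus.
auto   = ["issue", "info", "issue", "echo", "issue", "info"]
manual = ["issue", "info", "issue", "issue", "issue", "echo"]

n = len(auto)
observed = sum(a == m for a, m in zip(auto, manual)) / n

# Chance agreement from the marginal label distributions.
ca, cm = Counter(auto), Counter(manual)
expected = sum(ca[l] * cm[l] for l in set(auto) | set(manual)) / n ** 2

kappa = (observed - expected) / (1 - expected)
print(f"observed={observed:.2f} kappa={kappa:.2f}")
```

Percentage agreement (`observed`) alone overstates reliability when one label dominates, which is why kappa is the usual companion statistic.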
Circularity Check
No significant circularity; empirical patterns derived from independent data processing
full rationale
The paper is a mixed-methods empirical study that applies automatic detection heuristics to a large corpus of news articles, links outputs to a qualitatively annotated subcorpus, and reports observed distributions of entities and voices. No equations, fitted parameters, or predictions are defined in terms of themselves. The central claims rest on corpus statistics (density of named entities vs. publics in interrogative contexts) rather than any self-citation chain, ansatz smuggling, or renaming of known results. The automatic stance detection is presented as a heuristic tool whose outputs are then interpreted against pragmatic theory; no step reduces the reported personalization pattern to a tautological input by construction. This is the normal non-circular outcome for a data-driven observational study.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Interrogatives can be automatically detected and functionally classified using heuristics in news text.
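To make the assumption concrete, a toy surface heuristic (terminal question mark plus a few clause-initial French markers) can be sketched; this is an illustration, not the paper's actual detector:

```python
# Toy clause-initial French interrogative markers; the real pipeline
# is presumably more elaborate.
MARKERS = ("est-ce que", "pourquoi", "comment", "qui", "que", "quand", "où")

def looks_interrogative(sentence: str) -> bool:
    s = sentence.strip().lower()
    if s.endswith("?"):
        return True
    # Clause-initial marker as a weak fallback for missing punctuation.
    return s.startswith(MARKERS)

print(looks_interrogative("Comment expliquer cette baisse ?"))
print(looks_interrogative("Le gouvernement a réagi hier."))
```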
Reference graph
Works this paper leans on
- McCombs, M. E.; and Shaw, D. L. 1993. The Evolution of Agenda-Setting Research: Twenty-Five Years in the Marketplace of Ideas. Journal of Communication, 43(2): 58–67.
- Nelson, L. K. 2020. Computational Grounded Theory: A Methodological Framework. Sociological Methods & Research.
- Nguyen, D.; and van Es, K. 2024. Exploring the Value of Computational Methods for Metajournalistic Discourse: The Example of COVID-19 Reporting in Dutch Newspapers. Journalism Studies, 25(10): 1160–1181.
- Olsen, W. 2004. Triangulation in Social Research.
- In Proceedings of the 12th ACM Conference on Web Science. Association for Computing Machinery.
Reproducibility notes
- Ethics: the work analyzes publicly available news articles and reports only ag…
- Data and code: due to news copyright constraints, full-text articles are not redistributed; code and derived non-copyright-restricted artifacts are available…
- Existing assets: the creators of the news corpora and the main NLP models and toolkits are cited (CCNews, the Suisse Romande corpus, CamemBERT, BERTo…).
Answer-location heuristic
1. Compute a group embedding as the average of the normalized sentence embeddings in the question group, re-normalized to unit length.
2. Precompute cumulative sums over the article's sentence embeddings to allow fast average embeddings over any contiguous window.
3. Search only among subsequent sentences, up to 15 sentences ahead of the last question sentence in the group.
4. For each candidate window length L ∈ {1, 2, 3, 4, 5} and each possible start position, compute the mean embedding and its cosine similarity with the group embedding.
5. If the best window has cosine similarity ≥ 0.40, treat it as the answer span; otherwise mark the group as unanswered. Sensitivity checks reported in Appendix C show that the stored similarity scores are sharply bimodal, so the main answerability estimates are effectively invariant across a broad range of thresholds.

Example question from the corpus: "Ten percent fewer apprentices: how can Valais fix this?"

For each interrogative sentence we stored…
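The answer-location procedure described above can be sketched in pure Python; the embeddings below are tiny invented vectors, and the cumulative-sum table makes every window mean O(1):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def find_answer_span(embeddings, q_last, group_emb,
                     max_ahead=15, max_len=5, threshold=0.40):
    """Return (start, length, score) of the best window after the question
    group, or None if no window clears the similarity threshold."""
    dim = len(group_emb)
    # Cumulative sums so any contiguous window mean costs O(1).
    csum = [[0.0] * dim]
    for e in embeddings:
        csum.append([c + x for c, x in zip(csum[-1], e)])

    best = None
    lo = q_last + 1
    hi = min(len(embeddings), lo + max_ahead)
    for L in range(1, max_len + 1):
        for start in range(lo, hi - L + 1):
            mean = [(csum[start + L][d] - csum[start][d]) / L for d in range(dim)]
            score = cosine(mean, group_emb)
            if best is None or score > best[2]:
                best = (start, L, score)
    return best if best and best[2] >= threshold else None

# Tiny illustrative example: the question group points along x;
# sentence 2 is the only near-parallel candidate.
embs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.0, 1.0]]
span = find_answer_span(embs, q_last=0, group_emb=[1.0, 0.0])
print(span)
```

With sharply bimodal similarity scores, as the sensitivity checks report, the exact value of `threshold` barely moves the answerability estimate.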