Politics of Questions in News: A Mixed-Methods Study of Interrogative Stances as Markers of Voice and Power
Pith reviewed 2026-05-15 00:49 UTC · model grok-4.3
The pith
Interrogative discourse in news foregrounds prominent actors and places over broad publics.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The study shows that interrogative contexts are densely populated with named individuals, organizations, and places, whereas publics and broad social groups are mentioned much less frequently, indicating that interrogative discourse tends to foreground already prominent actors and places and thus exhibits strong personalization.
What carries the argument
The operationalization of interrogative stance, textual uptake, and voice at corpus scale, combining automatic detection with qualitative annotation grounded in semantic and pragmatic theories.
Load-bearing premise
The automatic detection of interrogative stances and their functional types is accurate enough to support claims about patterns across the corpus, and the qualitatively annotated subcorpus is representative of the larger dataset.
What would settle it
Manual review of a representative sample showing that interrogative contexts mention broad social groups as frequently as named individuals would falsify the personalization claim.
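The falsifying check above could be run mechanically once a manually reviewed sample exists; a minimal sketch in Python, where the toy sample and the tag names ("named", "group") are invented for illustration:

```python
# Hypothetical manually reviewed sample: each interrogative context lists
# its mentions, tagged "named" (person/org/place) or "group" (publics,
# broad social groups). Data and field names are illustrative only.
sample = [
    {"mentions": ["named", "named", "group"]},
    {"mentions": ["named"]},
    {"mentions": ["named", "named", "named", "group"]},
]

named = sum(ctx["mentions"].count("named") for ctx in sample)
group = sum(ctx["mentions"].count("group") for ctx in sample)

# The personalization claim predicts named >> group; rough parity
# would count against it.
rate = named / max(group, 1)
print(f"named per group mention: {rate:.2f}")
```

On a real sample the comparison would of course need confidence intervals, but the counting itself is this simple.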
Original abstract
Interrogatives in news discourse have been examined in linguistics and conversation analysis, but mostly in broadcast interviews and relatively small, often English-language corpora, while large-scale computational studies of news rarely distinguish interrogatives from declaratives or differentiate their functions. This paper brings these strands together through a mixed-methods study of the "Politics of Questions" in contemporary French-language digital news. Using over one million articles published between January 2023 and June 2024, we automatically detect interrogative stances, approximate their functional types, and locate textual answers when present, linking these quantitative measures to a qualitatively annotated subcorpus grounded in semantic and pragmatic theories of questions. Interrogatives are sparse but systematically patterned: they mainly introduce or organize issues, with most remaining cases being information-seeking or echo-like, while explicitly leading or tag questions are rare. Although their density and mix vary across outlets and topics, our heuristic suggests that questions are overwhelmingly taken up within the same article and usually linked to a subsequent answer-like span, most often in the journalist's narrative voice and less often through quoted speech. Interrogative contexts are densely populated with named individuals, organizations, and places, whereas publics and broad social groups are mentioned much less frequently, suggesting that interrogative discourse tends to foreground already prominent actors and places and thus exhibits strong personalization. We show how interrogative stance, textual uptake, and voice can be operationalized at corpus scale, and argue that combining computational methods with pragmatic and sociological perspectives can help account for how questioning practices structure contemporary news discourse.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper conducts a mixed-methods study of interrogative stances in over one million French-language news articles (Jan 2023–Jun 2024). It automatically detects interrogatives, approximates functional types (e.g., issue-introducing, information-seeking, echo), locates textual answers, and grounds findings in a qualitatively annotated subcorpus drawing on semantic/pragmatic theories. Key results: interrogatives are sparse but patterned, mostly taken up within-article in journalistic voice; interrogative contexts show high density of named individuals/organizations/places and low density of publics/broad groups, interpreted as evidence of strong personalization.
Significance. If the detection pipeline is shown to be reliable, the work provides a scalable operationalization of interrogative stance, uptake, and voice that links computational corpus methods to pragmatic and sociological accounts of news discourse. It offers falsifiable, corpus-scale evidence on how questioning practices foreground prominent actors, which could inform future studies of media power and voice.
major comments (2)
- [Methods] Methods section: no precision, recall, confusion matrix, or inter-annotator agreement is reported for the automatic interrogative-stance detector or the entity-typing step. Because the central personalization claim rests on density contrasts between named entities and publics across >1M automatically labeled contexts, absence of these metrics leaves open the possibility that detection biases (e.g., preferential flagging around proper nouns) artifactually produce the reported pattern.
- [Qualitative Analysis] Qualitative subcorpus description: size, sampling frame, and agreement between automatic labels and manual annotations are unspecified. Without these details it is impossible to assess whether the subcorpus reliably grounds the quantitative functional-type and uptake claims.
minor comments (2)
- [Abstract] Abstract: the phrase 'our heuristic suggests' is used without a one-sentence gloss of the heuristic; adding this would improve immediate clarity for readers.
- [Results] Results: statements about variation across outlets and topics would be strengthened by at least one concrete numerical example or statistical test rather than qualitative description alone.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive review. The comments highlight important gaps in the reporting of evaluation metrics and subcorpus details, which we agree need to be addressed to strengthen the manuscript. We outline our responses below and will incorporate the necessary revisions.
Point-by-point responses
-
Referee: [Methods] Methods section: no precision, recall, confusion matrix, or inter-annotator agreement is reported for the automatic interrogative-stance detector or the entity-typing step. Because the central personalization claim rests on density contrasts between named entities and publics across >1M automatically labeled contexts, absence of these metrics leaves open the possibility that detection biases (e.g., preferential flagging around proper nouns) artifactually produce the reported pattern.
Authors: We agree that the current manuscript lacks these critical evaluation details, which is a genuine limitation for assessing the reliability of the detection pipeline and the robustness of the personalization findings. In the revised version, we will add a dedicated evaluation subsection that reports precision, recall, and F1 scores for the interrogative-stance detector on a manually annotated test set, along with a confusion matrix. For the entity-typing component, we will include inter-annotator agreement metrics (e.g., Cohen's kappa) from our validation process. These additions will directly address potential biases and allow readers to evaluate the density contrasts more confidently. revision: yes
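The promised detector evaluation is straightforward to compute from a held-out annotated test set; a minimal pure-Python sketch, where the gold and predicted stance labels are invented placeholders rather than the paper's data:

```python
from collections import Counter

# Illustrative gold vs. predicted stance labels; values are hypothetical.
gold = ["interrogative", "declarative", "interrogative", "declarative", "interrogative"]
pred = ["interrogative", "interrogative", "interrogative", "declarative", "declarative"]

# (gold, pred) -> count: this Counter is the confusion matrix.
confusion = Counter(zip(gold, pred))

def prf(label):
    """Per-class precision, recall, and F1 from the confusion counts."""
    tp = confusion[(label, label)]
    fp = sum(c for (g, p), c in confusion.items() if p == label and g != label)
    fn = sum(c for (g, p), c in confusion.items() if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

p, r, f1 = prf("interrogative")
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```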
-
Referee: [Qualitative Analysis] Qualitative subcorpus description: size, sampling frame, and agreement between automatic labels and manual annotations are unspecified. Without these details it is impossible to assess whether the subcorpus reliably grounds the quantitative functional-type and uptake claims.
Authors: We acknowledge that the manuscript does not provide sufficient details on the qualitative subcorpus, which limits the ability to evaluate how well it supports the functional-type and uptake analyses. In the revision, we will expand this section to specify the exact size of the subcorpus, the sampling frame (including how articles were selected from the full corpus), and agreement statistics between the automatic labels and manual annotations (such as percentage agreement and Cohen's kappa). This will provide the necessary transparency and strengthen the mixed-methods grounding. revision: yes
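Chance-corrected agreement between automatic and manual labels can be reported with Cohen's kappa; a self-contained sketch over invented label pairs (the functional-type names here are illustrative):

```python
from collections import Counter

# Hypothetical automatic vs. manual labels on the annotated subcorpus.
auto   = ["issue", "info", "issue", "echo", "issue", "info"]
manual = ["issue", "info", "issue", "issue", "issue", "echo"]

n = len(auto)
observed = sum(a == m for a, m in zip(auto, manual)) / n

# Chance agreement from the marginal label distributions.
ca, cm = Counter(auto), Counter(manual)
expected = sum(ca[l] * cm[l] for l in set(auto) | set(manual)) / n ** 2

kappa = (observed - expected) / (1 - expected)
print(f"observed={observed:.2f} kappa={kappa:.2f}")
```

Percentage agreement (`observed`) alone overstates reliability when one label dominates, which is why kappa is the usual companion statistic.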
Circularity Check
No significant circularity; empirical patterns derived from independent data processing
full rationale
The paper is a mixed-methods empirical study that applies automatic detection heuristics to a large corpus of news articles, links outputs to a qualitatively annotated subcorpus, and reports observed distributions of entities and voices. No equations, fitted parameters, or predictions are defined in terms of themselves. The central claims rest on corpus statistics (density of named entities vs. publics in interrogative contexts) rather than any self-citation chain, ansatz smuggling, or renaming of known results. The automatic stance detection is presented as a heuristic tool whose outputs are then interpreted against pragmatic theory; no step reduces the reported personalization pattern to a tautological input by construction. This is the normal non-circular outcome for a data-driven observational study.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Interrogatives can be automatically detected and functionally classified using heuristics in news text.
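To make the assumption concrete, a toy surface heuristic (terminal question mark plus a few clause-initial French markers) can be sketched; this is an illustration, not the paper's actual detector:

```python
# Toy clause-initial French interrogative markers; the real pipeline
# is presumably more elaborate.
MARKERS = ("est-ce que", "pourquoi", "comment", "qui", "que", "quand", "où")

def looks_interrogative(sentence: str) -> bool:
    s = sentence.strip().lower()
    if s.endswith("?"):
        return True
    # Clause-initial marker as a weak fallback for missing punctuation.
    return s.startswith(MARKERS)

print(looks_interrogative("Comment expliquer cette baisse ?"))
print(looks_interrogative("Le gouvernement a réagi hier."))
```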
Reference graph
Works this paper leans on
- McCombs, M. E.; and Shaw, D. L. 1993. The Evolution of Agenda-Setting Research: Twenty-Five Years in the Marketplace of Ideas. Journal of Communication, 43(2): 58–67.
- Nelson, L. K. 2020. Computational Grounded Theory: A Methodological Framework. Sociological Methods & Research.
- Nguyen, D.; and van Es, K. 2024. Exploring the Value of Computational Methods for Metajournalistic Discourse: The Example of COVID-19 Reporting in Dutch Newspapers. Journalism Studies, 25(10): 1160–1181.
- Olsen, W. 2004. Triangulation in Social Research.
- In Proceedings of the 12th ACM Conference on Web Science. Association for Computing Machinery.
Reproducibility notes
- Ethics: the work analyzes publicly available news articles and reports only ag…
- Data and code: due to news copyright constraints, full-text articles are not redistributed; code and derived non-copyright-restricted artifacts are available…
- Existing assets: the creators of the news corpora and the main NLP models and toolkits are cited (CCNews, the Suisse Romande corpus, CamemBERT, BERTo…).
Answer-location heuristic
1. Compute a group embedding as the average of the normalized sentence embeddings in the question group, re-normalized to unit length.
2. Precompute cumulative sums over the article's sentence embeddings to allow fast average embeddings over any contiguous window.
3. Search only among subsequent sentences, up to 15 sentences ahead of the last question sentence in the group.
4. For each candidate window length L ∈ {1, 2, 3, 4, 5} and each possible start position, compute the mean embedding and its cosine similarity with the group embedding.
5. If the best window has cosine similarity ≥ 0.40, treat it as the answer span; otherwise mark the group as unanswered. Sensitivity checks reported in Appendix C show that the stored similarity scores are sharply bimodal, so the main answerability estimates are effectively invariant across a broad range of thresholds.

Example question from the corpus: "Ten percent fewer apprentices: how can Valais fix this?"

For each interrogative sentence we stored…
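The answer-location procedure described above can be sketched in pure Python; the embeddings below are tiny invented vectors, and the cumulative-sum table makes every window mean O(1):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def find_answer_span(embeddings, q_last, group_emb,
                     max_ahead=15, max_len=5, threshold=0.40):
    """Return (start, length, score) of the best window after the question
    group, or None if no window clears the similarity threshold."""
    dim = len(group_emb)
    # Cumulative sums so any contiguous window mean costs O(1).
    csum = [[0.0] * dim]
    for e in embeddings:
        csum.append([c + x for c, x in zip(csum[-1], e)])

    best = None
    lo = q_last + 1
    hi = min(len(embeddings), lo + max_ahead)
    for L in range(1, max_len + 1):
        for start in range(lo, hi - L + 1):
            mean = [(csum[start + L][d] - csum[start][d]) / L for d in range(dim)]
            score = cosine(mean, group_emb)
            if best is None or score > best[2]:
                best = (start, L, score)
    return best if best and best[2] >= threshold else None

# Tiny illustrative example: the question group points along x;
# sentence 2 is the only near-parallel candidate.
embs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.0, 1.0]]
span = find_answer_span(embs, q_last=0, group_emb=[1.0, 0.0])
print(span)
```

With sharply bimodal similarity scores, as the sensitivity checks report, the exact value of `threshold` barely moves the answerability estimate.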