pith. machine review for the scientific record. sign in

arxiv: 2604.20982 · v1 · submitted 2026-04-22 · 💻 cs.SI · cs.CY

Recognition: unknown

MediaGraph: A Network Theoretic Framework to Analyze Reporting Preferences in Indian News Media

Authors on Pith no claims yet

Pith reviewed 2026-05-09 22:25 UTC · model grok-4.3

classification 💻 cs.SI cs.CY
keywords entity co-occurrencereporting preferencesnetwork analysisfarmers protestsindian mediacentrality measureslink predictability
0
0 comments X

The pith

Co-occurrence networks reveal distinct reporting preferences across Indian news sources on the farmers' protests.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that networks built from entities mentioned together in articles can expose how different media outlets choose to cover the same events. When applied to reporting on the 2020-21 and 2024 Farmers Protests by four sources, the networks display clear variations in which entities receive central roles and how they cluster together. Farmer leaders stand out as consistently less prominent in these structures than other figures involved. A measure of how reliably entity connections repeat across time periods is added to track reporting stability. A sympathetic reader would value this because it supplies a way to compare media choices at scale by examining only who appears alongside whom, without manual review of wording or assigned categories.

Core claim

The authors establish that source-specific entity co-occurrence networks around the Farmers Protests exhibit significant differences in centrality, community structure, and link predictability across the four outlets, indicating varied reporting preferences, together with a consistent under-representation of farmer leaders.

What carries the argument

Entity co-occurrence networks in which shared mentions form links between entities, examined through centrality, community structure, and a link predictability measure that tracks consistency of associations over time.

If this is right

  • Different outlets prioritize distinct sets of entities and their associations when covering identical events.
  • Farmer leaders receive lower prominence in the networks constructed from every source examined.
  • The link predictability metric can quantify stability in how media sources link entities across separate time periods.
  • Relational patterns alone allow comparison of reporting behavior without reliance on textual labels or sentiment scores.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same network construction could be used on coverage of other political events to identify recurring patterns of emphasis.
  • If the observed differences in entity prominence align with audience reach or policy outcomes, they may help explain variations in public awareness of the protests.
  • Combining the networks with basic checks on article length or placement could test whether structural signals match surface-level reporting volume.

Load-bearing premise

That the co-occurrence of entities in articles directly reflects the reporting preferences and potential biases of the media outlets without needing additional validation against ground truth or textual context.

What would settle it

A manual content analysis of the same articles that finds similar levels of entity emphasis across sources despite the computed differences in network centrality and predictability would challenge the results.

Figures

Figures reproduced from arXiv: 2604.20982 by Aditya Bali, Anirban Sen, Rupsha, Vidur Kaushik.

Figure 1
Figure 1. Figure 1: The supervised link prediction approach that trains the encoder and decoder modules together on the downstream link [PITH_FULL_IMAGE:figures/full_fig_p008_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: eigenvector centralities for Outlets [PITH_FULL_IMAGE:figures/full_fig_p010_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: betweenness centralities for Outlets [PITH_FULL_IMAGE:figures/full_fig_p012_3.png] view at source ↗
read the original abstract

We present MediaGraph, a network-theoretic framework for analyzing reporting preferences in news media through entity co-occurrence networks. Using articles from four Indian news-sources, two mainstream (The Times of India and The Indian Express) and two fringe outlets (dna and firstpost), we construct source-specific co-occurrence networks around the 2020-21 and 2024 Farmers Protests. We analyze these networks along three network theoretic axes of centrality, community structure, and co-occurrence link predictability. The link predictability metric is a novel metric proposed that quantifies the consistency of entity associations over time using a GraphSAGE-based model. Our results reveal significant differences in reporting preferences across sources for the same event, and a consistent under-representation of farmer leaders across sources. By shifting the focus from textual signals to relational structures, our approach offers a scalable, label-independent perspective on media analysis and introduces link predictability as a complementary measure of reporting behavior.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces MediaGraph, a network-theoretic framework that builds source-specific entity co-occurrence networks from articles in four Indian news outlets (two mainstream: The Times of India and The Indian Express; two fringe: dna and firstpost) covering the 2020-21 and 2024 Farmers' Protests. It evaluates these networks along three axes—centrality, community structure, and a novel link-predictability metric implemented via GraphSAGE—to quantify differences in reporting preferences and consistency of entity associations over time. The central claims are that the networks reveal significant source-specific differences in reporting and a consistent under-representation of farmer leaders across all outlets.

Significance. If the mapping from co-occurrence structure to editorial preference can be validated, the framework offers a scalable, label-free complement to text-based media-bias methods and introduces a temporal consistency metric that could be useful in computational social science. The work is technically straightforward and leverages standard network tools, but its interpretive claims rest on an untested proxy assumption.

major comments (3)
  1. [§3 (Link Predictability)] §3 (Link Predictability): The GraphSAGE model is trained on the same temporal co-occurrence edges it is later asked to predict; this creates a circularity risk in which the reported 'consistency' score largely reflects the model's reconstruction fidelity on the input graph rather than an independent signal of reporting behavior. An explicit train/test split across time periods or an external baseline is required to substantiate the metric.
  2. [§4 (Centrality and Under-representation)] §4 (Centrality and Under-representation): Lower centrality of farmer-leader nodes is interpreted as under-representation, yet no ground-truth comparison (e.g., official protest participant lists, manual framing annotations, or mention polarity) is provided. Without such validation, the observed centrality differences could simply mirror the factual peripheral role of these entities in the covered events rather than editorial choice.
  3. [§2 (Data Construction)] §2 (Data Construction): The core assumption that raw entity co-occurrence frequency directly encodes reporting preferences is not tested against textual context or alternative explanations such as event-driven factual coverage. This assumption is load-bearing for all three analytic axes and the final claims.
minor comments (2)
  1. [Abstract] The abstract states 'significant differences' without naming the statistical tests or effect-size thresholds used; these should be stated explicitly.
  2. [Figures] Network figures would benefit from consistent node labeling conventions and legends that distinguish farmer leaders from other entity types.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important methodological considerations regarding our assumptions and metrics. We address each point below and will incorporate revisions to strengthen the manuscript.

read point-by-point responses
  1. Referee: §3 (Link Predictability): The GraphSAGE model is trained on the same temporal co-occurrence edges it is later asked to predict; this creates a circularity risk in which the reported 'consistency' score largely reflects the model's reconstruction fidelity on the input graph rather than an independent signal of reporting behavior. An explicit train/test split across time periods or an external baseline is required to substantiate the metric.

    Authors: We agree this is a valid concern and that the current setup risks conflating reconstruction with predictive consistency. We will revise Section 3 to implement an explicit temporal train/test split: the GraphSAGE model will be trained on the 2020-21 co-occurrence networks and evaluated on its ability to predict links in the 2024 networks for each source. We will also add a simple external baseline (e.g., preferential attachment or random prediction) for comparison. This change will be reflected in updated results and methodology. revision: yes

  2. Referee: §4 (Centrality and Under-representation): Lower centrality of farmer-leader nodes is interpreted as under-representation, yet no ground-truth comparison (e.g., official protest participant lists, manual framing annotations, or mention polarity) is provided. Without such validation, the observed centrality differences could simply mirror the factual peripheral role of these entities in the covered events rather than editorial choice.

    Authors: We acknowledge that the interpretation relies on a proxy and lacks direct ground-truth validation, which limits causal claims about editorial choice versus event facts. We cannot add comprehensive external lists or full annotations without new data collection. In revision, we will add an explicit limitations paragraph in Section 4 discussing this proxy nature, emphasizing that the under-representation finding is relative (consistent low centrality across all four sources) and that source-specific differences in other entities still support reporting preference claims. We will also outline future validation steps. revision: partial

  3. Referee: §2 (Data Construction): The core assumption that raw entity co-occurrence frequency directly encodes reporting preferences is not tested against textual context or alternative explanations such as event-driven factual coverage. This assumption is load-bearing for all three analytic axes and the final claims.

    Authors: This assumption underpins the framework and merits explicit testing. We will revise Section 2 to include a validation subsection: a random sample of 100 articles per source will be manually inspected to confirm that high co-occurrence pairs reflect substantive joint coverage rather than incidental mentions. We will also add discussion of alternative explanations (e.g., event-driven coverage) and how cross-source comparisons help isolate preferences. Updated text and any revised figures will be included. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper builds entity co-occurrence networks directly from article data for four sources and applies standard network measures (centrality, community structure) plus a proposed temporal link predictability metric via GraphSAGE. No equations or sections reduce any claimed result to its inputs by construction, nor do any load-bearing steps rely on self-citations that themselves assume the target outcome. The link predictability is framed as a consistency measure over time rather than a fitted parameter renamed as a prediction on the identical data; the central claims about reporting differences and under-representation follow from independent computation on the constructed graphs. The derivation remains self-contained against the input co-occurrence data without tautological collapse.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available for review, so specific free parameters, axioms, or invented entities cannot be identified in detail. The approach relies on standard assumptions in network science and graph neural networks, but the novel metric's implementation details are not provided.

pith-pipeline@v0.9.0 · 5467 in / 1285 out tokens · 50073 ms · 2026-05-09T22:25:04.538697+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

42 extracted references · 10 canonical work pages

  1. [1]

    Agung Farid Agustian. 2025. Analyzing Media Bias in Support of Government Policies: A Critical Discourse Analysis in a Newspaper.PAROLE: Journal of Linguistics and Education15, 1 (2025), 38–46

  2. [2]

    ANI. 2021. UP: CM Yogi Adityanath calls Lakhimpur Kheri incident ’unfortunate’. https://timesofindia.indiatimes.com/india/up-cm-yogi-adityanath-calls- lakhimpur-kheri-incident-unfortunate/articleshow/86734610.cms. Accessed: 2026-04-22

  3. [3]

    David P. Baron. 2006. Persistent media bias.Journal of Public Economics90, 1 (2006), 1–36. doi:10.1016/j.jpubeco.2004.10.006

  4. [4]

    Hadjer Belghoul and Abdellah Baraka. 2025. Media Bias in Reporting the 2021 Sheikh Jarrah Evictions: A Van Dijkian Discourse Analysis.Majallat al-Nas12, 2 (2025), 592–608

  5. [5]

    Omar Benjelloun, Hector Garcia-Molina, David Menestrina, Qi Su, Steven Eui- jong Whang, and Jennifer Widom. 2009. Swoosh: a generic approach to entity resolution.The VLDB Journal18, 1 (2009), 255–276

  6. [6]

    Ceren Budak, Sharad Goel, and Justin M Rao. 2016. Fair and balanced? Quantify- ing media bias through crowdsourced content analysis.Public Opinion Quarterly 80, S1 (2016), 250–271

  7. [7]

    Ceren Budak, Sharad Goel, and Justin M. Rao. 2016. Fair and Balanced? Quan- tifying Media Bias through Crowdsourced Content Analysis.Public Opinion Quarterly80, S1 (04 2016), 250–271. doi:10.1093/poq/nfw007

  8. [8]

    Yutong Chen, Gaurav Chiplunkar, Sheetal Sekhri, Anirban Sen, and Aaditeshwar Seth. 2025. How do political connections of firms matter during an economic crisis?Journal of Development Economics175 (2025), 103471. doi:10.1016/j.jd eveco.2025.103471

  9. [9]

    Tomas Cicchini, Sofia Morena Del Pozo, Enzo Tagliazucchi, and Pablo Balenzuela

  10. [10]

    News sharing on Twitter reveals emergent fragmentation of media agenda and persistent polarization.EPJ Data Science11, 1 (2022), 48

  11. [11]

    Omar Daoudi, Jason Gainous, Syed Ali Hussain, and Khaled Zamoum. 2026. Media bias in sports journalism: A comparative study of Qatar 2022 World Cup coverage.Communication & Sport14, 1 (2026), 207–230

  12. [12]

    Simeon Djankov, Caralee McLiesh, Tatiana Nenova, and Andrei Shleifer. 2003. Who owns the media?The Journal of Law and Economics46, 2 (2003), 341–382

  13. [13]

    BV Elasticsearch. 2018. Elasticsearch.software], version6, 1 (2018)

  14. [14]

    James Flamino, Alessandro Galeazzi, Stuart Feldman, Michael W Macy, Brendan Cross, Zhenkun Zhou, Matteo Serafino, Alexandre Bovet, Hernán A Makse, and Boleslaw K Szymanski. 2023. Political polarization of news media and influencers on Twitter in the 2016 and 2020 US presidential elections.Nature Human Behaviour7, 6 (2023), 904–916

  15. [15]

    Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable Feature Learning for Networks. arXiv:1607.00653 [cs.SI] https://arxiv.org/abs/1607.00653

  16. [16]

    Felix Hamborg and Karsten Donnay. 2021. NewsMTSC: a dataset for (multi-) target-dependent sentiment classification in political news articles. InProceedings of the 16th Conference of the European Chapter of the Association for Computa- tional Linguistics: Main Volume. 1663–1675

  17. [17]

    Inductive Representation Learning on Large Graphs

    William L. Hamilton, Rex Ying, and Jure Leskovec. 2018. Inductive Representa- tion Learning on Large Graphs. arXiv:1706.02216 [cs.SI] https://arxiv.org/abs/17 06.02216

  18. [18]

    Ignatius Haryanto. 2011. Media ownership and its implications for journalists and journalism in Indonesia.Politics and the media in twenty-first century Indonesia: Decade of democracy(2011), 104–118

  19. [19]

    Matthew Honnibal, Ines Montani, Sofie Van Landeghem, and Adriane Boyd

  20. [20]

    spaCy: Industrial-strength Natural Language Processing in Python. (2020). doi:10.5281/zenodo.1212303

  21. [21]

    Tomáš Horych, Christoph Mandl, Terry Ruas, André Greiner-Petter, Bela Gipp, Akiko Aizawa, and Timo Spinde. 2025. The promises and pitfalls of LLM annotations in dataset labeling: A case study on media bias detection. InFindings of the Association for Computational Linguistics: NAACL 2025. 1370–1386

  22. [22]

    Homa Hosseinmardi, Samuel Wolken, David M Rothschild, and Duncan J Watts

  23. [23]

    Unpacking media bias in the growing divide between cable and network news.Scientific Reports15, 1 (2025), 17607

  24. [24]

    Byunghwee Lee, Hyo-sun Ryu, Jae Kook Lee, Hawoong Jeong, and Beom Jun Kim. 2025. Network analysis reveals news press landscape and asymmetric user polarization.Physica A: Statistical Mechanics and its Applications(2025), 130842

  25. [25]

    Sibo Liu, Alexey Makarin, Jinfeng Wu, and Dong Zhang. 2026. The War of Ideas: Institutions and Global Media Bias. (2026)

  26. [26]

    Nicholas Kah Yean Low and Andrew Melatos. 2022. Discerning media bias within a network of political allies and opponents: The idealized example of a biased coin.Physica A: Statistical Mechanics and its Applications590 (2022), 126722. doi:10.1016/j.physa.2021.126722

  27. [27]

    Maxwell E McCombs and Donald L Shaw. 1972. The agenda-setting function of mass media.Public opinion quarterly36, 2 (1972), 176–187

  28. [28]

    Horst Po¨ ttker. 2003. News and its communicative quality: the inverted pyramid- when and why did it appear?Journalism Studies4, 4 (2003), 501–511

  29. [29]

    Elad Segev. 2020. Textual network analysis: Detecting pre- vailing themes and biases in international news and so- cial media.Sociology Compass14, 4 (2020), e12779. arXiv:https://compass.onlinelibrary.wiley.com/doi/pdf/10.1111/soc4.12779 doi:10.1111/soc4.12779

  30. [30]

    Anirban Sen, Debanjan Ghatak, Gurjeet Khanuja, Kumari Rekha, Mehak Gupta, Sanket Dhakate, Kartikeya Sharma, and Aaditeshwar Seth. 2022. Analysis of media bias in policy discourse in india. InProceedings of the 5th ACM SIG- CAS/SIGCHI Conference on Computing and Sustainable Societies. 57–77

  31. [31]

    Anirban Sen, Debanjan Ghatak, Kapil Kumar, Gurjeet Khanuja, Deepak Bansal, Mehak Gupta, Kumari Rekha, Saloni Bhogale, Priyamvada Trivedi, and Aaditesh- war Seth. 2019. Studying the discourse on economic policies in India using mass media, social media, and the parliamentary question hour data. InProceedings of the 2nd ACM SIGCAS Conference on Computing an...

  32. [32]

    Nidaa Shahid and Bilal Ghazanfar. 2025. Mapping Media Bias: Global Islam- ophobic Trends and their Reflections in South Asia.The Beacon Journal5, 02 (2025)

  33. [33]

    Ankur Sharma, Navreet Kaur, Anirban Sen, and Aaditeshwar Seth. 2020. Ideology Detection in the Indian Mass Media. In2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). 627–634. doi:10.1109/ASONAM49781.2020.9381344

  34. [34]

    Timo Spinde, Lada Rudnitckaia, Jelena Mitrovi´c, Felix Hamborg, Michael Gran- itzer, Bela Gipp, and Karsten Donnay. 2021. Automated identification of bias inducing words in news articles using linguistic and context-oriented features. Information Processing & Management58, 3 (2021), 102505

  35. [35]

    Vincent A Traag, Ridho Reinanda, and Gerry van Klinken. 2016. Structure of a media co-occurrence network. InProceedings of ECCS 2014: European MediaGraph: A Network Theoretic Framework to Analyze Reporting Preferences in Indian News Media Conference acronym ’XX, June 03–05, 2018, Woodstock, NY Conference on Complex Systems. Springer, 81–91

  36. [36]

    Vincent A Traag, Ludo Waltman, and Nees Jan Van Eck. 2019. From Louvain to Leiden: guaranteeing well-connected communities.Scientific reports9, 1 (2019), 5233

  37. [37]

    Jenny S Wang, Samar Haider, Amir Tohidi, Anushkaa Gupta, Yuxuan Zhang, Chris Callison-Burch, David Rothschild, and Duncan J Watts. 2025. Media bias detector: Designing and implementing a tool for real-time selection and framing bias analysis in news coverage. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–27

  38. [38]

    Ze Wang, Zekun Wu, Yichi Zhang, Xin Guan, Navya Jain, Qinyang Lu, Saloni Gupta, and Adriano Koshiyama. 2025. Bias amplification: Large language models as increasingly biased media. InProceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia- Pacific Chapter of the Association for Computational...

  39. [39]

    Wikipedia contributors. 2026. 2020–2021 Indian farmers’ protest. https://en.wik ipedia.org/wiki/2020%E2%80%932021_Indian_farmers%27_protest Accessed: 2026-04-02

  40. [40]

    Wikipedia contributors. 2026. 2024–2025 Indian farmers’ protest. https://en.wik ipedia.org/wiki/2024%E2%80%942025_Indian_farmers%27_protest Accessed: 2026-04-02

  41. [41]

    Bennett WL. 1990. Toward a theory of press-state relations in the United States. Journal of Communication40, 2 (1990), 103–125

  42. [42]

    Dvir Yogev, Criminal Law, and Justice Center. [n. d.]. Measuring Media Bias Toward Reform Prosecutors: A Multi-Method NLP Analysis of Bay Area News Coverage, 2019–2024. ([n. d.]). 8 Appendix 8.1 Detailed Experimental Results We report here our detailed experimental results for the link predic- tion experiments using GraphSAGE, across the four news-sources...