arxiv: 2604.20982 · v1 · submitted 2026-04-22 · 💻 cs.SI · cs.CY

Recognition: unknown

MediaGraph: A Network Theoretic Framework to Analyze Reporting Preferences in Indian News Media

Aditya Bali , Rupsha , Vidur Kaushik , Anirban Sen

Authors on Pith no claims yet

Pith reviewed 2026-05-09 22:25 UTC · model grok-4.3

classification 💻 cs.SI cs.CY

keywords entity co-occurrencereporting preferencesnetwork analysisfarmers protestsindian mediacentrality measureslink predictability

0 comments

The pith

Co-occurrence networks reveal distinct reporting preferences across Indian news sources on the farmers' protests.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that networks built from entities mentioned together in articles can expose how different media outlets choose to cover the same events. When applied to reporting on the 2020-21 and 2024 Farmers Protests by four sources, the networks display clear variations in which entities receive central roles and how they cluster together. Farmer leaders stand out as consistently less prominent in these structures than other figures involved. A measure of how reliably entity connections repeat across time periods is added to track reporting stability. A sympathetic reader would value this because it supplies a way to compare media choices at scale by examining only who appears alongside whom, without manual review of wording or assigned categories.

Core claim

The authors establish that source-specific entity co-occurrence networks around the Farmers Protests exhibit significant differences in centrality, community structure, and link predictability across the four outlets, indicating varied reporting preferences, together with a consistent under-representation of farmer leaders.

What carries the argument

Entity co-occurrence networks in which shared mentions form links between entities, examined through centrality, community structure, and a link predictability measure that tracks consistency of associations over time.

If this is right

Different outlets prioritize distinct sets of entities and their associations when covering identical events.
Farmer leaders receive lower prominence in the networks constructed from every source examined.
The link predictability metric can quantify stability in how media sources link entities across separate time periods.
Relational patterns alone allow comparison of reporting behavior without reliance on textual labels or sentiment scores.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same network construction could be used on coverage of other political events to identify recurring patterns of emphasis.
If the observed differences in entity prominence align with audience reach or policy outcomes, they may help explain variations in public awareness of the protests.
Combining the networks with basic checks on article length or placement could test whether structural signals match surface-level reporting volume.

Load-bearing premise

That the co-occurrence of entities in articles directly reflects the reporting preferences and potential biases of the media outlets without needing additional validation against ground truth or textual context.

What would settle it

A manual content analysis of the same articles that finds similar levels of entity emphasis across sources despite the computed differences in network centrality and predictability would challenge the results.

Figures

Figures reproduced from arXiv: 2604.20982 by Aditya Bali, Anirban Sen, Rupsha, Vidur Kaushik.

**Figure 1.** Figure 1: The supervised link prediction approach that trains the encoder and decoder modules together on the downstream link [PITH_FULL_IMAGE:figures/full_fig_p008_1.png] view at source ↗

**Figure 2.** Figure 2: eigenvector centralities for Outlets [PITH_FULL_IMAGE:figures/full_fig_p010_2.png] view at source ↗

**Figure 3.** Figure 3: betweenness centralities for Outlets [PITH_FULL_IMAGE:figures/full_fig_p012_3.png] view at source ↗

read the original abstract

We present MediaGraph, a network-theoretic framework for analyzing reporting preferences in news media through entity co-occurrence networks. Using articles from four Indian news-sources, two mainstream (The Times of India and The Indian Express) and two fringe outlets (dna and firstpost), we construct source-specific co-occurrence networks around the 2020-21 and 2024 Farmers Protests. We analyze these networks along three network theoretic axes of centrality, community structure, and co-occurrence link predictability. The link predictability metric is a novel metric proposed that quantifies the consistency of entity associations over time using a GraphSAGE-based model. Our results reveal significant differences in reporting preferences across sources for the same event, and a consistent under-representation of farmer leaders across sources. By shifting the focus from textual signals to relational structures, our approach offers a scalable, label-independent perspective on media analysis and introduces link predictability as a complementary measure of reporting behavior.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

MediaGraph introduces a GraphSAGE predictability score for tracking entity link consistency in news co-occurrence networks, but treats those networks as direct readouts of reporting preferences without checks against text or external facts.

read the letter

The paper builds source-specific co-occurrence graphs from coverage of the 2020-21 and 2024 Indian farmers' protests in four outlets, then measures centrality, community structure, and a new link predictability metric. The predictability score trains a GraphSAGE model on temporal slices of the same graphs to quantify how stable entity associations remain across periods. That metric is the clearest addition; it gives a label-free way to score consistency that goes beyond standard degree or betweenness numbers. The comparison across mainstream and fringe sources is also straightforward and lets them flag differences in which entities sit at the center of each network. They report lower centrality for farmer leaders across all four outlets, which they read as under-representation. The setup is clean enough that someone could replicate the pipeline on new event data without much trouble. The main weakness is the leap from co-occurrence counts to editorial preference. Nothing in the work compares the networks to actual article text, protest participant lists, or polarity annotations, so it is hard to tell whether lower centrality reflects deliberate downplaying or simply the factual distribution of who appeared in the events. The GraphSAGE model is trained on the co-occurrence data itself, which raises the usual circularity worry for any predictability measure derived from the same edges it is trying to explain. Scope is narrow—one country, two protest waves—so the framework is more a proof-of-concept than a general tool. Readers working on network methods for media or computational social science could pick up the predictability idea and test it on better-validated corpora. The paper is coherent on its own terms and engages the literature without obvious contradictions, so it is worth sending out for referee comments even though the central mapping from graph to bias needs more evidence before the claims can be taken at face value.

Referee Report

3 major / 2 minor

Summary. The paper introduces MediaGraph, a network-theoretic framework that builds source-specific entity co-occurrence networks from articles in four Indian news outlets (two mainstream: The Times of India and The Indian Express; two fringe: dna and firstpost) covering the 2020-21 and 2024 Farmers' Protests. It evaluates these networks along three axes—centrality, community structure, and a novel link-predictability metric implemented via GraphSAGE—to quantify differences in reporting preferences and consistency of entity associations over time. The central claims are that the networks reveal significant source-specific differences in reporting and a consistent under-representation of farmer leaders across all outlets.

Significance. If the mapping from co-occurrence structure to editorial preference can be validated, the framework offers a scalable, label-free complement to text-based media-bias methods and introduces a temporal consistency metric that could be useful in computational social science. The work is technically straightforward and leverages standard network tools, but its interpretive claims rest on an untested proxy assumption.

major comments (3)

[§3 (Link Predictability)] §3 (Link Predictability): The GraphSAGE model is trained on the same temporal co-occurrence edges it is later asked to predict; this creates a circularity risk in which the reported 'consistency' score largely reflects the model's reconstruction fidelity on the input graph rather than an independent signal of reporting behavior. An explicit train/test split across time periods or an external baseline is required to substantiate the metric.
[§4 (Centrality and Under-representation)] §4 (Centrality and Under-representation): Lower centrality of farmer-leader nodes is interpreted as under-representation, yet no ground-truth comparison (e.g., official protest participant lists, manual framing annotations, or mention polarity) is provided. Without such validation, the observed centrality differences could simply mirror the factual peripheral role of these entities in the covered events rather than editorial choice.
[§2 (Data Construction)] §2 (Data Construction): The core assumption that raw entity co-occurrence frequency directly encodes reporting preferences is not tested against textual context or alternative explanations such as event-driven factual coverage. This assumption is load-bearing for all three analytic axes and the final claims.

minor comments (2)

[Abstract] The abstract states 'significant differences' without naming the statistical tests or effect-size thresholds used; these should be stated explicitly.
[Figures] Network figures would benefit from consistent node labeling conventions and legends that distinguish farmer leaders from other entity types.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important methodological considerations regarding our assumptions and metrics. We address each point below and will incorporate revisions to strengthen the manuscript.

read point-by-point responses

Referee: §3 (Link Predictability): The GraphSAGE model is trained on the same temporal co-occurrence edges it is later asked to predict; this creates a circularity risk in which the reported 'consistency' score largely reflects the model's reconstruction fidelity on the input graph rather than an independent signal of reporting behavior. An explicit train/test split across time periods or an external baseline is required to substantiate the metric.

Authors: We agree this is a valid concern and that the current setup risks conflating reconstruction with predictive consistency. We will revise Section 3 to implement an explicit temporal train/test split: the GraphSAGE model will be trained on the 2020-21 co-occurrence networks and evaluated on its ability to predict links in the 2024 networks for each source. We will also add a simple external baseline (e.g., preferential attachment or random prediction) for comparison. This change will be reflected in updated results and methodology. revision: yes
Referee: §4 (Centrality and Under-representation): Lower centrality of farmer-leader nodes is interpreted as under-representation, yet no ground-truth comparison (e.g., official protest participant lists, manual framing annotations, or mention polarity) is provided. Without such validation, the observed centrality differences could simply mirror the factual peripheral role of these entities in the covered events rather than editorial choice.

Authors: We acknowledge that the interpretation relies on a proxy and lacks direct ground-truth validation, which limits causal claims about editorial choice versus event facts. We cannot add comprehensive external lists or full annotations without new data collection. In revision, we will add an explicit limitations paragraph in Section 4 discussing this proxy nature, emphasizing that the under-representation finding is relative (consistent low centrality across all four sources) and that source-specific differences in other entities still support reporting preference claims. We will also outline future validation steps. revision: partial
Referee: §2 (Data Construction): The core assumption that raw entity co-occurrence frequency directly encodes reporting preferences is not tested against textual context or alternative explanations such as event-driven factual coverage. This assumption is load-bearing for all three analytic axes and the final claims.

Authors: This assumption underpins the framework and merits explicit testing. We will revise Section 2 to include a validation subsection: a random sample of 100 articles per source will be manually inspected to confirm that high co-occurrence pairs reflect substantive joint coverage rather than incidental mentions. We will also add discussion of alternative explanations (e.g., event-driven coverage) and how cross-source comparisons help isolate preferences. Updated text and any revised figures will be included. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper builds entity co-occurrence networks directly from article data for four sources and applies standard network measures (centrality, community structure) plus a proposed temporal link predictability metric via GraphSAGE. No equations or sections reduce any claimed result to its inputs by construction, nor do any load-bearing steps rely on self-citations that themselves assume the target outcome. The link predictability is framed as a consistency measure over time rather than a fitted parameter renamed as a prediction on the identical data; the central claims about reporting differences and under-representation follow from independent computation on the constructed graphs. The derivation remains self-contained against the input co-occurrence data without tautological collapse.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available for review, so specific free parameters, axioms, or invented entities cannot be identified in detail. The approach relies on standard assumptions in network science and graph neural networks, but the novel metric's implementation details are not provided.

pith-pipeline@v0.9.0 · 5467 in / 1285 out tokens · 50073 ms · 2026-05-09T22:25:04.538697+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

42 extracted references · 10 canonical work pages

[1]

Agung Farid Agustian. 2025. Analyzing Media Bias in Support of Government Policies: A Critical Discourse Analysis in a Newspaper.PAROLE: Journal of Linguistics and Education15, 1 (2025), 38–46

2025
[2]

ANI. 2021. UP: CM Yogi Adityanath calls Lakhimpur Kheri incident ’unfortunate’. https://timesofindia.indiatimes.com/india/up-cm-yogi-adityanath-calls- lakhimpur-kheri-incident-unfortunate/articleshow/86734610.cms. Accessed: 2026-04-22

work page arXiv 2021
[3]

David P. Baron. 2006. Persistent media bias.Journal of Public Economics90, 1 (2006), 1–36. doi:10.1016/j.jpubeco.2004.10.006

work page doi:10.1016/j.jpubeco.2004.10.006 2006
[4]

Hadjer Belghoul and Abdellah Baraka. 2025. Media Bias in Reporting the 2021 Sheikh Jarrah Evictions: A Van Dijkian Discourse Analysis.Majallat al-Nas12, 2 (2025), 592–608

2025
[5]

Omar Benjelloun, Hector Garcia-Molina, David Menestrina, Qi Su, Steven Eui- jong Whang, and Jennifer Widom. 2009. Swoosh: a generic approach to entity resolution.The VLDB Journal18, 1 (2009), 255–276

2009
[6]

Ceren Budak, Sharad Goel, and Justin M Rao. 2016. Fair and balanced? Quantify- ing media bias through crowdsourced content analysis.Public Opinion Quarterly 80, S1 (2016), 250–271

2016
[7]

Ceren Budak, Sharad Goel, and Justin M. Rao. 2016. Fair and Balanced? Quan- tifying Media Bias through Crowdsourced Content Analysis.Public Opinion Quarterly80, S1 (04 2016), 250–271. doi:10.1093/poq/nfw007

work page doi:10.1093/poq/nfw007 2016
[8]

Yutong Chen, Gaurav Chiplunkar, Sheetal Sekhri, Anirban Sen, and Aaditeshwar Seth. 2025. How do political connections of firms matter during an economic crisis?Journal of Development Economics175 (2025), 103471. doi:10.1016/j.jd eveco.2025.103471

work page doi:10.1016/j.jd 2025
[9]

Tomas Cicchini, Sofia Morena Del Pozo, Enzo Tagliazucchi, and Pablo Balenzuela
[10]

News sharing on Twitter reveals emergent fragmentation of media agenda and persistent polarization.EPJ Data Science11, 1 (2022), 48

2022
[11]

Omar Daoudi, Jason Gainous, Syed Ali Hussain, and Khaled Zamoum. 2026. Media bias in sports journalism: A comparative study of Qatar 2022 World Cup coverage.Communication & Sport14, 1 (2026), 207–230

2026
[12]

Simeon Djankov, Caralee McLiesh, Tatiana Nenova, and Andrei Shleifer. 2003. Who owns the media?The Journal of Law and Economics46, 2 (2003), 341–382

2003
[13]

BV Elasticsearch. 2018. Elasticsearch.software], version6, 1 (2018)

2018
[14]

James Flamino, Alessandro Galeazzi, Stuart Feldman, Michael W Macy, Brendan Cross, Zhenkun Zhou, Matteo Serafino, Alexandre Bovet, Hernán A Makse, and Boleslaw K Szymanski. 2023. Political polarization of news media and influencers on Twitter in the 2016 and 2020 US presidential elections.Nature Human Behaviour7, 6 (2023), 904–916

2023
[15]

Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable Feature Learning for Networks. arXiv:1607.00653 [cs.SI] https://arxiv.org/abs/1607.00653

work page Pith review arXiv 2016
[16]

Felix Hamborg and Karsten Donnay. 2021. NewsMTSC: a dataset for (multi-) target-dependent sentiment classification in political news articles. InProceedings of the 16th Conference of the European Chapter of the Association for Computa- tional Linguistics: Main Volume. 1663–1675

2021
[17]

Inductive Representation Learning on Large Graphs

William L. Hamilton, Rex Ying, and Jure Leskovec. 2018. Inductive Representa- tion Learning on Large Graphs. arXiv:1706.02216 [cs.SI] https://arxiv.org/abs/17 06.02216

work page Pith review arXiv 2018
[18]

Ignatius Haryanto. 2011. Media ownership and its implications for journalists and journalism in Indonesia.Politics and the media in twenty-first century Indonesia: Decade of democracy(2011), 104–118

2011
[19]

Matthew Honnibal, Ines Montani, Sofie Van Landeghem, and Adriane Boyd
[20]

spaCy: Industrial-strength Natural Language Processing in Python. (2020). doi:10.5281/zenodo.1212303

work page doi:10.5281/zenodo.1212303 2020
[21]

Tomáš Horych, Christoph Mandl, Terry Ruas, André Greiner-Petter, Bela Gipp, Akiko Aizawa, and Timo Spinde. 2025. The promises and pitfalls of LLM annotations in dataset labeling: A case study on media bias detection. InFindings of the Association for Computational Linguistics: NAACL 2025. 1370–1386

2025
[22]

Homa Hosseinmardi, Samuel Wolken, David M Rothschild, and Duncan J Watts
[23]

Unpacking media bias in the growing divide between cable and network news.Scientific Reports15, 1 (2025), 17607

2025
[24]

Byunghwee Lee, Hyo-sun Ryu, Jae Kook Lee, Hawoong Jeong, and Beom Jun Kim. 2025. Network analysis reveals news press landscape and asymmetric user polarization.Physica A: Statistical Mechanics and its Applications(2025), 130842

2025
[25]

Sibo Liu, Alexey Makarin, Jinfeng Wu, and Dong Zhang. 2026. The War of Ideas: Institutions and Global Media Bias. (2026)

2026
[26]

Nicholas Kah Yean Low and Andrew Melatos. 2022. Discerning media bias within a network of political allies and opponents: The idealized example of a biased coin.Physica A: Statistical Mechanics and its Applications590 (2022), 126722. doi:10.1016/j.physa.2021.126722

work page doi:10.1016/j.physa.2021.126722 2022
[27]

Maxwell E McCombs and Donald L Shaw. 1972. The agenda-setting function of mass media.Public opinion quarterly36, 2 (1972), 176–187

1972
[28]

Horst Po¨ ttker. 2003. News and its communicative quality: the inverted pyramid- when and why did it appear?Journalism Studies4, 4 (2003), 501–511

2003
[29]

Elad Segev. 2020. Textual network analysis: Detecting pre- vailing themes and biases in international news and so- cial media.Sociology Compass14, 4 (2020), e12779. arXiv:https://compass.onlinelibrary.wiley.com/doi/pdf/10.1111/soc4.12779 doi:10.1111/soc4.12779

work page doi:10.1111/soc4.12779 2020
[30]

Anirban Sen, Debanjan Ghatak, Gurjeet Khanuja, Kumari Rekha, Mehak Gupta, Sanket Dhakate, Kartikeya Sharma, and Aaditeshwar Seth. 2022. Analysis of media bias in policy discourse in india. InProceedings of the 5th ACM SIG- CAS/SIGCHI Conference on Computing and Sustainable Societies. 57–77

2022
[31]

Anirban Sen, Debanjan Ghatak, Kapil Kumar, Gurjeet Khanuja, Deepak Bansal, Mehak Gupta, Kumari Rekha, Saloni Bhogale, Priyamvada Trivedi, and Aaditesh- war Seth. 2019. Studying the discourse on economic policies in India using mass media, social media, and the parliamentary question hour data. InProceedings of the 2nd ACM SIGCAS Conference on Computing an...

2019
[32]

Nidaa Shahid and Bilal Ghazanfar. 2025. Mapping Media Bias: Global Islam- ophobic Trends and their Reflections in South Asia.The Beacon Journal5, 02 (2025)

2025
[33]

Ankur Sharma, Navreet Kaur, Anirban Sen, and Aaditeshwar Seth. 2020. Ideology Detection in the Indian Mass Media. In2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM). 627–634. doi:10.1109/ASONAM49781.2020.9381344

work page doi:10.1109/asonam49781.2020.9381344 2020
[34]

Timo Spinde, Lada Rudnitckaia, Jelena Mitrovi´c, Felix Hamborg, Michael Gran- itzer, Bela Gipp, and Karsten Donnay. 2021. Automated identification of bias inducing words in news articles using linguistic and context-oriented features. Information Processing & Management58, 3 (2021), 102505

2021
[35]

Vincent A Traag, Ridho Reinanda, and Gerry van Klinken. 2016. Structure of a media co-occurrence network. InProceedings of ECCS 2014: European MediaGraph: A Network Theoretic Framework to Analyze Reporting Preferences in Indian News Media Conference acronym ’XX, June 03–05, 2018, Woodstock, NY Conference on Complex Systems. Springer, 81–91

2016
[36]

Vincent A Traag, Ludo Waltman, and Nees Jan Van Eck. 2019. From Louvain to Leiden: guaranteeing well-connected communities.Scientific reports9, 1 (2019), 5233

2019
[37]

Jenny S Wang, Samar Haider, Amir Tohidi, Anushkaa Gupta, Yuxuan Zhang, Chris Callison-Burch, David Rothschild, and Duncan J Watts. 2025. Media bias detector: Designing and implementing a tool for real-time selection and framing bias analysis in news coverage. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems. 1–27

2025
[38]

Ze Wang, Zekun Wu, Yichi Zhang, Xin Guan, Navya Jain, Qinyang Lu, Saloni Gupta, and Adriano Koshiyama. 2025. Bias amplification: Large language models as increasingly biased media. InProceedings of the 14th International Joint Conference on Natural Language Processing and the 4th Conference of the Asia- Pacific Chapter of the Association for Computational...

2025
[39]

Wikipedia contributors. 2026. 2020–2021 Indian farmers’ protest. https://en.wik ipedia.org/wiki/2020%E2%80%932021_Indian_farmers%27_protest Accessed: 2026-04-02

2026
[40]

Wikipedia contributors. 2026. 2024–2025 Indian farmers’ protest. https://en.wik ipedia.org/wiki/2024%E2%80%942025_Indian_farmers%27_protest Accessed: 2026-04-02

2026
[41]

Bennett WL. 1990. Toward a theory of press-state relations in the United States. Journal of Communication40, 2 (1990), 103–125

1990
[42]

Dvir Yogev, Criminal Law, and Justice Center. [n. d.]. Measuring Media Bias Toward Reform Prosecutors: A Multi-Method NLP Analysis of Bay Area News Coverage, 2019–2024. ([n. d.]). 8 Appendix 8.1 Detailed Experimental Results We report here our detailed experimental results for the link predic- tion experiments using GraphSAGE, across the four news-sources...

2019