NewsLens: A Multi-Agent Framework for Adversarial News Bias Navigation

Joy Bose

REVIEW 3 major objections 2 minor 18 references

Reviewed by Pith at T0; open to challenge.

T0 means a machine referee read the full paper against a public rubric. The mark states how deep the mechanical check went, never who wrote it.


T0 review · grok-4.3

A five-agent LLM pipeline turns news articles into framing maps that reveal specific omissions and manipulations instead of just labeling outlets as biased.

2026-05-20 13:27 UTC pith:HO5S5NIN


arxiv 2605.17364 v1 pith:HO5S5NIN submitted 2026-05-17 cs.CL cs.IR


keywords media bias, multi-agent systems, LLM agents, framing analysis, news bias navigation, propaganda detection, adversarial pipeline, ideological omissions

verification ladder T0 review T1 audit T2 compute T3 formal T4 reserved

The pith

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper contends that simply tagging an article or source with a political label stops short of showing how bias actually works in practice. NewsLens instead runs five agents in an adversarial setup: one verifies facts, two analyze the story from progressive and conservative angles, one flags propaganda techniques, and one produces a neutral summary. These agents together generate framing maps that highlight what gets left out, how language steers the reader, and where the boundaries of each perspective lie. A sympathetic reader would care because this gives a concrete way to compare versions of the same event and see structural gaps rather than relying on vague impressions of slant. The authors demonstrate the approach on fifteen articles covering four different geopolitical topics and report measurable differences in how center and conservative outlets perform on their divergence and manipulation scores.

Core claim

NewsLens is a five-agent adversarial pipeline consisting of a Fact Verifier, Progressive Framing Analyst, Conservative Framing Analyst, Propaganda Detector, and Neutral Summarizer. The agents collaborate to break down a news article into an interpretable framing map that surfaces ideological omissions, rhetorical manipulation, and the limits of each framing. When tested on fifteen articles across Kashmir, Gaza, climate policy, and Ukraine clusters using quantized open-weight models, center outlets produced the highest mean Perspective Divergence Scores while conservative outlets showed the highest mean Manipulation Index; cross-model runs on the Kashmir subset remained consistent on high-man

What carries the argument

The five-agent adversarial pipeline in which specialized agents with opposing viewpoints jointly produce a framing map from a single article.
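A minimal sketch of how such a pipeline could be wired, assuming each agent is a plain function of the article text. The agent names follow the paper; `run_pipeline`, `FramingMap`, `make_stub`, and the output fields are hypothetical and stand in for prompted LLM calls. Following the Figure 1 caption, every agent receives only the article and never another agent's output. How the Neutral Summarizer combines the other agents' findings is not described in this review, so the sketch leaves that step out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Agent names come from the paper; everything else here is illustrative.
AGENT_NAMES = (
    "fact_verifier",
    "progressive_analyst",
    "conservative_analyst",
    "propaganda_detector",
    "neutral_summarizer",
)

# Hypothetical signature: article text in, structured output out.
Agent = Callable[[str], dict]


@dataclass
class FramingMap:
    """Hypothetical container holding one output per agent."""
    article: str
    outputs: Dict[str, dict] = field(default_factory=dict)


def run_pipeline(article: str, agents: Dict[str, Agent]) -> FramingMap:
    missing = [name for name in AGENT_NAMES if name not in agents]
    if missing:
        raise ValueError(f"missing agents: {missing}")
    framing_map = FramingMap(article=article)
    for name in AGENT_NAMES:
        # Each agent sees only the raw article, never upstream outputs.
        framing_map.outputs[name] = agents[name](article)
    return framing_map


def make_stub(name: str) -> Agent:
    """Stand-in for a prompted LLM call; it only records what it was given."""
    def agent(text: str) -> dict:
        return {"agent": name, "input_chars": len(text)}
    return agent


stub_agents = {name: make_stub(name) for name in AGENT_NAMES}
result = run_pipeline("Example article text.", stub_agents)
print(sorted(result.outputs))
```

Passing the agents in as a dictionary keeps the wiring testable without a model: a real run would swap each stub for a function that prompts the chosen open-weight model.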

Load-bearing premise

The specialized LLM agents can carry out fact verification, progressive and conservative framing analysis, and propaganda detection reliably enough that their outputs do not introduce new systematic biases or hallucinations into the framing maps.

What would settle it

A side-by-side comparison in which human experts annotate the same articles for omissions and manipulations and then measure how closely the generated framing maps match those annotations.
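One way such a comparison could be scored, sketched under the assumption that both the human annotators and the framing map yield a set of omission labels for the same article. The function name `agreement`, the exact-string matching, and the labels below are all hypothetical toy choices, not anything taken from the paper.

```python
def agreement(human: set, generated: set) -> dict:
    """Set-overlap scores between human-annotated and generated omissions."""
    overlap = human & generated
    union = human | generated
    return {
        # Share of generated omissions that annotators also marked.
        "precision": len(overlap) / len(generated) if generated else 0.0,
        # Share of annotated omissions that the framing map recovered.
        "recall": len(overlap) / len(human) if human else 0.0,
        "jaccard": len(overlap) / len(union) if union else 1.0,
    }


# Toy labels, invented for illustration only.
human_omissions = {"casualty figures", "ceasefire terms", "historical context"}
map_omissions = {
    "casualty figures",
    "historical context",
    "economic impact",
    "official denial",
}

scores = agreement(human_omissions, map_omissions)
print(scores)
```

Exact string matching is the simplest possible alignment. A real study would more likely need annotators to judge whether a generated omission matches a human one.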


If this is right

Center outlets register the highest Perspective Divergence Scores across the tested clusters.
Conservative-framing outlets register the highest Manipulation Index values.
High-propaganda articles show stable scores across different models while nuanced articles show more variance.
Removing the Propaganda Detector agent lowers the precision with which the Neutral Summarizer identifies omissions.
The architecture extends earlier lexical methods for bias analysis by adding agent-based reasoning steps.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same pipeline could be applied to compare coverage of the same story across languages or social media threads to surface cross-cultural framing differences.
News readers might use the resulting maps as a side-by-side reference when deciding which sources to trust on a developing event.
Educators could incorporate the maps into lessons that train students to spot how word choice and selection shape understanding of a topic.
Future extensions might add agents representing additional viewpoints to test whether greater diversity of agents reduces any single-model skew in the maps.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit.

Desk Editor's Note

NewsLens gives a clear five-agent LLM pipeline for turning news into framing maps, but the whole thing rests on agents that have no external check for accuracy.


The paper's core idea is a five-agent setup that breaks articles into framing maps to show omissions and manipulations instead of just slapping on a bias label. They run Fact Verifier, Progressive and Conservative Analysts, Propaganda Detector, and Neutral Summarizer on 15 articles from four geopolitical clusters using Qwen2.5-3B and some Mistral cross-checks on one cluster. Center outlets score high on Perspective Divergence and conservative ones on Manipulation Index, with honest notes that Mann-Whitney tests show nothing significant at this size plus a power analysis to back it up. A partial ablation on the propaganda detector is included too. The code runs on open models in Colab with no API keys, which keeps it practical. This extends earlier lexical and geometric bias methods into collaborative LLM reasoning, and the architecture itself is described plainly enough to reproduce.

The weak point is that none of the maps are checked against human judgments or primary sources. The metrics come straight from the agents' own outputs, so if the agents hallucinate or inject their own slant the results just echo that. With n=15 and a 3B model the consistency across models on obvious propaganda cases is noted, but that does not prove the omissions are real rather than model artifacts.

Readers working on media literacy tools or quick prototypes for computational journalism would find the pipeline useful as a starting point. It is not a foundational result, but the setup is concrete and the limitations are stated without spin. I would send it to peer review so referees can ask for human validation or larger tests that would make the framing maps more credible.

Referee Report

3 major / 2 minor

Summary. The paper introduces NewsLens, a five-agent adversarial pipeline (Fact Verifier, Progressive Framing Analyst, Conservative Framing Analyst, Propaganda Detector, Neutral Summarizer) that collaborates to deconstruct news articles into interpretable framing maps exposing ideological omissions, rhetorical manipulation, and framing boundaries. It evaluates the system on 15 articles from four geopolitical clusters (India-Pakistan Kashmir, Gaza, Climate Policy, Ukraine) using Qwen2.5-3B-Instruct with cross-model validation on Mistral 7B for the Kashmir cluster, reporting Perspective Divergence Score (PDS) and Manipulation Index (MI) metrics, a partial ablation on the Propaganda Detector, and noting no statistically significant between-group differences due to small sample size.

Significance. If the specialized agents prove reliable, the work meaningfully extends media-bias research beyond classification tasks to structured, multi-perspective deconstruction of framing and omissions. The emphasis on open-weight models, full reproducibility without API keys, and honest reporting of sample-size limitations are strengths that align with calls for transparent LLM-based analysis in computational linguistics.

major comments (3)

[Evaluation] Evaluation section (n=15 articles): The central claims that the pipeline exposes ideological omissions and framing boundaries depend on the fidelity of the five LLM agents, yet no human annotation study, comparison against established media-bias corpora, or verification against primary sources is reported. PDS (e.g., Qwen 0.907 on Kashmir) and MI metrics are derived directly from agent outputs, leaving the interpretability results vulnerable to model artifacts or hallucinations.
[Evaluation] Evaluation section, cross-model validation: Validation is restricted to the Kashmir cluster with Mistral 7B. Greater variance is noted for nuanced reporting, but no quantitative consistency metrics are given across all four clusters, which limits support for the claim of high consistency on high-propaganda content.
[Ablation study] Ablation study: The partial ablation removing the Propaganda Detector reports degraded omission precision in the Neutral Summarizer, but the manuscript provides neither the exact definition nor quantitative values for 'omission precision,' nor error bars, making it difficult to assess the agent's specific contribution to the framing maps.
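The quantitative consistency metrics requested in the cross-model comment could take a form like the following. This is a sketch only: it assumes per-article scores from two models keyed by article ID and treats delta as the absolute difference, which is an assumption since the abstract reports delta-PDS without defining it. The function name and all numbers are invented for illustration.

```python
from statistics import mean, stdev


def consistency(scores_a: dict, scores_b: dict) -> dict:
    """Mean and sample standard deviation of per-article score gaps."""
    shared = sorted(scores_a.keys() & scores_b.keys())
    if len(shared) < 2:
        raise ValueError("need at least two shared articles")
    # Assumption: delta is the absolute gap between the two models' scores.
    deltas = [abs(scores_a[key] - scores_b[key]) for key in shared]
    return {
        "n": len(shared),
        "mean_delta": mean(deltas),
        "sd_delta": stdev(deltas),
    }


# Toy PDS values, not taken from the paper.
model_a_pds = {"article_1": 0.90, "article_2": 0.50, "article_3": 0.70}
model_b_pds = {"article_1": 0.80, "article_2": 0.55, "article_3": 0.40}

summary = consistency(model_a_pds, model_b_pds)
print(summary)
```

The same function would apply unchanged to Manipulation Index values, and running it once per cluster would give the per-cluster table the comment asks for.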

minor comments (2)

[Abstract] Abstract: The post-hoc power analysis is mentioned as confirming the sample-size limitation, but the actual power value and effect-size statistics are not reported; including them would help readers assess the Mann-Whitney U results.
[Related Work] The manuscript could strengthen the related-work discussion by explicitly contrasting the agentic framing-map approach with prior lexical-geometric bias methods cited in the abstract.
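For readers who want to see what an effect-size statistic alongside a Mann-Whitney U result might look like, here is a standard-library sketch that computes the U statistics and the rank-biserial correlation for two groups. It is not the paper's analysis: the function names are hypothetical, the sample values are invented, and no p-value is computed (an established routine such as `scipy.stats.mannwhitneyu` would be the usual choice for that).

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for x, U for y, rank-biserial correlation)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1
    # Share of pairs favouring x minus share favouring y.
    effect = (u1 - u2) / (n1 * n2)
    return u1, u2, effect


# Toy scores for two outlet groups, invented for illustration.
group_a = [0.9, 0.8, 0.7]
group_b = [0.6, 0.8, 0.5, 0.4]

u1, u2, effect = mann_whitney_u(group_a, group_b)
print(u1, u2, effect)
```

Reporting the rank-biserial value next to U lets a reader judge the size of a difference even when a small sample leaves the test non-significant.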

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which identifies key areas for strengthening the evaluation of our multi-agent framework. We respond to each major comment below, indicating where revisions will be made and providing clarifications where we maintain our original approach.


Referee: [Evaluation] Evaluation section (n=15 articles): The central claims that the pipeline exposes ideological omissions and framing boundaries depend on the fidelity of the five LLM agents, yet no human annotation study, comparison against established media-bias corpora, or verification against primary sources is reported. PDS (e.g., Qwen 0.907 on Kashmir) and MI metrics are derived directly from agent outputs, leaving the interpretability results vulnerable to model artifacts or hallucinations.

Authors: We agree that the evaluation relies on metrics computed from agent outputs without accompanying human annotation or direct comparison to established media-bias corpora. This leaves open the possibility of model-specific artifacts. In the revised manuscript we will expand the limitations and future work sections to explicitly acknowledge this gap and the risk of hallucinations, while noting that the observed cross-model consistency on high-propaganda items offers preliminary evidence of robustness. We do not claim human-validated fidelity at this stage.

revision: partial

Referee: [Evaluation] Evaluation section, cross-model validation: Validation is restricted to the Kashmir cluster with Mistral 7B. Greater variance is noted for nuanced reporting, but no quantitative consistency metrics are given across all four clusters, which limits support for the claim of high consistency on high-propaganda content.

Authors: Cross-model validation was performed on the Kashmir cluster to manage computational cost while testing a high-stakes geopolitical topic. We reported qualitative patterns of consistency. We will add quantitative consistency metrics (mean delta-PDS and delta-MI with standard deviations) for the remaining clusters in the revised evaluation section to better support the consistency claim.

revision: yes

Referee: [Ablation study] Ablation study: The partial ablation removing the Propaganda Detector reports degraded omission precision in the Neutral Summarizer, but the manuscript provides neither the exact definition nor quantitative values for 'omission precision,' nor error bars, making it difficult to assess the agent's specific contribution to the framing maps.

Authors: We accept that the ablation description is underspecified. In the revision we will define omission precision explicitly (the fraction of facts omitted from the neutral summary that the Propaganda Detector had flagged as manipulative in the full pipeline), report the observed quantitative values, and include error bars or per-article variance to allow readers to evaluate the agent's contribution.

revision: yes
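The definition of omission precision given in this response can be written down directly as a set ratio. The sketch below assumes facts carry stable identifiers shared between the Neutral Summarizer and the Propaganda Detector. The function name, the identifiers, and the toy values are hypothetical.

```python
def omission_precision(omitted_facts: set, flagged_manipulative: set) -> float:
    """Fraction of facts omitted from the neutral summary that the
    Propaganda Detector flagged as manipulative in the full pipeline."""
    if not omitted_facts:
        raise ValueError("no omitted facts; precision is undefined")
    return len(omitted_facts & flagged_manipulative) / len(omitted_facts)


# Toy fact identifiers, invented for illustration.
omitted = {"f1", "f2", "f3", "f4"}
flagged = {"f2", "f4", "f9"}

score = omission_precision(omitted, flagged)
print(score)
```

Computing this per article, with and without the Propaganda Detector, would yield the per-article variance the response promises to report.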

Circularity Check

0 steps flagged

No circularity: evaluation metrics derived from external article outputs, not self-referential fits or definitions


The paper describes a multi-agent pipeline and reports empirical metrics (PDS, MI) computed directly from agent outputs on a set of 15 held-out articles across geopolitical clusters, with cross-model checks and a partial ablation. No equations, fitted parameters, or derivations are presented that reduce the reported results to inputs by construction. The architecture is described as extending prior lexical-geometric work, but this extension is not invoked as a uniqueness theorem or load-bearing ansatz for the central claims. The evaluation uses open-weight models on concrete articles rather than renaming known patterns or smuggling assumptions via self-citation chains. The derivation chain is therefore self-contained and independent of the target outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim depends on the assumption that current open LLMs can be prompted into reliable specialized roles for bias deconstruction; no free parameters are explicitly fitted in the abstract, but the metrics themselves may embed design choices. No new physical entities are postulated.

axioms (1)

domain assumption LLM agents can be reliably specialized via prompting to perform fact verification, framing analysis, and propaganda detection without introducing uncontrolled bias.
Invoked in the description of the five-agent collaboration and evaluation on real articles.

pith-pipeline@v0.9.0 · 5830 in / 1422 out tokens · 52994 ms · 2026-05-20T13:27:13.304910+00:00 · methodology


read the original abstract

Media bias detection has predominantly been framed as a classification task: assign a political label to an article or outlet. We argue this framing is too shallow: it identifies that bias exists but not where, how, or crucially, what is structurally omitted. We present NewsLens, a five-agent adversarial pipeline for structured news bias navigation. A Fact Verifier, Progressive Framing Analyst, Conservative Framing Analyst, Propaganda Detector, and Neutral Summarizer collaborate to deconstruct articles into interpretable framing maps, exposing ideological omissions, rhetorical manipulation, and framing boundaries. The system is evaluated on 15 articles across four geopolitical event clusters (India-Pakistan Kashmir, Gaza, Climate Policy, Ukraine) using Qwen2.5-3B-Instruct (4-bit quantised, Google Colab T4), with cross-model validation using Mistral 7B on the Kashmir cluster. Center outlets show the highest mean Perspective Divergence Score (PDS: Qwen 0.907, Mistral 0.729 on Kashmir subset); conservative-framing outlets show the highest mean Manipulation Index (MI: 0.600 across both models). Cross-model comparison shows high consistency for high-propaganda content (Republic World delta-PDS=0.125, MI=0.8 both models) and greater variance for nuanced reporting. Mann-Whitney U tests find no statistically significant between-group differences at n=15, reported honestly as a sample-size limitation confirmed by post-hoc power analysis. A partial ablation removing the Propaganda Detector shows degraded omission precision in the Neutral Summarizer output. The architecture extends prior lexical-geometric bias work to agentic LLM reasoning, and is fully reproducible using open-weight models without API keys.

Figures

Figures reproduced from arXiv: 2605.17364 by Joy Bose.

**Figure 1.** NewsLens five-agent pipeline. Progressive and Conservative Analysts operate independently on the same article text to prevent anchoring. All agents receive only the article, not upstream outputs, to prevent context contamination, which we found causes hallucination in sub-7B models.

3.1 Agent Descriptions. Fact Verifier: extracts verified core events, flags contested claims with veracity scores (Verified / …

**Figure 2.** NewsLens message passing schema. Shows the full JSON message


Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 1 internal anchor

[1] Patankar, A. A., & Bose, J. (2017). Bias discovery in news articles using word vectors. Proceedings of the 2017 16th IEEE International Conference on Machine Learning and Applications (ICMLA), 525-530. https://doi.org/10.1109/ICMLA.2017.00094

[2] Patankar, A. A., Bose, J., & Khanna, H. (2019). A bias aware news recommendation system. Proceedings of the 2019 IEEE 13th International Conference on Semantic Computing (ICSC), 232 -

[3] https://doi.org/10.1109/icosc.2019.8665610

[4] Gentzkow, M., & Shapiro, J. M. (2010). What drives media slant? Evidence from US daily newspapers. Econometrica, 78(1), 35-71. https://web.stanford.edu/~gentzkow/research/biasmeas.pdf

[5] Motoki, F., Pinho Neto, V., & Rodrigues, V. (2024). More human than human: measuring ChatGPT political bias. Public Choice, 198(1), 3-23

[6] Du, Y., Li, S., Torralba, A., Tenenbaum, J. B., & Mordatch, I. (2023). Improving factuality and reasoning in language models through multiagent debate. https://arxiv.org/abs/2305.14325

[7] Da San Martino, G., Yu, S., Barrón-Cedeño, A., Petrov, R., & Nakov, P. (2019, November). Fine-grained analysis of propaganda in news articles. In Proceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP) (pp. 5636-5646). https://doi.org/1... (doi:10.18653/v1/d19-1565)

[8] Da San Martino, G., Barrón-Cedeño, A., Wachsmuth, H., Petrov, R., & Nakov, P. (2020, December). SemEval-2020 task 11: Detection of propaganda techniques in news articles. In Proceedings of the fourteenth workshop on semantic evaluation (pp. 1377-1414). https://doi.org/10.18653/v1/2020.semeval-1.186

[9] Entman, R. M. (1993). Framing: Towards clarification of a fractured paradigm. McQuail's reader in mass communication theory, 390, 397

[10] Peng, T. Q., Yang, K., Lee, S., Li, H., Chu, Y., Lin, Y., & Liu, H. (2026). Beyond partisan leaning: A comparative analysis of political bias in large language models. Journal of Information Technology & Politics, 1-18. https://doi.org/10.48550/arxiv.2412.16746

[11] Shu, M., Karell, D., Okura, K., & Davidson, T. R. (2026). How latent and prompting biases in AI-generated historical narratives influence opinions. PNAS nexus, 5(3), pgag022. https://doi.org/10.1093/pnasnexus/pgag022

[12] Hao, J., Ding, H., Xu, Y., Sun, T., Chen, R., Zhang, W., ... & Li, S. (2026). Game-Theoretic Lens on LLM-based Multi-Agent Systems. arXiv preprint arXiv:2601.15047. https://doi.org/10.48550/arXiv.2601.15047

[13] Hruschka, T. M. J., & Appel, M. (2026). Reducing political polarization through conversations with artificial intelligence. Journal of Computer-Mediated Communication, 31(2), zmag003. https://doi.org/10.1093/jcmc/zmag003

[14] Zabihi, P., Nawara, D., Ibrahim, A., & Kashef, R. (2026). Analyzing Bias in LLM-Augmented Knowledge Graph Systems: Taxonomy, Interaction Mechanisms, and Evaluation. Applied Sciences, 16(7), 3410

[15] Ollama. (2024). Open-source local LLM inference. https://ollama.com

[16] Dettmers, T., Pagnoni, A., Holtzman, A., & Zettlemoyer, L. (2023). QLoRA: Efficient finetuning of quantized LLMs. Advances in Neural Information Processing Systems (NeurIPS), 36

[17] Halperin, I. (2025). Prompt-response semantic divergence metrics for faithfulness hallucination and misalignment detection in large language models. arXiv preprint arXiv:2508.10192

[18] Nachar, N. (2008). The Mann-Whitney U: A test for assessing whether two independent samples come from the same distribution. Tutorials in Quantitative Methods for Psychology, 4(1), 13-20.

Appendix A: Agent System Prompts. All agents use few-shot prompting with one complete worked example. The example is drawn from a neutral topic (not from the evaluation c...
