Disentangling Fact from Sentiment: A Dynamic Conflict-Consensus Framework for Multimodal Fake News Detection

Chunlei Meng; Deyue Zhang; Dongdong Yang; Enhao Gu; Mingze Liu; Quanchen Zou; Rongchen Zhao; Weilin Zhou; Xiangzheng Zhang; Zonghao Ying

arXiv: 2512.20670 · v2 · submitted 2025-12-19 · cs.LG · cs.AI

Pith reviewed 2026-05-16 20:57 UTC · model grok-4.3

keywords: multimodal fake news detection · conflict-consensus framework · cross-modal discrepancies · fact-sentiment disentanglement · inconsistency-seeking paradigm · feature polarization · dynamic fusion

The pith

A new framework for multimodal fake news detection amplifies cross-modal contradictions instead of smoothing them away.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Mainstream multimodal fake news detectors rely on consistency-based fusion, which treats differences between text and images as noise to be minimized. The paper argues that this approach discards the very signals that often indicate fabrication. It introduces the Dynamic Conflict-Consensus Framework, which separates fact from sentiment and then actively polarizes the representations to extract informative conflicts. Experiments on three real-world datasets show an average accuracy gain of 3.52 percent over prior methods. A reader would care because preserving these discrepancies could lead to more reliable detection of deceptive content that pairs real facts with a misleading tone.

Core claim

The paper claims that by decoupling inputs into independent Fact and Sentiment spaces and applying physics-inspired feature dynamics to iteratively polarize representations, the Dynamic Conflict-Consensus Framework extracts maximally informative conflicts; standardizing these local discrepancies against global context then supports more robust judgment than consistency-driven alignment.

What carries the argument

The Dynamic Conflict-Consensus Framework (DCCF), which decouples fact and sentiment spaces then uses iterative polarization to amplify rather than suppress cross-modal contradictions.

If this is right

DCCF achieves an average accuracy improvement of 3.52% over state-of-the-art baselines on three real-world datasets.
Separating fact from sentiment allows objective mismatches to be distinguished from emotional dissonance.
Iterative polarization extracts conflicts that would otherwise be smoothed in standard fusion.
Standardizing local discrepancies against global context produces more reliable deliberative judgments.
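
The last point can be made concrete with a small sketch. The abstract does not specify how local discrepancies are standardized, so the code below assumes the simplest reading: each local discrepancy score is z-scored against the mean and spread of all scores for the same item. The function name `standardize_discrepancies` and the `eps` parameter are hypothetical, chosen only for illustration.

```python
import statistics


def standardize_discrepancies(local_scores, eps=1e-8):
    """Z-score local discrepancy scores against their global statistics.

    ASSUMPTION: "global context" is taken to be the mean and population
    standard deviation of all local scores. The paper may define it differently.
    """
    mu = statistics.fmean(local_scores)
    sigma = statistics.pstdev(local_scores)
    # eps keeps the division finite when every score is identical
    return [(s - mu) / (sigma + eps) for s in local_scores]
```

Under this reading, a discrepancy only stands out if it is large relative to the item's overall level of disagreement, which is one plausible way to keep uniformly noisy items from dominating the judgment.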

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same inconsistency-seeking logic could extend to other multimodal tasks where alignment hides important mismatches, such as medical image-text diagnosis.
Training pipelines might benefit from explicitly rewarding models for detecting contradictions rather than only minimizing them.
Adversarial tests using fabricated but internally consistent multimodal items would provide a direct check on whether the performance gain depends on real-world discrepancy patterns.

Load-bearing premise

Critical cross-modal discrepancies are the primary evidence of fabrication and consistency-based fusion mainly erases useful signals by treating them as noise.

What would settle it

A dataset of fabricated news items engineered to have highly consistent text-image pairs where consistency-based methods achieve higher accuracy than DCCF would falsify the central claim.

Figures

Figures reproduced from arXiv: 2512.20670 by Chunlei Meng, Deyue Zhang, Dongdong Yang, Enhao Gu, Mingze Liu, Quanchen Zou, Rongchen Zhao, Weilin Zhou, Xiangzheng Zhang, Zonghao Ying.

**Figure 2.** The DCCF framework: (a) Fact-Sentiment Feature Extraction projects features into Fact (…

**Figure 3.** Analysis of hyperparameter sensitivity. This figure …

**Figure 4.** t-SNE visualization of test set features. Same color …

Original abstract

Prevalent multimodal fake news detection relies on consistency-based fusion, yet this paradigm fundamentally misinterprets critical cross-modal discrepancies as noise, leading to over-smoothing, which dilutes critical evidence of fabrication. Mainstream consistency-based fusion inherently minimizes feature discrepancies to align modalities, yet this approach fundamentally fails because it inadvertently smooths out the subtle cross-modal contradictions that serve as the primary evidence of fabrication. To address this, we propose the Dynamic Conflict-Consensus Framework (DCCF), an inconsistency-seeking paradigm designed to amplify rather than suppress contradictions. First, DCCF decouples inputs into independent Fact and Sentiment spaces to distinguish objective mismatches from emotional dissonance. Second, we employ physics-inspired feature dynamics to iteratively polarize these representations, actively extracting maximally informative conflicts. Finally, a conflict-consensus mechanism standardizes these local discrepancies against the global context for robust deliberative judgment. Extensive experiments conducted on three real-world datasets demonstrate that DCCF consistently outperforms state-of-the-art baselines, achieving an average accuracy improvement of 3.52%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

DCCF flips the script on multimodal fusion by seeking out cross-modal conflicts as fake news signals, but the 3.52% accuracy claim lacks ablations to show those components actually drive the gains.

The main thing here is that this paper wants to treat cross-modal discrepancies as useful evidence of fabrication instead of noise that consistency-based fusion smooths away. DCCF does this by first splitting the input into separate fact and sentiment spaces, then running physics-inspired dynamics to polarize the features and pull conflicts into sharper relief, and finally applying a conflict-consensus step to weigh those local differences against the overall context. That three-stage structure is the clearest new piece; it directly targets the over-smoothing problem the authors flag in prior work.

The motivation lands reasonably well because real fabrication often shows up as mismatched signals across modalities, and separating fact from sentiment gives a cleaner way to look for objective mismatches versus emotional ones. The polarization step is an interesting modeling choice that avoids just stacking more attention layers.

The soft spot is the evaluation. The abstract reports a 3.52% average accuracy lift over baselines on three datasets, yet there is no ablation that removes the polarization or conflict-consensus modules while keeping everything else fixed. Without that, or without confirmation that baselines were matched on capacity and training schedule, it is hard to credit the inconsistency-seeking parts specifically for the improvement. Dataset details and statistical tests are also thin in the description, so robustness is difficult to judge.

This is aimed at people already working on multimodal misinformation detectors who are open to trying a conflict-focused fusion strategy. A reader in that area could extract the decoupling idea and test it, but they would probably have to run their own controls to trust the numbers. I would send it to peer review. The idea is coherent enough that referees can press on the experimental gaps and help clarify what actually moves the needle.

Referee Report

2 major / 1 minor

Summary. The paper proposes the Dynamic Conflict-Consensus Framework (DCCF) for multimodal fake news detection. It critiques consistency-based fusion methods for treating cross-modal discrepancies as noise that leads to over-smoothing and loss of fabrication signals. DCCF decouples inputs into independent Fact and Sentiment spaces, applies physics-inspired iterative polarization to amplify conflicts, and employs a conflict-consensus mechanism to standardize local discrepancies against global context. Experiments on three real-world datasets report an average 3.52% accuracy improvement over state-of-the-art baselines.

Significance. If the reported gains prove robust, the work offers a meaningful paradigm shift from alignment-focused to inconsistency-seeking multimodal fusion. The explicit decoupling of objective fact mismatches from emotional dissonance, combined with polarization dynamics, provides a concrete mechanism for preserving fabrication cues that prior methods suppress. This could influence downstream tasks in misinformation detection and conflict-aware representation learning.

major comments (2)

[Experiments] Experiments section: the headline 3.52% accuracy improvement is presented without any ablation studies that isolate the contribution of the fact-sentiment decoupling or the physics-inspired polarization modules. Baselines must be matched on parameter count and training schedule; without such controls the performance lift cannot be attributed to the inconsistency-seeking components rather than capacity or optimization differences.
[Method] Method section (framework description): the polarization step is described only at the conceptual level ('physics-inspired feature dynamics to iteratively polarize'). The manuscript must supply the precise update equations or pseudocode so that the claimed active extraction of conflicts can be verified and reproduced.
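
The ablation protocol requested in the first comment can be sketched as a small configuration generator. The component names are taken from the abstract's three stages; the function names and the idea of a boolean flag per component are assumptions made for illustration, and nothing here reflects the authors' actual code.

```python
from itertools import product

# Component names follow the three stages named in the abstract (labels are hypothetical).
COMPONENTS = ("fact_sentiment_decoupling", "polarization", "conflict_consensus")


def ablation_configs(components=COMPONENTS):
    """Yield every on/off combination of the named components."""
    for flags in product((True, False), repeat=len(components)):
        yield dict(zip(components, flags))


def leave_one_out(components=COMPONENTS):
    """Return the full model plus one config per disabled component."""
    full = {name: True for name in components}
    configs = [("full", full)]
    for name in components:
        cfg = dict(full)
        cfg[name] = False  # disable exactly one component, keep the rest fixed
        configs.append((f"w/o {name}", cfg))
    return configs
```

The leave-one-out list is the minimal control the referee asks for; the full grid from `ablation_configs` additionally exposes interactions between components, at the cost of eight training runs per dataset and seed.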

minor comments (1)

[Abstract] Abstract: the phrase 'extensive experiments' should be accompanied by at least the number of runs, reported standard deviation, and a brief statement of statistical testing to support the 3.52% figure.
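
The reporting asked for here (number of runs, standard deviation, a statistical test) could look like the sketch below. It assumes that the proposed model and a baseline are each trained with the same set of random seeds, so that accuracies can be paired per seed; the exact sign-flip permutation test is one reasonable choice among several, not something the paper is known to use.

```python
import itertools
import statistics


def summarize(accuracies):
    """Return (mean, sample standard deviation) over repeated runs."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


def paired_sign_flip_pvalue(model_accs, baseline_accs):
    """Exact two-sided sign-flip permutation test on per-seed differences.

    ASSUMPTION: model_accs[i] and baseline_accs[i] come from the same seed/split.
    Enumerates all 2**n sign patterns, so it is only practical for small n.
    """
    diffs = [m - b for m, b in zip(model_accs, baseline_accs)]
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=len(diffs)):
        total += 1
        if abs(sum(s * d for s, d in zip(signs, diffs))) >= observed - 1e-12:
            hits += 1
    return hits / total
```

With n paired runs the smallest attainable p-value is 2 / 2**n, so at least six seeds are needed before the test can report significance at the 0.05 level.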

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below and outline the revisions we will make to strengthen the manuscript.

Referee: [Experiments] Experiments section: the headline 3.52% accuracy improvement is presented without any ablation studies that isolate the contribution of the fact-sentiment decoupling or the physics-inspired polarization modules. Baselines must be matched on parameter count and training schedule; without such controls the performance lift cannot be attributed to the inconsistency-seeking components rather than capacity or optimization differences.

Authors: We agree that ablation studies are needed to isolate the contributions of the fact-sentiment decoupling and polarization modules. In the revised version we will add targeted ablations that remove or replace each component while keeping the rest of the architecture fixed. We will also report parameter counts and training schedules for all baselines to ensure fair comparison, and we will include these controls in the updated experimental tables.

Revision: yes

Referee: [Method] Method section (framework description): the polarization step is described only at the conceptual level ('physics-inspired feature dynamics to iteratively polarize'). The manuscript must supply the precise update equations or pseudocode so that the claimed active extraction of conflicts can be verified and reproduced.

Authors: We acknowledge that the current description of the polarization dynamics remains at a high level. In the revision we will insert the exact iterative update equations, including the polarization force terms and convergence criteria, together with pseudocode for the full procedure. This will allow direct verification and reproduction of the conflict extraction process.

Revision: yes

Circularity Check

0 steps flagged

No circularity: DCCF is a proposed architecture, not a self-referential derivation

The manuscript introduces DCCF as a new inconsistency-seeking framework that decouples fact/sentiment spaces and applies physics-inspired polarization plus conflict-consensus. No equations or derivations appear that reduce the claimed accuracy gains to fitted parameters, self-definitions, or prior self-citations by construction. Performance results are presented as empirical outcomes on three external datasets rather than as predictions forced by the model's own inputs. The central premise (consistency fusion smooths useful signals) is argued from stated limitations of prior work, not from any internal loop. This is the normal case of an architectural proposal whose validity rests on experimental evidence rather than on any definitional reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review performed on abstract only; concrete free parameters, axioms, and entities inside the physics-inspired dynamics and conflict-consensus mechanism are not specified.

axioms (1)

domain assumption: Cross-modal discrepancies constitute the primary evidence of fabrication.
Invoked to justify why consistency-based fusion fails and why amplification is required.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem.

$T^{(t)}_{i,j} = \left(f^{(t)}_i - f^{(t)}_j\right)^2$ ; $W^{(t)}_{i,j} = \mathrm{softmax}_j\left(-T^{(t)}_{i,j}\right)$ ; $f^{(t+1)}_i = f^{(t)}_i + g\left(\sum W f\right)$

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem.

physics-inspired feature dynamics to iteratively polarize these representations
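
The update rule quoted in the first entry above (tension T, softmax weights W, residual update of f) can be traced with a minimal sketch. Several things here are assumptions rather than the paper's definitions: features are treated as scalars, the sum is read as a sum over j of W[i][j] * f[j], and g is replaced by a fixed scaling because the excerpt does not define it.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dynamics_step(f, g=lambda v: 0.5 * v):
    """Apply one update to scalar features f (a list of floats).

    ASSUMPTION: g is a placeholder scaling, not the paper's (undefined here) g.
    """
    new_f = []
    for i in range(len(f)):
        tension = [(f[i] - fj) ** 2 for fj in f]            # T[i][j]
        weights = softmax([-t for t in tension])            # W[i][j], each row sums to 1
        aggregate = sum(w * fj for w, fj in zip(weights, f))  # assumed reading of the sum
        new_f.append(f[i] + g(aggregate))
    return new_f


def run_dynamics(f, steps=3, g=lambda v: 0.5 * v):
    """Iterate the update a fixed number of times."""
    for _ in range(steps):
        f = dynamics_step(f, g)
    return f
```

Because the softmax is taken over negative tension, each feature weights its most similar neighbours most heavily. With two features at +1 and -1 and this placeholder g, a single step moves them further apart, which is consistent with the polarization behaviour the abstract describes; whether that holds for the paper's actual g cannot be checked from the excerpt.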

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
