arxiv: 2606.03348 · v1 · pith:5GTEEVRCnew · submitted 2026-06-02 · 💻 cs.CV · cs.AI

SynCred-Bench: Benchmarking Synthetic Credibility in AI-Generated Visual Misinformation

Pith reviewed 2026-06-28 11:14 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords synthetic credibility · AI-generated misinformation · visual misinformation · benchmark · MLLM evaluation · AIGC detection · false positive rate · credible-form categories

The pith

Multimodal LLMs and open-source detectors miss most AI-generated images that mimic credible sources; commercial APIs and human annotators still miss roughly four in ten.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates SYNCRED-Bench, a set of 600 AI-generated images that embed realistic text and layouts to look like credible news or reports, balanced across six form categories and seven circulation styles, plus a set of 450 real images to measure false alarms. Tests on this benchmark show that 15 multimodal large language models reach only 10.5 percent true positive rate when false positives are capped at 5 percent. Open-source AIGC detectors stay below 5 percent, commercial APIs reach 57.6 percent, and human annotators reach 63 percent. These results indicate that synthetic credibility forms a distinct visual misinformation threat that current tools do not handle reliably.

Core claim

SYNCRED-Bench supplies 600 AI-generated misinformation images balanced across six credible-form categories and seven fine-grained circulation styles together with FP450 real-image negatives. Under a 5 percent false-positive-rate constraint the benchmark shows 15 MLLMs achieve only 10.5 percent true positive rate, open-source AIGC detectors less than 5 percent, commercial APIs 57.6 percent, and human annotators 63 percent, establishing synthetic credibility as a severe underexplored challenge that requires detectors able to reason beyond superficial cues.
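
To make the headline metric concrete, here is a minimal Python sketch of how a true positive rate under a false-positive cap can be computed from per-image detector scores. It is not the authors' evaluation code. The function name `tpr_at_fpr`, the score convention (higher means "more likely AI-generated"), and the strict-inequality tie rule are assumptions made for illustration.

```python
import math


def tpr_at_fpr(pos_scores, neg_scores, max_fpr=0.05):
    """Return (threshold, tpr, fpr) at the most permissive threshold
    whose false positive rate on neg_scores does not exceed max_fpr.

    Assumption: an image is flagged when its score is strictly above
    the threshold, and higher scores mean "more likely AI-generated".
    """
    ranked = sorted(neg_scores, reverse=True)
    # Largest number of real images we may flag without breaking the cap.
    # The small epsilon guards against floating-point round-down.
    allowed = math.floor(max_fpr * len(ranked) + 1e-9)
    # Flagging strictly above the (allowed + 1)-th highest negative score
    # keeps the false-positive count at or below `allowed`, even with ties.
    threshold = ranked[allowed] if allowed < len(ranked) else float("-inf")
    false_pos = sum(s > threshold for s in ranked)
    true_pos = sum(s > threshold for s in pos_scores)
    return threshold, true_pos / len(pos_scores), false_pos / len(ranked)


if __name__ == "__main__":
    # Toy scores, not benchmark data.
    generated = [0.9, 0.8, 0.3, 0.2]
    real = [0.95, 0.5] + [0.1] * 18
    print(tpr_at_fpr(generated, real, max_fpr=0.05))
```

Under this convention the threshold depends only on the real-image negatives, so the reported TPR is sensitive to how that negative set is built.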

What carries the argument

SYNCRED-Bench, a balanced collection of 600 AI-generated images across credible-form categories and circulation styles paired with real negative examples for false-positive control.

If this is right

  • Current multimodal large language models remain inadequate for identifying synthetic credibility at practical false-positive levels.
  • Open-source AIGC detectors perform markedly worse than commercial APIs on this task.
  • Human annotators also fail to reach high accuracy, showing the difficulty is not limited to automated systems.
  • Effective detectors will need to examine deeper credibility reasoning instead of relying on visual artifacts alone.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Widespread use of such images could increase the reach of misinformation formatted to resemble legitimate news sources.
  • The benchmark's circulation styles point to risks in social-media and news-sharing environments where these fakes would appear.
  • Development of new detectors focused on text-layout consistency and source plausibility becomes a direct next step.

Load-bearing premise

The 600 generated images and 450 real negatives form a representative and unbiased test of the synthetic credibility threat.
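
Representativeness is a question of bias, but sampling noise also limits what a 600-image set can show. As a rough illustration only, the sketch below computes a Wilson score interval for a detection rate measured on a fixed number of images. Treating 10.5 percent as 63 detections out of 600 is an assumption: the page does not say whether that figure is pooled over images or averaged over the 15 models, and the function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z=1.96 corresponds to an approximate 95 percent interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1] to absorb tiny floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    # Assumption for illustration: 63 of 600 images detected (10.5 percent).
    low, high = wilson_interval(63, 600)
    print(f"approx. 95% interval: {low:.3f} to {high:.3f}")
```

An interval of this kind describes precision on the benchmark itself. It says nothing about whether the benchmark's mix of categories matches what circulates in practice.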

What would settle it

A detection method that reaches substantially above 57.6 percent true positive rate at 5 percent false positive rate on the SYNCRED-Bench set while remaining stable on additional real images.
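
The settling criterion can be read as a two-part check: beat the 57.6 percent reference at a threshold calibrated on the benchmark negatives, then confirm that the same frozen threshold does not inflate false alarms on real images it has not seen. The sketch below is one hedged way to encode that reading. The names `passes_settling_test`, `margin`, and `fpr_slack` are hypothetical, and the numeric defaults for "substantially above" and "stable" are placeholders, not values from the paper.

```python
import math


def threshold_for_fpr(neg_scores, max_fpr):
    """Most permissive threshold keeping FPR on neg_scores within max_fpr."""
    ranked = sorted(neg_scores, reverse=True)
    allowed = math.floor(max_fpr * len(ranked) + 1e-9)
    return ranked[allowed] if allowed < len(ranked) else float("-inf")


def flag_rate(scores, threshold):
    """Fraction of images scored strictly above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


def passes_settling_test(pos_scores, calib_negs, fresh_negs,
                         baseline_tpr=0.576, margin=0.10,
                         max_fpr=0.05, fpr_slack=0.02):
    """Calibrate on calib_negs, then freeze the threshold.

    margin and fpr_slack are illustrative placeholders.
    """
    threshold = threshold_for_fpr(calib_negs, max_fpr)
    tpr = flag_rate(pos_scores, threshold)
    fresh_fpr = flag_rate(fresh_negs, threshold)
    return {
        "threshold": threshold,
        "tpr": tpr,
        "fresh_fpr": fresh_fpr,
        "beats_baseline": tpr >= baseline_tpr + margin,
        "stable": fresh_fpr <= max_fpr + fpr_slack,
    }
```

Keeping the threshold frozen between the two negative sets is the important design choice: recalibrating on the additional real images would hide exactly the instability the criterion is meant to expose.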

Figures

Figures reproduced from arXiv: 2606.03348 by Haoran Liu, Hongning Wang, Junxiao Yang, Minghao Zhang, Minlie Huang, Shiyao Cui, Xiaoce Wang.

Figure 1: Example of synthetic credibility: The image …
Figure 2: Overview of SYNCRED-BENCH. … imposed on otherwise authentic photographs (Liu et al., 2025). The challenge is that these generated artifacts draw their persuasiveness from two interrelated credibility traits, as illustrated in Figure 1. First, credible form refers to their imitation of visual formats associated with authoritative or formal communication genres, such as news layouts and government notices,…
Figure 3: False-negative rationale cues for MLLM judges. Bars show aggregate non-exclusive cue frequencies, …
Figure 4: TPR results under increasing FPR budgets
Figure 5: TPR change of each circulation style relative to Native Rendering for closed-source MLLM judges.
Figure 6: Example metadata record. The prompt text is translated to English for readability.
Figure 10: Examples by circulation style. Some images offer different circulation style variants for the same …
Figure 11: Examples by artifact type. Each row shows one randomly sampled example from an artifact type. The …
Figure 12: Examples of document images grouped by artifact type and image provenance. The top row shows …
Figure 13: Examples of document images grouped by circulation and capture style. The top row shows AI …

Original abstract

Recent generative models can now produce visual artifacts with realistic embedded text and layouts, creating a new misinformation threat: synthetic credibility. We introduce SYNCRED-Bench, a benchmark of 600 AI-generated misinformation images balanced across six credible-form categories and seven fine-grained circulation styles, together with FP450, a real-image negative set for measuring false positives. Extensive evaluation shows that existing systems remain unreliable: under a 5% false-positive-rate constraint, 15 MLLMs achieve only 10.5% true positive rate (TPR), open-source AIGC detectors achieve less than 5%, and commercial APIs reach 57.6%. Human annotators also struggled to identify synthetic credibility, reaching only 63% TPR. These findings establish synthetic credibility as a severe and underexplored visual misinformation challenge, and provide a benchmark for developing detectors that reason beyond superficial credibility cues.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces SynCred-Bench, a benchmark consisting of 600 AI-generated misinformation images balanced across six credible-form categories and seven fine-grained circulation styles, paired with the FP450 real-image negative set. It evaluates 15 MLLMs, open-source AIGC detectors, commercial APIs, and human annotators for synthetic credibility detection, reporting TPRs of 10.5%, less than 5%, 57.6%, and 63% respectively under a fixed 5% FPR constraint, and concludes that this constitutes a severe and underexplored challenge for existing detection systems.

Significance. If the benchmark distribution is representative of real-world conditions, the low TPR results would demonstrate a meaningful gap in current detectors' ability to handle AI-generated images with realistic embedded text and layouts. The release of a new, publicly usable benchmark dataset with controlled category and style axes is a concrete contribution that can support future detector development.

major comments (2)
  1. [§3 (Benchmark Construction)] The 600-image set is stated to be balanced across six credible-form categories and seven circulation styles, yet the manuscript supplies no frequency statistics drawn from real misinformation corpora, no external validation of category prevalence, and no ablation showing that the reported TPRs remain stable under re-weighting to observed real-world distributions. Because the central claim that existing detectors are unreliable on synthetic credibility (and that the threat is severe) rests on this set being representative, the absence of such grounding is load-bearing.
  2. [§4 (Evaluation Protocol)] The 5% FPR operating point is used to report all TPR numbers, but the manuscript does not detail how thresholds were chosen on the FP450 negative set or whether per-category or per-style calibration was performed; without this, it is unclear whether the aggregate 10.5% MLLM TPR (or the <5% open-source figure) could shift materially under different negative-set constructions.
minor comments (2)
  1. [Table 2] The per-model TPR columns would be easier to interpret if they also reported the exact number of images per category on which each model was evaluated.
  2. The abstract's phrasing that the images are 'balanced across' categories would be strengthened by an explicit statement of the per-category count (e.g., 100 images each) in the main text.
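
Both minor comments ask for per-category counts to sit next to the rates. A small sketch of such a report follows, assuming each AI-generated image comes with a category label and a detector score. The function name `per_category_report` and the two category labels in the demo are illustrative stand-ins, since this page does not list the benchmark's six category names.

```python
from collections import defaultdict


def per_category_report(records, threshold):
    """Summarise detections per category.

    records: iterable of (category, score) pairs for AI-generated images.
    An image counts as detected when score > threshold (an assumption).
    Returns {category: {"n": ..., "detected": ..., "tpr": ...}}.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for category, score in records:
        totals[category] += 1
        if score > threshold:
            hits[category] += 1
    return {
        c: {"n": totals[c], "detected": hits[c], "tpr": hits[c] / totals[c]}
        for c in sorted(totals)
    }


if __name__ == "__main__":
    # Toy records with hypothetical category labels.
    demo = [
        ("news_layout", 0.9), ("news_layout", 0.2),
        ("government_notice", 0.7), ("government_notice", 0.8),
    ]
    for name, row in per_category_report(demo, threshold=0.5).items():
        print(name, row)
```

Reporting `n` beside each rate lets a reader see at a glance whether a low per-category TPR rests on a hundred images or a handful.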

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive comments. Below we respond point-by-point to the two major comments, indicating planned revisions where appropriate. We have aimed to strengthen the manuscript without overstating what the current benchmark can claim.

point-by-point responses
  1. Referee: [§3 (Benchmark Construction)] The 600-image set is stated to be balanced across six credible-form categories and seven circulation styles, yet the manuscript supplies no frequency statistics drawn from real misinformation corpora, no external validation of category prevalence, and no ablation showing that the reported TPRs remain stable under re-weighting to observed real-world distributions. Because the central claim that existing detectors are unreliable on synthetic credibility (and that the threat is severe) rests on this set being representative, the absence of such grounding is load-bearing.

    Authors: We agree that the manuscript lacks frequency statistics from real misinformation corpora and external validation of prevalence. The benchmark was constructed to ensure coverage across a diverse set of credible-form categories and circulation styles drawn from observed patterns in recent AI-generated misinformation, rather than to match empirical prevalence distributions. The central claim concerns the existence of a detection gap on images exhibiting synthetic credibility, which is demonstrated by the uniformly low TPRs across the balanced axes. In revision we will expand §3 to (1) cite the literature sources used for category and style selection, (2) explicitly state that the set is not prevalence-weighted, and (3) add a limitations paragraph together with a sensitivity analysis that re-weights the reported TPRs under several hypothetical real-world distributions. We cannot supply the requested frequency statistics, as they would require a separate large-scale corpus study outside the scope of this benchmark paper. revision: partial

  2. Referee: [§4 (Evaluation Protocol)] The 5% FPR operating point is used to report all TPR numbers, but the manuscript does not detail how thresholds were chosen on the FP450 negative set or whether per-category or per-style calibration was performed; without this, it is unclear whether the aggregate 10.5% MLLM TPR (or the <5% open-source figure) could shift materially under different negative-set constructions.

    Authors: The current manuscript does not provide the requested procedural details. Thresholds were selected globally on the full FP450 set to enforce exactly 5% FPR for each detector independently, without per-category or per-style stratification. In the revised manuscript we will expand §4 with (1) a precise description of the threshold-selection procedure (including the formula used), (2) per-category and per-style TPR tables evaluated at the global 5% FPR point, and (3) an additional experiment that subsamples FP450 to test sensitivity of the aggregate TPRs to negative-set composition. These additions will make the evaluation protocol fully reproducible and allow readers to assess potential shifts. revision: yes
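
The second response describes a global threshold chosen on FP450 and a planned subsampling experiment. The sketch below illustrates both under stated assumptions: that FP450 holds 450 images (as the summary above reads it), that each detector emits one score per image, and that an image is flagged when its score is strictly above the threshold. All function names are hypothetical. One consequence worth noting is that 5 percent of 450 is 22.5, so a per-image threshold cannot hit exactly 5 percent; the closest rate under the cap is 22 of 450, roughly 4.9 percent.

```python
import math
import random


def fp_budget(n_negatives, max_fpr=0.05):
    """Largest false-positive count within the cap, and the FPR it gives."""
    allowed = math.floor(max_fpr * n_negatives + 1e-9)
    return allowed, allowed / n_negatives


def global_threshold(neg_scores, max_fpr=0.05):
    """One threshold for the whole negative set, no per-category tuning."""
    ranked = sorted(neg_scores, reverse=True)
    allowed, _ = fp_budget(len(ranked), max_fpr)
    return ranked[allowed] if allowed < len(ranked) else float("-inf")


def tpr_at(pos_scores, threshold):
    """Fraction of generated images scored strictly above the threshold."""
    return sum(s > threshold for s in pos_scores) / len(pos_scores)


def subsample_sensitivity(pos_scores, neg_scores, subset_size,
                          trials=200, max_fpr=0.05, seed=0):
    """Recalibrate on random negative subsets; return (min, mean, max) TPR.

    neg_scores must be a list, because random.sample needs a sequence.
    """
    rng = random.Random(seed)
    tprs = []
    for _ in range(trials):
        subset = rng.sample(neg_scores, subset_size)
        tprs.append(tpr_at(pos_scores, global_threshold(subset, max_fpr)))
    return min(tprs), sum(tprs) / len(tprs), max(tprs)


if __name__ == "__main__":
    print(fp_budget(450))  # budget for a 450-image negative set
```

A wide gap between the minimum and maximum TPR across subsets would suggest that the aggregate figures depend heavily on which real images happen to be in the negative set.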

standing simulated objections not resolved
  • Providing quantitative frequency statistics drawn from real misinformation corpora and external validation of category prevalence
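
Absent real prevalence data, the sensitivity analysis promised in the first response amounts to recombining per-category TPRs under assumed weights. A minimal sketch of that arithmetic follows. The function name `reweighted_tpr`, the category keys, and every number in the demo are hypothetical, and the calculation assumes all per-category TPRs were measured at the same global threshold.

```python
def reweighted_tpr(per_category_tpr, weights):
    """Weighted average of per-category TPRs.

    per_category_tpr: {category: tpr measured at one shared threshold}
    weights: {category: assumed relative prevalence}, any positive scale.
    """
    if set(per_category_tpr) != set(weights):
        raise ValueError("categories and weights must match")
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(per_category_tpr[c] * weights[c] / total
               for c in per_category_tpr)


if __name__ == "__main__":
    # Hypothetical per-category TPRs and prevalence scenarios.
    tprs = {"cat_a": 0.20, "cat_b": 0.10, "cat_c": 0.00}
    scenarios = {
        "balanced": {"cat_a": 1, "cat_b": 1, "cat_c": 1},
        "cat_a_heavy": {"cat_a": 2, "cat_b": 1, "cat_c": 1},
    }
    for name, w in scenarios.items():
        print(name, round(reweighted_tpr(tprs, w), 4))
```

Because the false positive rate is computed on real images only, re-weighting the generated categories changes the aggregate TPR but leaves the 5 percent operating point untouched. With equal weights and equal category sizes, the result reduces to the balanced aggregate.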

Circularity Check

0 steps flagged

Empirical benchmark with no derivations or fitted predictions

The paper creates a benchmark of 600 generated images plus FP450 negatives and reports empirical TPR/FPR numbers for existing MLLMs, detectors, APIs, and humans. No equations, parameters, or derivations appear in the provided text. The six credible-form categories and seven circulation styles are used only to construct the test set; performance numbers are direct measurements on that set rather than predictions derived from fitted inputs. No load-bearing self-citations, uniqueness theorems, or smuggled ansatzes are present. The skeptic's concern about representativeness is a question of external validity, not circularity under the enumerated patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that the chosen categories and negative set capture the intended threat; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption: The six credible-form categories and seven circulation styles, plus the FP450 real-image set, constitute a valid and balanced test of synthetic credibility detection.
    Stated in the abstract as the basis for the benchmark construction and evaluation protocol.

pith-pipeline@v0.9.1-grok · 5701 in / 1277 out tokens · 20097 ms · 2026-06-28T11:14:18.597688+00:00 · methodology
