Are Rationales Necessary and Sufficient? Tuning LLMs for Explainable Misinformation Detection

Bing Wang; Changchun Li; Chen Shen; Jieping Ye; Kaiyuan Liu; Rui Miao; Shaotian Yan; Xiaosong Yuan; Ximing Li

arXiv: 2605.19285 · v1 · pith:SLI446FVnew · submitted 2026-05-19 · 💻 cs.CL · cs.AI · cs.CY

Pith reviewed 2026-05-20 06:04 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.CY

keywords misinformation detection · large language models · explainable AI · rationales · fine-tuning · necessity and sufficiency · data synthesis · verification steps

The pith

A metric that scores each verification step's contribution lets LLMs be fine-tuned on only the necessary and sufficient rationales for misinformation detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that keeping only the training examples where an LLM gets the final true-or-false label right is not enough when the goal is to produce clear explanations. Such filtering often keeps rationales that leave out needed facts or include extra steps that do not change the outcome. The authors therefore build LONSREX to measure how much each part of an explanation actually moves the model toward its final verdict, and to retain only the parts that are both necessary and sufficient. A reader would care because social-media platforms need systems that not only flag false claims but also show focused, non-redundant reasons for doing so. If the approach works, fine-tuned models could deliver explanations that are shorter, more accurate, and easier for people to check.

Core claim

The paper claims that naive filtering of LLM-generated rationales by label correctness alone produces either insufficient rationales that fail to support the decision or unnecessary rationales caused by over-verification. To fix this, LONSREX introduces a metric that quantifies the contribution of each verification step to the final prediction and uses the metric to select only those rationales that are necessary and sufficient. The resulting data is then used to fine-tune a dedicated LLM for explainable misinformation detection.
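One way to state the two properties precisely is sketched below. This is an assumed formalization for illustration, not the paper's own definition (its equations are not reproduced on this page). The symbols are hypothetical: $c$ is the claim, $r = (v_1, \dots, v_n)$ is a rationale made of verification steps, and $f$ is the model that maps a claim and a set of steps to a veracity label.

```latex
\[
\text{necessary}(v_i) \iff f\bigl(c,\; r \setminus \{v_i\}\bigr) \neq f(c, r),
\qquad
\text{sufficient}(S) \iff f(c, S) = f(c, r), \quad S \subseteq r .
\]
```

Under this reading, an insufficient rationale is one that does not support its own label when the model is restricted to its steps, and an over-verified rationale is one with many steps $v_i$ that are not necessary.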

What carries the argument

LONSREX, a data-synthesis pipeline that applies a contribution metric to each verification step so that only necessary and sufficient rationales are kept for fine-tuning.

If this is right

Rationales filtered only by correct final labels tend to be either too sparse to support the verdict or too verbose because of over-verification.
The contribution metric isolates verification steps whose removal would alter the prediction, thereby marking them as necessary.
Fine-tuned models trained on the filtered rationales generate explanations that better match the requirements of necessity and sufficiency.
The pipeline reduces the over-verification behavior that appears when stronger off-the-shelf LLMs are used without filtering.
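The removal test in the second point can be sketched as a leave-one-out loop. This is a minimal illustration under stated assumptions, not the paper's metric: `toy_predictor` is a hypothetical keyword rule standing in for an LLM verdict, and the example steps are made up.

```python
from typing import Callable, List

# A predictor maps (claim, verification steps) to a veracity label.
Predictor = Callable[[str, List[str]], str]


def toy_predictor(claim: str, steps: List[str]) -> str:
    """Hypothetical stand-in for an LLM: 'fake' if any step reports a contradiction."""
    return "fake" if any("contradicts" in s for s in steps) else "real"


def necessary_steps(claim: str, steps: List[str], predict: Predictor) -> List[int]:
    """Return indices of steps whose removal changes the prediction."""
    full_label = predict(claim, steps)
    needed = []
    for i in range(len(steps)):
        ablated = steps[:i] + steps[i + 1:]  # drop step i only
        if predict(claim, ablated) != full_label:
            needed.append(i)
    return needed


steps = [
    "The claim cites a 2019 health report.",
    "The report's published figure contradicts the number in the claim.",
    "The author's profile photo looks recent.",
]
idx = necessary_steps("example claim", steps, toy_predictor)
unnecessary_ratio = 1 - len(idx) / len(steps)
print(idx, round(unnecessary_ratio, 2))
```

One known weakness of single-step removal: if two steps each carry the decisive evidence, removing either one leaves the label unchanged, so neither is flagged as necessary. A real metric would need to handle such redundancy.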

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same contribution-based filtering could be tested on explanation tasks outside misinformation, such as legal or medical reasoning.
One could measure whether models trained this way produce fewer contradictory statements inside a single rationale.
Real-world deployment would require checking whether users find the resulting explanations more helpful for deciding what to believe online.

Load-bearing premise

The metric that scores how much each verification step changes the final prediction correctly identifies which steps are necessary and which are sufficient.

What would settle it

If LLMs fine-tuned on the LONSREX-filtered data produce explanations that are no more concise, accurate, or faithful than those from simple label-correctness filtering, the value of the contribution metric would be called into question.

Figures

Figures reproduced from arXiv: 2605.19285 by Bing Wang, Changchun Li, Chen Shen, Jieping Ye, Kaiyuan Liu, Rui Miao, Shaotian Yan, Xiaosong Yuan, Ximing Li.

**Figure 3.** Distribution of the number of minimal sufficient…

**Figure 4.** Precision and recall scores of instruct-tuned and…

**Figure 6.** Distribution of the ratio of unnecessary rationales.

**Figure 7.** Overview of LonsRex. Given a claim, we generate 𝐾 rationales using advanced LLMs, which are then filtered by basic heuristics and self-attribution and mutual-attribution scores. Finally, we use the filtered rationales to tune a lightweight LLM.

… in Eq. (2) across all its verification steps, and we incorporate the ratio of unnecessary steps in Eq. (4) as a penalty term. The sufficiency score 𝑠_suf for each r…

**Figure 8.** Distribution of Δ for correct / incorrect rationales

Original abstract

The rapid spread of misinformation on social media platforms has become a formidable challenge. To mitigate its proliferation, Misinformation Detection (MD) has emerged as a critical research topic. Traditional MD approaches based on small models typically perform binary classification through a black-box process. Recently, the rise of Large Language Models (LLMs) has enabled explainable MD, where models generate rationales that explain their decisions, thereby enhancing transparency. Existing explainable MD methods primarily focus on crafting sophisticated prompts to elicit rationales from off-the-shelf LLMs. In this work, we propose a pipeline to fine-tune a dedicated LLM specifically for explainable MD. Our pipeline begins by collecting large-scale fact-checked articles, and then uses multiple strong LLMs to produce veracity predictions and rationales. To ensure high-quality training data, we leverage a filtering strategy that selects only the correct instances for fine-tuning. While this pipeline is intuitive and prevalent, our experiments reveal that naive filtering based solely on label correctness is insufficient in practice and suffers from two critical limitations: (1) Coarse-grained labels cause insufficient rationales: Rationales filtered solely based on binary labels are insufficient to adequately support their decisions; (2) Over-verification behavior causes unnecessary rationales: Stronger LLMs tend to exhibit over-verification behavior, producing excessively verbose and unnecessary rationales. To address these issues, we introduce LONSREX, a novel data synthesis pipeline to Locate Necessary and Sufficient Rationales for Explainable MD. Specifically, we propose a metric that quantifies the contribution of each verification step to the final prediction, thereby evaluating its necessity and sufficiency. Experimental results demonstrate the effectiveness of LONSREX.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

LONSREX adds a filtering step that tries to pick necessary and sufficient rationales for fine-tuning LLMs on explainable misinformation detection, but the metric's exact definition and validation leave room for doubt on whether it truly fixes the stated problems.

The main thing here is a pipeline called LONSREX that generates training data for fine-tuning an LLM on explainable misinformation detection. It starts with fact-checked articles, gets predictions and rationales from strong LLMs, then filters using a new metric that scores each verification step's contribution to the final call. The goal is to drop both insufficient rationales and the overly verbose ones that strong models tend to produce under naive correctness filtering.
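The difference between naive correctness filtering and a stricter filter can be sketched as follows. Everything here is an assumption for illustration: the `Candidate` fields, the `sufficient` flag, the `unnecessary_ratio` value, and the `max_unnecessary` threshold are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    source_model: str         # which LLM produced the rationale (hypothetical names)
    predicted: str            # label predicted alongside the rationale
    gold: str                 # fact-checked label
    sufficient: bool          # assumed precomputed: does the rationale support its label?
    unnecessary_ratio: float  # assumed precomputed: share of steps that change nothing


def naive_filter(cands: List[Candidate]) -> List[Candidate]:
    """Keep every candidate whose predicted label matches the gold label."""
    return [c for c in cands if c.predicted == c.gold]


def strict_filter(cands: List[Candidate], max_unnecessary: float = 0.25) -> List[Candidate]:
    """Additionally require sufficiency and a low share of unnecessary steps."""
    return [
        c for c in naive_filter(cands)
        if c.sufficient and c.unnecessary_ratio <= max_unnecessary
    ]


cands = [
    Candidate("model_a", "fake", "fake", True, 0.0),
    Candidate("model_b", "fake", "fake", False, 0.0),  # right label, thin support
    Candidate("model_c", "fake", "fake", True, 0.6),   # right label, over-verified
    Candidate("model_d", "real", "fake", True, 0.0),   # wrong label
]
kept_naive = [c.source_model for c in naive_filter(cands)]
kept_strict = [c.source_model for c in strict_filter(cands)]
print(kept_naive, kept_strict)
```

In this toy set the naive filter keeps three candidates while the stricter one keeps only the first, which mirrors the two failure modes the letter describes.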

Referee Report

2 major / 2 minor

Summary. The manuscript proposes LONSREX, a data synthesis pipeline for fine-tuning LLMs on explainable misinformation detection. It collects fact-checked articles, generates veracity predictions and rationales via strong LLMs, applies naive filtering on label correctness, and identifies two limitations of that approach: insufficient rationales (due to coarse binary labels) and unnecessary over-verification rationales. LONSREX introduces a metric quantifying the contribution of each verification step to the final prediction in order to locate necessary and sufficient rationales, with experiments claimed to show improved training data quality.

Significance. If the contribution metric correctly operationalizes necessity (counterfactual removal flips the veracity prediction) and sufficiency (the step alone supports the prediction), the pipeline could yield higher-quality fine-tuning data than label-only filtering, improving both predictive performance and rationale faithfulness in LLM-based misinformation detection. The multi-LLM generation step and explicit focus on rationale quality are practical strengths that could generalize beyond the reported setting.
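The sufficiency side of this reading can be sketched as a search for the smallest subsets of steps that reproduce the full-rationale prediction. This is an illustrative assumption, not the paper's procedure: `toy_predictor` is a hypothetical keyword rule in place of an LLM, and the brute-force search is exponential in the number of steps.

```python
from itertools import combinations
from typing import Callable, List, Tuple

Predictor = Callable[[str, List[str]], str]


def toy_predictor(claim: str, steps: List[str]) -> str:
    """Hypothetical stand-in for an LLM: 'fake' if any step reports a contradiction."""
    return "fake" if any("contradicts" in s for s in steps) else "real"


def minimal_sufficient_subsets(
    claim: str, steps: List[str], predict: Predictor
) -> List[Tuple[int, ...]]:
    """Return all smallest index subsets that yield the same label as the full rationale."""
    target = predict(claim, steps)
    # Size 0 is checked first: if the label holds with no steps at all,
    # the rationale contributes nothing and the result is [()].
    for size in range(0, len(steps) + 1):
        hits = [
            combo
            for combo in combinations(range(len(steps)), size)
            if predict(claim, [steps[i] for i in combo]) == target
        ]
        if hits:
            return hits
    return []


steps = [
    "The official registry contradicts the date given in the claim.",
    "The claim was shared widely last week.",
    "An archived press release contradicts the quoted figure.",
]
subsets = minimal_sufficient_subsets("example claim", steps, toy_predictor)
print(subsets)
```

Here two single steps are each sufficient on their own, so either could be kept and the other dropped as redundant; the middle step never appears in a minimal subset.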

major comments (2)

[Abstract] Abstract (paragraph describing LONSREX): the metric is described only as one that 'quantifies the contribution of each verification step to the final prediction' without an explicit formula, algorithm, or proof that it implements necessity via removal (prediction flip) and sufficiency via isolation (prediction preserved). This definition is load-bearing for the central claim that LONSREX fixes the two stated limitations of naive filtering.
[Experiments] Experiments section: results are reported to demonstrate effectiveness of LONSREX, yet no ablation isolates the metric's contribution, no comparison is made to human-annotated necessity/sufficiency labels, and no counterfactual verification (remove/add step and re-predict) is described. Without these, the claim that selected rationales are both necessary and sufficient rather than merely predictive correlates cannot be assessed.
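The comparison against human-annotated labels requested in the second comment could be scored with plain precision and recall over per-step flags. The sketch below uses made-up toy flags; neither the values nor the variable names come from the paper.

```python
from typing import List, Tuple


def precision_recall(predicted: List[bool], human: List[bool]) -> Tuple[float, float]:
    """Precision and recall of predicted 'necessary' flags against human flags."""
    if len(predicted) != len(human):
        raise ValueError("flag lists must have the same length")
    tp = sum(p and h for p, h in zip(predicted, human))
    fp = sum(p and not h for p, h in zip(predicted, human))
    fn = sum((not p) and h for p, h in zip(predicted, human))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy flags for five verification steps (True = step marked necessary).
metric_flags = [True, False, True, True, False]
human_flags = [True, False, False, True, True]
precision, recall = precision_recall(metric_flags, human_flags)
print(round(precision, 3), round(recall, 3))
```

Low precision would suggest the metric keeps steps annotators consider redundant, while low recall would suggest it discards steps they consider essential.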

minor comments (2)

[Abstract] The abstract states that 'naive filtering based solely on label correctness is insufficient' but does not quantify how often the two failure modes occur in the collected data (e.g., percentage of instances with insufficient or over-verbose rationales).
[Abstract] Dataset details (source of fact-checked articles, number of instances, veracity label distribution) are referenced but not reported with concrete numbers or splits in the provided abstract.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the presentation of our LONSREX pipeline. We address each major comment below and indicate the revisions we will make to the manuscript.

Referee: [Abstract] Abstract (paragraph describing LONSREX): the metric is described only as one that 'quantifies the contribution of each verification step to the final prediction' without an explicit formula, algorithm, or proof that it implements necessity via removal (prediction flip) and sufficiency via isolation (prediction preserved). This definition is load-bearing for the central claim that LONSREX fixes the two stated limitations of naive filtering.

Authors: We agree that the abstract should provide a more precise description of the metric to support the central claims. In the revised manuscript, we will expand the abstract to include the explicit formula for the contribution metric, a brief outline of the algorithm used to compute necessity and sufficiency, and an explanation of how it operationalizes these concepts through counterfactual removal (where removing the step flips the prediction) and isolation (where the step alone preserves the prediction). This will strengthen the presentation without altering the core methodology.
revision: yes

Referee: [Experiments] Experiments section: results are reported to demonstrate effectiveness of LONSREX, yet no ablation isolates the metric's contribution, no comparison is made to human-annotated necessity/sufficiency labels, and no counterfactual verification (remove/add step and re-predict) is described. Without these, the claim that selected rationales are both necessary and sufficient rather than merely predictive correlates cannot be assessed.

Authors: We acknowledge the value of these additional analyses for rigorously validating that the selected rationales are necessary and sufficient. Our current experiments focus on end-to-end performance improvements in predictive accuracy and rationale quality when using LONSREX-filtered data compared to naive filtering. To address this, we will add an ablation study that isolates the contribution metric by comparing against a version without it. We will also include a comparison with a small set of human-annotated necessity and sufficiency labels on a subset of the data. Furthermore, we will describe the counterfactual verification process used in developing the metric, including examples of removal and addition of steps and the resulting prediction changes. These additions will be incorporated into the Experiments section of the revised manuscript.
revision: yes

Circularity Check

0 steps flagged

No circularity: pipeline and metric are defined independently of target performance

full rationale

The paper describes an empirical pipeline: collect fact-checked articles, generate predictions/rationales from strong LLMs, apply a new contribution metric to filter for necessary/sufficient rationales, then fine-tune. No derivation chain reduces the final claim (higher-quality data or better explainable MD) to a fitted parameter or self-citation by construction. The metric is introduced as a novel proposal whose correctness is evaluated experimentally rather than assumed via definition or prior self-work. This matches the default expectation of a non-circular empirical ML paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that a quantifiable contribution metric can isolate necessary and sufficient rationales; no free parameters or invented entities are identifiable from the abstract alone.

axioms (1)

domain assumption: A metric can be defined that quantifies the contribution of each verification step to the final veracity prediction and thereby determines necessity and sufficiency.
This assumption is required for LONSREX to improve upon naive label-based filtering.

pith-pipeline@v0.9.0 · 5862 in / 1181 out tokens · 39020 ms · 2026-05-20T06:04:02.433734+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: we propose a metric that quantifies the contribution of each verification step to the final prediction, thereby evaluating its necessity and sufficiency

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages · 4 internal anchors

[1] Tian Bian, Xi Xiao, Tingyang Xu, Peilin Zhao, Wenbing Huang, Yu Rong, and Junzhou Huang. 2020. Rumor Detection on Social Media with Bi-Directional Graph Convolutional Networks. In AAAI. 549–556

[2] Xiaoshu Chen, Sihang Zhou, Ke Liang, Xiaoyu Sun, and Xinwang Liu. 2025. Skip-Thinking: Chunk-wise Chain-of-Thought Distillation Enable Smaller Language Models to Reason Better and Faster. In EMNLP. 12153–12168

[3] Zeming Chen, Qiyue Gao, Antoine Bosselut, Ashish Sabharwal, and Kyle Richardson. 2023. DISCO: Distilling Counterfactuals with Large Language Models. In ACL. 5514–5528

[4] Yuqing Du, Olivia Watkins, Zihan Wang, Cédric Colas, Trevor Darrell, Pieter Abbeel, Abhishek Gupta, and Jacob Andreas. 2023. Guiding Pretraining in Reinforcement Learning with Large Language Models. In ICML, Vol. 202. 8657–8677

[5] Yaqian Dun, Kefei Tu, Chen Chen, Chunyan Hou, and Xiaojie Yuan. 2021. KAN: Knowledge-aware Attention Network for Fake News Detection. In AAAI. 81–89

[6] Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. 2025. DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning. Nature 645, 8081 (2025), 633–638

[7] Namgyu Ho, Laura Schmid, and Se-Young Yun. 2023. Large Language Models Are Reasoning Teachers. In ACL. 14852–14882

[8–9] Rongpei Hong, Jian Lang, Jin Xu, Zhangtao Cheng, Ting Zhong, and Fan Zhou. Following Clues, Approaching the Truth: Explainable Micro-Video Rumor Detection via Chain-of-Thought Reasoning. In WWW. 4684–4698

[10] Beizhe Hu, Qiang Sheng, Juan Cao, Yuhui Shi, Yang Li, Danding Wang, and Peng Qi. 2024. Bad Actor, Good Advisor: Exploring the Role of Large Language Models in Fake News Detection. In AAAI. 22105–22113

[11] Xingyue Huang, Rishabh, Gregor Franke, Ziyi Yang, Jiamu Bai, Weijie Bai, Jinhe Bi, Zifeng Ding, Yiqun Duan, Chengyu Fan, et al. 2025. Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers. CoRR abs/2509.03059 (2025)

[12] Di Jin, Jun Yang, Xiaobao Wang, Junwei Zhang, Shuqi Li, and Dongxiao He. 2025. A Dynamic Knowledge Update-Driven Model with Large Language Models for Fake News Detection. In IJCAI. 3000–3008

[13] Mingyu Jin, Weidi Luo, Sitao Cheng, Xinyi Wang, Wenyue Hua, Ruixiang Tang, William Yang Wang, and Yongfeng Zhang. 2025. Disentangling Memory and Reasoning Ability in Large Language Models. In ACL. 1681–1701

[14] Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. 2022. Large Language Models are Zero-Shot Reasoners. In NeurIPS

[15–16] Zhiqiang Kou, Junyang Chen, Xin-Qiang Cai, Ming-Kun Xie, Biao Liu, Changwei Wang, Lei Feng, Yuheng Jia, Gang Niu, Masashi Sugiyama, and Xin Geng. Rethinking Toxicity Evaluation in Large Language Models: A Multi-Label Perspective. CoRR abs/2510.15007 (2025)

[17] David MJ Lazer, Matthew A Baum, Yochai Benkler, Adam J Berinsky, Kelly M Greenhill, Filippo Menczer, Miriam J Metzger, Brendan Nyhan, Gordon Pennycook, David Rothschild, et al. 2018. The science of fake news. Science 359, 6380 (2018), 1094–1096

[18] Zhenyu Lei, Zhen Tan, Song Wang, Yaochen Zhu, Zihan Chen, Yushun Dong, and Jundong Li. 2025. Learning from Diverse Reasoning Paths with Routing and Collaboration. In EMNLP. 2832–2845

[19] Kaiyuan Liu, Shaotian Yan, Rui Miao, Bing Wang, Chen Shen, Jun Zhang, and Jieping Ye. 2026. Where Did This Sentence Come From? Tracing Provenance in LLM Reasoning Distillation. In ICLR

[20] Yuhan Liu, Yuxuan Liu, Xiaoqing Zhang, Xiuying Chen, and Rui Yan. 2025. The Truth Becomes Clearer Through Debate! Multi-Agent Systems with Large Language Models Unmask Fake News. In SIGIR. 504–514

[21] Zhiwei Liu, Kailai Yang, Qianqian Xie, Christine De Kock, Sophia Ananiadou, and Eduard Hovy. 2025. Raemollm: Retrieval augmented llms for cross-domain misinformation detection using in-context learning based on emotional information. In ACL. 16508–16523

[22] Yijia Luo, Yulin Song, Xingyao Zhang, Jiaheng Liu, Weixun Wang, Gengru Chen, Wenbo Su, and Bo Zheng. 2025. Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation. CoRR abs/2503.16385 (2025)

[23] Arkadiusz Modzelewski, Witold Sosnowski, Tiziano Labruna, Adam Wierzbicki, and Giovanni Da San Martino. 2025. PCoT: Persuasion-Augmented Chain of Thought for Detecting Fake News and Social Media Disinformation. In ACL. 24959–24983

[24] Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel J. Candès, and Tatsunori Hashimoto. 2025. s1: Simple test-time scaling. CoRR abs/2501.19393 (2025) [internal anchor]

[25] Qiong Nan, Qiang Sheng, Juan Cao, Beizhe Hu, Danding Wang, and Jintao Li. 2024. Let Silence Speak: Enhancing Fake News Detection with Generated Comments from Large Language Models. In CIKM. 1732–1742

[26] OpenAI. 2025. gpt-oss-120b & gpt-oss-20b Model Card. CoRR abs/2508.10925 (2025) [internal anchor]

[27] Piotr Przybyla. 2020. Capturing the Style of Fake News. In AAAI. 490–497

[28] Peng Qi, Yuyan Bu, Juan Cao, Wei Ji, Ruihao Shui, Junbin Xiao, Danding Wang, and Tat-Seng Chua. 2023. FakeSV: A Multimodal Benchmark with Rich Social Context for Fake News Detection on Short Video Platforms. In AAAI. 14444–14452

[29] Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. 2024. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. CoRR abs/2402.03300 (2024) [internal anchor]

[30] Jinyan Su, Terry Yue Zhuo, Jonibek Mansurov, Di Wang, and Preslav Nakov. 2023. Fake News Detectors are Biased against Texts Generated by Large Language Models. CoRR abs/2309.08674 (2023)

[31] Zhao Tong, Yimeng Gu, Huidong Liu, Qiang Liu, Shu Wu, Haichao Shi, and Xiao-Yu Zhang. 2025. Generate First, Then Sample: Enhancing Fake News Detection with LLM-Augmented Reinforced Sampling. In ACL. 24276–24290

[32] Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of true and false news online. Science 359, 6380 (2018), 1146–1151

[33] Herun Wan, Shangbin Feng, Zhaoxuan Tan, Heng Wang, Yulia Tsvetkov, and Minnan Luo. 2024. DELL: Generating Reactions and Explanations for LLM-Based Misinformation Detection. In Findings of ACL. 2637–2667

[34] Bing Wang, Ximing Li, Changchun Li, Bo Fu, Songwen Pei, and Shengsheng Wang. 2024. Why Misinformation is Created? Detecting them by Integrating Intent Features. In CIKM. 2304–2314

[35] Bing Wang, Ximing Li, Changchun Li, Bingrui Zhao, Bo Fu, Renchu Guan, and Shengsheng Wang. 2025. Robust Misinformation Detection by Visiting Potential Commonsense Conflict. In IJCAI. 7760–7768

[36] Bo Wang, Jing Ma, Hongzhan Lin, Zhiwei Yang, Ruichao Yang, Yuan Tian, and Yi Chang. 2024. Explainable Fake News Detection with Large Language Model via Defense Among Competing Wisdom. In WWW. 2452–2463

[37] Bing Wang, Bingrui Zhao, Ximing Li, Changchun Li, Wanfu Gao, and Shengsheng Wang. 2025. Collaboration and Controversy Among Experts: Rumor Early Detection by Tuning a Comment Generator. In SIGIR. 468–478

[38–39] Peifeng Wang, Zhengyang Wang, Zheng Li, Yifan Gao, Bing Yin, and Xiang Ren. SCOTT: Self-Consistent Chain-of-Thought Distillation. In ACL. 5546–5558

[40] Yifeng Wang, Zhouhong Gu, Siwei Zhang, Suhang Zheng, Tao Wang, Tianyu Li, Hongwei Feng, and Yanghua Xiao. 2025. LLM-GAN: Constructing Generative Adversarial Network Through Large Language Models for Explainable Fake News Detection. In IEEE International Conference on Acoustics, Speech and Signal Processing. 1–5

[41] Zhengjia Wang, Danding Wang, Qiang Sheng, Juan Cao, Siyuan Ma, and Haonan Cheng. 2025. Exploring news intent and its application: A theory-driven approach. Information Processing & Management 62, 6 (2025), 104229

[42] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. 2022. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In NeurIPS

[43] Jiaying Wu and Bryan Hooi. 2023. DECOR: Degree-Corrected Social Graph Refinement for Fake News Detection. In KDD. 2582–2593

[44] Shaotian Yan, Kaiyuan Liu, Chen Shen, Bing Wang, Sinan Fan, Jun Zhang, Yue Wu, Zheng Wang, and Jieping Ye. 2026. Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning. CoRR abs/2601.09088 (2026)

[45] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. 2025. Qwen3 Technical Report. CoRR abs/2505.09388 (2025) [internal anchor]

[46] Xiaofang Yang, Lijun Li, Heng Zhou, Tong Zhu, Xiaoye Qu, Yuchen Fan, Qianshan Wei, Rui Ye, Li Kang, Yiran Qin, Zhiqiang Kou, Daizong Liu, Qi Li, Ning Ding, Siheng Chen, and Jing Shao. 2026. Toward Efficient Agents: Memory, Tool learning, and Planning. CoRR abs/2601.14192 (2026)

[47–48] Zhiwei Yang, Jing Ma, Hechang Chen, Hongzhan Lin, Ziyang Luo, and Yi Chang. A Coarse-to-fine Cascaded Evidence-Distillation Neural Network for Explainable Fake News Detection. In COLING. 2608–2621

[49] Yixin Ye, Zhen Huang, Yang Xiao, Ethan Chern, Shijie Xia, and Pengfei Liu. 2025. LIMO: Less is More for Reasoning. In COLM

[50] Xiaosong Yuan, Chen Shen, Shaotian Yan, Kaiyuan Liu, Xiaofeng Zhang, Sinan Fan, Liang Xie, Wenxiao Wang, Renchu Guan, Ying Wang, and Jieping Ye. 2026. Differential Fine-Tuning Large Language Models Towards Better Diverse Reasoning Abilities. In ICLR

[51] Xiaosong Yuan, Chen Shen, Shaotian Yan, Xiaofeng Zhang, Liang Xie, Wenxiao Wang, Renchu Guan, Ying Wang, and Jieping Ye. 2024. Instance-adaptive Zero-shot Chain-of-Thought Prompting. In NeurIPS. 125469–125486

[52] Zhenrui Yue, Huimin Zeng, Yimeng Lu, Lanyu Shang, Yang Zhang, and Dong Wang. 2024. Evidence-Driven Retrieval Augmented Response Generation for Online Misinformation. In NAACL. 5628–5643

[53] Dylan Zhang, Qirun Dai, and Hao Peng. 2025. The Best Instruction-Tuning Data are Those That Fit. In NeurIPS

[54] Xueyao Zhang, Juan Cao, Xirong Li, Qiang Sheng, Lei Zhong, and Kai Shu. 2021. Mining Dual Emotion for Fake News Detection. In WWW. 3465–3476

Are Rationales Necessary and Sufficient? Tuning LLMs for Explainable Misinformation Detection. KDD ’26, 9–13 August, 2026, Jeju, Korea

Table 8: Experimental results on more advanced LLMs across four MD benchmarks. LLM...

[1] [1]

Tian Bian, Xi Xiao, Tingyang Xu, Peilin Zhao, Wenbing Huang, Yu Rong, and Junzhou Huang. 2020. Rumor Detection on Social Media with Bi-Directional Graph Convolutional Networks. InAAAI. 549–556

work page 2020

[2] [2]

Xiaoshu Chen, Sihang Zhou, Ke Liang, Xiaoyu Sun, and Xinwang Liu. 2025. Skip- Thinking: Chunk-wise Chain-of-Thought Distillation Enable Smaller Language Models to Reason Better and Faster. InEMNLP. 12153–12168

work page 2025

[3] [3]

Zeming Chen, Qiyue Gao, Antoine Bosselut, Ashish Sabharwal, and Kyle Richard- son. 2023. DISCO: Distilling Counterfactuals with Large Language Models. In ACL. 5514–5528

work page 2023

[4] [4]

Yuqing Du, Olivia Watkins, Zihan Wang, Cédric Colas, Trevor Darrell, Pieter Abbeel, Abhishek Gupta, and Jacob Andreas. 2023. Guiding Pretraining in Rein- forcement Learning with Large Language Models. InICML, Vol. 202. 8657–8677

work page 2023

[5] [5]

Yaqian Dun, Kefei Tu, Chen Chen, Chunyan Hou, and Xiaojie Yuan. 2021. KAN: Knowledge-aware Attention Network for Fake News Detection. InAAAI. 81–89

work page 2021

[6] [6]

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al . 2025. DeepSeek-R1 in- centivizes reasoning in LLMs through reinforcement learning.Nature645, 8081 (2025), 633–638

work page 2025

[7] [7]

Namgyu Ho, Laura Schmid, and Se-Young Yun. 2023. Large Language Models Are Reasoning Teachers. InACL. 14852–14882

work page 2023

[8] [8]

Rongpei Hong, Jian Lang, Jin Xu, Zhangtao Cheng, Ting Zhong, and Fan Zhou

work page

[9] [9]

Following Clues, Approaching the Truth: Explainable Micro-Video Rumor Detection via Chain-of-Thought Reasoning. InWWW. 4684–4698

work page

[10] [10]

Beizhe Hu, Qiang Sheng, Juan Cao, Yuhui Shi, Yang Li, Danding Wang, and Peng Qi. 2024. Bad Actor, Good Advisor: Exploring the Role of Large Language Models in Fake News Detection. InAAAI. 22105–22113

work page 2024

[11] [11]

Xingyue Huang, Rishabh, Gregor Franke, Ziyi Yang, Jiamu Bai, Weijie Bai, Jinhe Bi, Zifeng Ding, Yiqun Duan, Chengyu Fan, et al. 2025. Loong: Synthesize Long Chain-of-Thoughts at Scale through Verifiers.CoRRabs/2509.03059 (2025)

work page arXiv 2025

[12] [12]

Di Jin, Jun Yang, Xiaobao Wang, Junwei Zhang, Shuqi Li, and Dongxiao He. 2025. A Dynamic Knowledge Update-Driven Model with Large Language Models for Fake News Detection. InIJCAI. 3000–3008

work page 2025

[13] [13]

Mingyu Jin, Weidi Luo, Sitao Cheng, Xinyi Wang, Wenyue Hua, Ruixiang Tang, William Yang Wang, and Yongfeng Zhang. 2025. Disentangling Memory and Reasoning Ability in Large Language Models. InACL. 1681–1701

work page 2025

[14] [14]

Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. 2022. Large Language Models are Zero-Shot Reasoners. InNeurIPS

work page 2022

[15] [15]

Zhiqiang Kou, Junyang Chen, Xin-Qiang Cai, Ming-Kun Xie, Biao Liu, Chang- wei Wang, Lei Feng, Yuheng Jia, Gang Niu, Masashi Sugiyama, and Xin Geng

work page

[16] [16]

Rethinking Toxicity Evaluation in Large Language Models: A Multi-Label Perspective.CoRRabs/2510.15007 (2025)

work page arXiv 2025

[17] [17]

David MJ Lazer, Matthew A Baum, Yochai Benkler, Adam J Berinsky, Kelly M Greenhill, Filippo Menczer, Miriam J Metzger, Brendan Nyhan, Gordon Penny- cook, David Rothschild, et al. 2018. The science of fake news.Science359, 6380 (2018), 1094–1096

work page 2018

[18] [18]

Zhenyu Lei, Zhen Tan, Song Wang, Yaochen Zhu, Zihan Chen, Yushun Dong, and Jundong Li. 2025. Learning from Diverse Reasoning Paths with Routing and Collaboration. InEMNLP. 2832–2845

work page 2025

[19] [19]

Kaiyuan Liu, Shaotian Yan, Rui Miao, Bing Wang, Chen Shen, Jun Zhang, and Jieping Ye. 2026. Where Did This Sentence Come From? Tracing Provenance in LLM Reasoning Distillation. InICLR

work page 2026

[20] [20]

Yuhan Liu, Yuxuan Liu, Xiaoqing Zhang, Xiuying Chen, and Rui Yan. 2025. The Truth Becomes Clearer Through Debate! Multi-Agent Systems with Large Language Models Unmask Fake News. InSIGIR. 504–514

work page 2025

[21] [21]

Zhiwei Liu, Kailai Yang, Qianqian Xie, Christine De Kock, Sophia Ananiadou, and Eduard Hovy. 2025. Raemollm: Retrieval augmented llms for cross-domain mis- information detection using in-context learning based on emotional information. InACL. 16508–16523

work page 2025

[22]

Yijia Luo, Yulin Song, Xingyao Zhang, Jiaheng Liu, Weixun Wang, Gengru Chen, Wenbo Su, and Bo Zheng. 2025. Deconstructing Long Chain-of-Thought: A Structured Reasoning Optimization Framework for Long CoT Distillation. CoRR abs/2503.16385 (2025)

[23]

Arkadiusz Modzelewski, Witold Sosnowski, Tiziano Labruna, Adam Wierzbicki, and Giovanni Da San Martino. 2025. PCoT: Persuasion-Augmented Chain of Thought for Detecting Fake News and Social Media Disinformation. In ACL. 24959–24983

[24]

Niklas Muennighoff, Zitong Yang, Weijia Shi, Xiang Lisa Li, Li Fei-Fei, Hannaneh Hajishirzi, Luke Zettlemoyer, Percy Liang, Emmanuel J. Candès, and Tatsunori Hashimoto. 2025. s1: Simple test-time scaling. CoRR abs/2501.19393 (2025)

[25]

Qiong Nan, Qiang Sheng, Juan Cao, Beizhe Hu, Danding Wang, and Jintao Li. 2024. Let Silence Speak: Enhancing Fake News Detection with Generated Comments from Large Language Models. In CIKM. 1732–1742

[26]

OpenAI. 2025. gpt-oss-120b & gpt-oss-20b Model Card. CoRR abs/2508.10925 (2025)

[27]

Piotr Przybyla. 2020. Capturing the Style of Fake News. In AAAI. 490–497

[28]

Peng Qi, Yuyan Bu, Juan Cao, Wei Ji, Ruihao Shui, Junbin Xiao, Danding Wang, and Tat-Seng Chua. 2023. FakeSV: A Multimodal Benchmark with Rich Social Context for Fake News Detection on Short Video Platforms. In AAAI. 14444–14452

[29]

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. 2024. DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models. CoRR abs/2402.03300 (2024)

[30]

Jinyan Su, Terry Yue Zhuo, Jonibek Mansurov, Di Wang, and Preslav Nakov. 2023. Fake News Detectors are Biased against Texts Generated by Large Language Models. CoRR abs/2309.08674 (2023)

[31]

Zhao Tong, Yimeng Gu, Huidong Liu, Qiang Liu, Shu Wu, Haichao Shi, and Xiao-Yu Zhang. 2025. Generate First, Then Sample: Enhancing Fake News Detection with LLM-Augmented Reinforced Sampling. In ACL. 24276–24290

[32]

Soroush Vosoughi, Deb Roy, and Sinan Aral. 2018. The spread of true and false news online. Science 359, 6380 (2018), 1146–1151

[33]

Herun Wan, Shangbin Feng, Zhaoxuan Tan, Heng Wang, Yulia Tsvetkov, and Minnan Luo. 2024. DELL: Generating Reactions and Explanations for LLM-Based Misinformation Detection. In Findings of ACL. 2637–2667

[34]

Bing Wang, Ximing Li, Changchun Li, Bo Fu, Songwen Pei, and Shengsheng Wang. 2024. Why Misinformation is Created? Detecting them by Integrating Intent Features. In CIKM. 2304–2314

[35]

Bing Wang, Ximing Li, Changchun Li, Bingrui Zhao, Bo Fu, Renchu Guan, and Shengsheng Wang. 2025. Robust Misinformation Detection by Visiting Potential Commonsense Conflict. In IJCAI. 7760–7768

[36]

Bo Wang, Jing Ma, Hongzhan Lin, Zhiwei Yang, Ruichao Yang, Yuan Tian, and Yi Chang. 2024. Explainable Fake News Detection with Large Language Model via Defense Among Competing Wisdom. In WWW. 2452–2463

[37]

Bing Wang, Bingrui Zhao, Ximing Li, Changchun Li, Wanfu Gao, and Shengsheng Wang. 2025. Collaboration and Controversy Among Experts: Rumor Early Detection by Tuning a Comment Generator. In SIGIR. 468–478

[38]

Peifeng Wang, Zhengyang Wang, Zheng Li, Yifan Gao, Bing Yin, and Xiang Ren. 2023. SCOTT: Self-Consistent Chain-of-Thought Distillation. In ACL. 5546–5558

[40]

Yifeng Wang, Zhouhong Gu, Siwei Zhang, Suhang Zheng, Tao Wang, Tianyu Li, Hongwei Feng, and Yanghua Xiao. 2025. LLM-GAN: Constructing Generative Adversarial Network Through Large Language Models for Explainable Fake News Detection. In IEEE International Conference on Acoustics, Speech and Signal Processing. 1–5

[41]

Zhengjia Wang, Danding Wang, Qiang Sheng, Juan Cao, Siyuan Ma, and Haonan Cheng. 2025. Exploring news intent and its application: A theory-driven approach. Information Processing & Management 62, 6 (2025), 104229

[42]

Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed H. Chi, Quoc V. Le, and Denny Zhou. 2022. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. In NeurIPS

[43]

Jiaying Wu and Bryan Hooi. 2023. DECOR: Degree-Corrected Social Graph Refinement for Fake News Detection. In KDD. 2582–2593

[44]

Shaotian Yan, Kaiyuan Liu, Chen Shen, Bing Wang, Sinan Fan, Jun Zhang, Yue Wu, Zheng Wang, and Jieping Ye. 2026. Distribution-Aligned Sequence Distillation for Superior Long-CoT Reasoning. CoRR abs/2601.09088 (2026)

[45]

An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. 2025. Qwen3 Technical Report. CoRR abs/2505.09388 (2025)

[46]

Xiaofang Yang, Lijun Li, Heng Zhou, Tong Zhu, Xiaoye Qu, Yuchen Fan, Qianshan Wei, Rui Ye, Li Kang, Yiran Qin, Zhiqiang Kou, Daizong Liu, Qi Li, Ning Ding, Siheng Chen, and Jing Shao. 2026. Toward Efficient Agents: Memory, Tool learning, and Planning. CoRR abs/2601.14192 (2026)

[47]

Zhiwei Yang, Jing Ma, Hechang Chen, Hongzhan Lin, Ziyang Luo, and Yi Chang. 2022. A Coarse-to-fine Cascaded Evidence-Distillation Neural Network for Explainable Fake News Detection. In COLING. 2608–2621

[49]

Yixin Ye, Zhen Huang, Yang Xiao, Ethan Chern, Shijie Xia, and Pengfei Liu. 2025. LIMO: Less is More for Reasoning. In COLM

[50]

Xiaosong Yuan, Chen Shen, Shaotian Yan, Kaiyuan Liu, Xiaofeng Zhang, Sinan Fan, Liang Xie, Wenxiao Wang, Renchu Guan, Ying Wang, and Jieping Ye. 2026. Differential Fine-Tuning Large Language Models Towards Better Diverse Reasoning Abilities. In ICLR

[51]

Xiaosong Yuan, Chen Shen, Shaotian Yan, Xiaofeng Zhang, Liang Xie, Wenxiao Wang, Renchu Guan, Ying Wang, and Jieping Ye. 2024. Instance-adaptive Zero-shot Chain-of-Thought Prompting. In NeurIPS. 125469–125486

[52]

Zhenrui Yue, Huimin Zeng, Yimeng Lu, Lanyu Shang, Yang Zhang, and Dong Wang. 2024. Evidence-Driven Retrieval Augmented Response Generation for Online Misinformation. In NAACL. 5628–5643

[53]

Dylan Zhang, Qirun Dai, and Hao Peng. 2025. The Best Instruction-Tuning Data are Those That Fit. In NeurIPS

[54]

Xueyao Zhang, Juan Cao, Xirong Li, Qiang Sheng, Lei Zhong, and Kai Shu. 2021. Mining Dual Emotion for Fake News Detection. In WWW. 3465–3476

Are Rationales Necessary and Sufficient? Tuning LLMs for Explainable Misinformation Detection. KDD ’26, 9–13 August, 2026, Jeju, Korea

Table 8: Experimental results on more advanced LLMs across four MD benchmarks. LLM...