
arxiv: 2605.19285 · v1 · pith:SLI446FVnew · submitted 2026-05-19 · 💻 cs.CL · cs.AI · cs.CY

Are Rationales Necessary and Sufficient? Tuning LLMs for Explainable Misinformation Detection

Pith reviewed 2026-05-20 06:04 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI · cs.CY
keywords misinformation detection · large language models · explainable AI · rationales · fine-tuning · necessity and sufficiency · data synthesis · verification steps

The pith

A metric that scores each verification step's contribution lets LLMs be fine-tuned on only the necessary and sufficient rationales for misinformation detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that keeping only the training examples where an LLM gets the final true-or-false label right is not enough when the goal is to produce clear explanations. Such filtering often keeps rationales that omit needed facts or include extra steps that do not change the outcome. The authors therefore build LONSREX to measure how much each part of an explanation actually moves the model toward its final verdict, and to retain only the parts that are both necessary and sufficient. A reader would care because social-media platforms need systems that not only flag false claims but also give focused, non-redundant reasons for doing so. If the approach works, fine-tuned models could deliver explanations that are shorter, more accurate, and easier for people to check.

Core claim

The paper claims that naive filtering of LLM-generated rationales by label correctness alone produces either insufficient rationales that fail to support the decision or unnecessary rationales caused by over-verification. To fix this, LONSREX introduces a metric that quantifies the contribution of each verification step to the final prediction and uses the metric to select only those rationales that are necessary and sufficient. The resulting data is then used to fine-tune a dedicated LLM for explainable misinformation detection.
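
To make the criticized baseline concrete, here is a minimal sketch of label-only filtering. The `Candidate` type, its field names, and the example data are hypothetical and not taken from the paper; the point is only that the filter never looks at the individual verification steps.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One LLM-generated rationale for a claim (hypothetical structure)."""
    claim: str
    steps: list[str]   # verification steps that make up the rationale
    predicted: str     # label the generating LLM arrived at
    gold: str          # label from the fact-checked article


def filter_by_label(candidates: list[Candidate]) -> list[Candidate]:
    """Naive filter: keep every candidate whose predicted label matches gold."""
    return [c for c in candidates if c.predicted == c.gold]


candidates = [
    # Correct label, a single step.
    Candidate("claim A", ["checks source"], "false", "false"),
    # Correct label, several steps, one of them repeated.
    Candidate(
        "claim A",
        ["checks source", "checks date", "re-checks source", "checks author"],
        "false",
        "false",
    ),
    # Wrong label: the only case the naive filter rejects.
    Candidate("claim A", ["checks source"], "true", "false"),
]

kept = filter_by_label(candidates)
step_counts = [len(c.steps) for c in kept]
print(step_counts)
```

Both correct-label candidates survive regardless of how many steps they contain, so the filter by itself cannot tell a sparse rationale from an over-verified one.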

What carries the argument

LONSREX, a data-synthesis pipeline that applies a contribution metric to each verification step so that only necessary and sufficient rationales are kept for fine-tuning.

If this is right

  • Rationales filtered only by correct final labels tend to be either too sparse to support the verdict or too verbose because of over-verification.
  • The contribution metric isolates verification steps whose removal would alter the prediction, thereby marking them as necessary.
  • Fine-tuned models trained on the filtered rationales generate explanations that better match the requirements of necessity and sufficiency.
  • The pipeline reduces the over-verification behavior that appears when stronger off-the-shelf LLMs are used without filtering.
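
The removal-based reading of necessity in the list above can be sketched as a leave-one-out check. This is an illustration under stated assumptions, not the paper's metric: `toy_predict` is a hypothetical keyword rule standing in for an LLM call, and the function names are invented for the example.

```python
from typing import Callable, Sequence

# A predictor maps a sequence of verification steps to a veracity label.
Predictor = Callable[[Sequence[str]], str]


def necessary_step_indices(steps: Sequence[str], predict: Predictor) -> list[int]:
    """Return indices of steps whose removal changes the prediction."""
    full_label = predict(steps)
    necessary = []
    for i in range(len(steps)):
        ablated = [s for j, s in enumerate(steps) if j != i]
        if predict(ablated) != full_label:
            necessary.append(i)
    return necessary


def unnecessary_ratio(steps: Sequence[str], predict: Predictor) -> float:
    """Fraction of steps that can be dropped one at a time without a flip."""
    if not steps:
        return 0.0
    return 1 - len(necessary_step_indices(steps, predict)) / len(steps)


def toy_predict(steps: Sequence[str]) -> str:
    """Hypothetical stand-in for a model: 'false' if any step reports a contradiction."""
    return "false" if any("contradicts" in s for s in steps) else "true"


steps = [
    "The photo is from 2015.",
    "The official record contradicts the claim.",
    "The author has many followers.",
]
necessary = necessary_step_indices(steps, toy_predict)
ratio = unnecessary_ratio(steps, toy_predict)
print(necessary, round(ratio, 3))
```

One limitation of this simple check: if two steps carry the same decisive evidence, removing either one leaves the verdict unchanged, so neither is flagged as necessary. A graded score or a subset-based test would be needed to handle that case.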

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same contribution-based filtering could be tested on explanation tasks outside misinformation, such as legal or medical reasoning.
  • One could measure whether models trained this way produce fewer contradictory statements inside a single rationale.
  • Real-world deployment would require checking whether users find the resulting explanations more helpful for deciding what to believe online.

Load-bearing premise

The metric that scores how much each verification step changes the final prediction correctly identifies which steps are necessary and which are sufficient.
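
One way to state this premise formally is sketched below. The notation is hypothetical and is not taken from the paper's equations: $x$ is the claim, $R = (r_1, \dots, r_n)$ the verification steps, $f$ a veracity predictor, and $p$ its label probability where available.

```latex
\begin{align*}
\hat{y} &= f(x, R)
  && \text{verdict from the full rationale} \\
\mathrm{nec}(r_i) &\iff f\bigl(x, R \setminus \{r_i\}\bigr) \neq \hat{y}
  && \text{removing } r_i \text{ flips the verdict} \\
\mathrm{suf}(S) &\iff f(x, S) = \hat{y}, \quad S \subseteq R
  && \text{the subset } S \text{ alone preserves the verdict} \\
c_i &= p\bigl(\hat{y} \mid x, R\bigr) - p\bigl(\hat{y} \mid x, R \setminus \{r_i\}\bigr)
  && \text{a graded contribution of step } r_i
\end{align*}
```

Under this reading, the premise holds only if the score the paper actually uses tracks the first two conditions; a score that merely correlates with the verdict would not be enough.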

What would settle it

If LLMs fine-tuned on the LONSREX-filtered data produce explanations that are no more concise, accurate, or faithful than those from simple label-correctness filtering, the value of the contribution metric would be called into question.

Figures

Figures reproduced from arXiv: 2605.19285 by Bing Wang, Changchun Li, Chen Shen, Jieping Ye, Kaiyuan Liu, Rui Miao, Shaotian Yan, Xiaosong Yuan, Ximing Li.

Figure 1: A representative case demonstrate one rationale
Figure 3: Distribution of the number of minimal sufficient
Figure 4: Precision and recall scores of instruct-tuned and
Figure 6: Distribution of the ratio of unnecessary rationales.
Figure 7: Overview of LonsRex. Given a claim, we generate 𝐾 rationales using advanced LLMs, which are then filtered by basic heuristics and self-attribution and mutual-attribution scores. Finally, we use the filtered rationales to tune a lightweight LLM. … in Eq. (2) across all its verification steps, and we incorporate the ratio of unnecessary steps in Eq. (4) as a penalty term. The sufficiency score 𝑠suf for each r…
Figure 8: Distribution of Δ for correct / incorrect rationales
Original abstract

The rapid spread of misinformation on social media platforms has become a formidable challenge. To mitigate its proliferation, Misinformation Detection (MD) has emerged as a critical research topic. Traditional MD approaches based on small models typically perform binary classification through a black-box process. Recently, the rise of Large Language Models (LLMs) has enabled explainable MD, where models generate rationales that explain their decisions, thereby enhancing transparency. Existing explainable MD methods primarily focus on crafting sophisticated prompts to elicit rationales from off-the-shelf LLMs. In this work, we propose a pipeline to fine-tune a dedicated LLM specifically for explainable MD. Our pipeline begins by collecting large-scale fact-checked articles, and then uses multiple strong LLMs to produce veracity predictions and rationales. To ensure high-quality training data, we leverage a filtering strategy that selects only the correct instances for fine-tuning. While this pipeline is intuitive and prevalent, our experiments reveal that naive filtering based solely on label correctness is insufficient in practice and suffers from two critical limitations: (1) Coarse-grained labels cause insufficient rationales: Rationales filtered solely based on binary labels are insufficient to adequately support their decisions; (2) Over-verification behavior causes unnecessary rationales: Stronger LLMs tend to exhibit over-verification behavior, producing excessively verbose and unnecessary rationales. To address these issues, we introduce LONSREX, a novel data synthesis pipeline to Locate Necessary and Sufficient Rationales for Explainable MD. Specifically, we propose a metric that quantifies the contribution of each verification step to the final prediction, thereby evaluating its necessity and sufficiency. Experimental results demonstrate the effectiveness of LONSREX.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes LONSREX, a data synthesis pipeline for fine-tuning LLMs on explainable misinformation detection. It collects fact-checked articles, generates veracity predictions and rationales via strong LLMs, applies naive filtering on label correctness, and identifies two limitations of that approach: insufficient rationales (due to coarse binary labels) and unnecessary over-verification rationales. LONSREX introduces a metric quantifying the contribution of each verification step to the final prediction in order to locate necessary and sufficient rationales, with experiments claimed to show improved training data quality.

Significance. If the contribution metric correctly operationalizes necessity (counterfactual removal flips the veracity prediction) and sufficiency (the step alone supports the prediction), the pipeline could yield higher-quality fine-tuning data than label-only filtering, improving both predictive performance and rationale faithfulness in LLM-based misinformation detection. The multi-LLM generation step and explicit focus on rationale quality are practical strengths that could generalize beyond the reported setting.
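
The sufficiency side of this condition can be illustrated with a brute-force search for the smallest subsets of steps that preserve the full-rationale verdict. This is a sketch under assumptions: `toy_predict` is a hypothetical rule standing in for a model, and the exhaustive search is exponential in the number of steps, so it only suits short rationales.

```python
from itertools import combinations
from typing import Callable, Sequence

Predictor = Callable[[Sequence[str]], str]


def minimal_sufficient_subsets(
    steps: Sequence[str], predict: Predictor
) -> list[tuple[int, ...]]:
    """Return all smallest index subsets that reproduce the full-rationale label."""
    target = predict(steps)
    for size in range(1, len(steps) + 1):
        hits = [
            idx
            for idx in combinations(range(len(steps)), size)
            if predict([steps[i] for i in idx]) == target
        ]
        if hits:
            return hits  # stop at the first (smallest) size that works
    return []


def toy_predict(steps: Sequence[str]) -> str:
    """Hypothetical stand-in: 'false' once two independent pieces of evidence agree."""
    evidence = sum(1 for s in steps if s.startswith("evidence:"))
    return "false" if evidence >= 2 else "true"


steps = [
    "evidence: record A disagrees with the claim",
    "background on the author",
    "evidence: record B disagrees with the claim",
    "evidence: record C disagrees with the claim",
]
subsets = minimal_sufficient_subsets(steps, toy_predict)
print(subsets)
```

In this toy case any two of the three evidence steps suffice, while no single step is indispensable, which shows why necessity and sufficiency have to be assessed separately.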

major comments (2)
  1. [Abstract] Abstract (paragraph describing LONSREX): the metric is described only as one that 'quantifies the contribution of each verification step to the final prediction' without an explicit formula, algorithm, or proof that it implements necessity via removal (prediction flip) and sufficiency via isolation (prediction preserved). This definition is load-bearing for the central claim that LONSREX fixes the two stated limitations of naive filtering.
  2. [Experiments] Experiments section: results are reported to demonstrate effectiveness of LONSREX, yet no ablation isolates the metric's contribution, no comparison is made to human-annotated necessity/sufficiency labels, and no counterfactual verification (remove/add step and re-predict) is described. Without these, the claim that selected rationales are both necessary and sufficient rather than merely predictive correlates cannot be assessed.
minor comments (2)
  1. [Abstract] The abstract states that 'naive filtering based solely on label correctness is insufficient' but does not quantify how often the two failure modes occur in the collected data (e.g., percentage of instances with insufficient or over-verbose rationales).
  2. [Abstract] Dataset details (source of fact-checked articles, number of instances, veracity label distribution) are referenced but not reported with concrete numbers or splits in the provided abstract.
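
The comparison with human-annotated labels requested in major comment 2 could be scored as step-level precision and recall. The sketch below assumes each rationale's steps are indexed and that annotators mark the necessary ones; the index sets are invented for illustration and are not results from the paper.

```python
def precision_recall(selected: set[int], human: set[int]) -> tuple[float, float]:
    """Score metric-selected step indices against human-marked necessary steps."""
    true_positives = len(selected & human)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(human) if human else 0.0
    return precision, recall


# Hypothetical example: the metric keeps steps 0, 2, 3;
# annotators consider steps 0, 3, 4, 5 necessary.
selected = {0, 2, 3}
human = {0, 3, 4, 5}
precision, recall = precision_recall(selected, human)
print(round(precision, 3), round(recall, 3))
```

Low precision would indicate that the metric retains steps people consider unnecessary; low recall would indicate that it discards steps people consider needed.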

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the presentation of our LONSREX pipeline. We address each major comment below and indicate the revisions we will make to the manuscript.

Point-by-point responses
  1. Referee: [Abstract] Abstract (paragraph describing LONSREX): the metric is described only as one that 'quantifies the contribution of each verification step to the final prediction' without an explicit formula, algorithm, or proof that it implements necessity via removal (prediction flip) and sufficiency via isolation (prediction preserved). This definition is load-bearing for the central claim that LONSREX fixes the two stated limitations of naive filtering.

    Authors: We agree that the abstract should provide a more precise description of the metric to support the central claims. In the revised manuscript, we will expand the abstract to include the explicit formula for the contribution metric, a brief outline of the algorithm used to compute necessity and sufficiency, and an explanation of how it operationalizes these concepts through counterfactual removal (where removing the step flips the prediction) and isolation (where the step alone preserves the prediction). This will strengthen the presentation without altering the core methodology. revision: yes

  2. Referee: [Experiments] Experiments section: results are reported to demonstrate effectiveness of LONSREX, yet no ablation isolates the metric's contribution, no comparison is made to human-annotated necessity/sufficiency labels, and no counterfactual verification (remove/add step and re-predict) is described. Without these, the claim that selected rationales are both necessary and sufficient rather than merely predictive correlates cannot be assessed.

    Authors: We acknowledge the value of these additional analyses for rigorously validating that the selected rationales are necessary and sufficient. Our current experiments focus on end-to-end performance improvements in predictive accuracy and rationale quality when using LONSREX-filtered data compared to naive filtering. To address this, we will add an ablation study that isolates the contribution metric by comparing against a version without it. We will also include a comparison with a small set of human-annotated necessity and sufficiency labels on a subset of the data. Furthermore, we will describe the counterfactual verification process used in developing the metric, including examples of removal and addition of steps and the resulting prediction changes. These additions will be incorporated into the Experiments section of the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity: pipeline and metric are defined independently of target performance

full rationale

The paper describes an empirical pipeline: collect fact-checked articles, generate predictions/rationales from strong LLMs, apply a new contribution metric to filter for necessary/sufficient rationales, then fine-tune. No derivation chain reduces the final claim (higher-quality data or better explainable MD) to a fitted parameter or self-citation by construction. The metric is introduced as a novel proposal whose correctness is evaluated experimentally rather than assumed via definition or prior self-work. This matches the default expectation of a non-circular empirical ML paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that a quantifiable contribution metric can isolate necessary and sufficient rationales; no free parameters or invented entities are identifiable from the abstract alone.

axioms (1)
  • Domain assumption: a metric can be defined that quantifies the contribution of each verification step to the final veracity prediction and thereby determines necessity and sufficiency.
    This assumption is required for LONSREX to improve upon naive label-based filtering.




Are Rationales Necessary and Sufficient? Tuning LLMs for Explainable Misinformation Detection. KDD ’26, 9–13 August, 2026, Jeju, Korea.

Table 8: Experimental results on more advanced LLMs across four MD benchmarks. LLM...