pith.

arxiv: 2605.17967 · v1 · pith:KXREPCC3 · submitted 2026-05-18 · 💻 cs.AI

Reconciling Contradictory Views on the Effectiveness of SFT in LLMs: An Interaction Perspective

Pith reviewed 2026-05-20 10:00 UTC · model grok-4.3

classification 💻 cs.AI
keywords: supervised fine-tuning · large language models · token interactions · noise removal · overfitting · early stopping · inference patterns

The pith

Supervised fine-tuning of large language models mainly removes noise-like interactions rather than acquiring reliable new ones, and this beneficial phase is very short.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper aims to explain contradictory observations about supervised fine-tuning (SFT) of large language models (LLMs) by examining how interactions between tokens evolve during fine-tuning. It reports that SFT quickly eliminates noise-like interactions but rarely learns reliable new ones, after which further training leads to overfitting. Readers would care because this accounts for why SFT sometimes harms performance and suggests better ways to apply it in practice. The findings are validated on multiple models and datasets.

Core claim

We find that SFT primarily removes noise-like interactions, while rarely acquiring reliable new interactions. This denoising stage is extremely brief, after which continued fine-tuning tends to introduce overfitted interactions.

What carries the argument

The evolution of interactions between words or tokens during supervised fine-tuning, serving as a metric for inference patterns in LLMs.
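For readers new to the metric, the AND (Harsanyi) interaction from the interaction-explanation literature this paper builds on can be written as below. This is a sketch that assumes the paper's AND component follows that standard definition; its full AND-OR decomposition is not reproduced on this page. Here $N$ is the set of input tokens, $x_T$ is the input with every token outside $T$ masked, and $v(\cdot)$ is a scalar output score of the LLM.

```latex
I(S) = \sum_{T \subseteq S} (-1)^{|S|-|T|}\, v(x_T), \qquad S \subseteq N,
\qquad\text{and}\qquad
v(x_T) = \sum_{S \subseteq T} I(S) \quad \text{for all } T \subseteq N.
```

The second identity (Möbius inversion) is part of what "faithful" means in this line of work: the interactions exactly reproduce the model's output on every masked input, a property Figure 5 checks empirically under the name universal matching.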

If this is right

  • The denoising effect of SFT occurs rapidly and is followed by overfitting if training continues.
  • Early stopping can be used to maximize the benefits of SFT while avoiding detrimental overfitted interactions.
  • SFT helps LLMs mainly by cleaning up noise-like interactions rather than by adding reliable new ones.
  • These patterns hold across the LLMs and fine-tuning datasets tested.
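The early-stopping implication can be illustrated with a generic patience-based rule on validation loss. This is a standard technique, not the paper's procedure: the function name, `patience`, `min_delta`, and the loss values are hypothetical, chosen to mimic a brief improvement followed by overfitting.

```python
def early_stopping_step(val_losses, patience=2, min_delta=0.0):
    """Return the index of the checkpoint to keep under patience-based early stopping.

    Evaluation stops once validation loss has failed to improve by more than
    `min_delta` for `patience` consecutive evaluations; the best checkpoint
    seen so far is returned.
    """
    best_step, best_loss, bad_evals = 0, float("inf"), 0
    for step, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_step, best_loss, bad_evals = step, loss, 0
        else:
            bad_evals += 1
            if bad_evals >= patience:
                break
    return best_step


# Hypothetical validation losses: a short improvement, then a steady rise.
losses = [1.20, 1.05, 1.02, 1.04, 1.09, 1.15]
print(early_stopping_step(losses, patience=2))  # 2
```

If the paper's finding holds, the useful window is short, so evaluations need to be frequent early in SFT for a rule like this to catch the best checkpoint.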

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Interaction tracking could be extended to other fine-tuning techniques to identify optimal stopping points.
  • This view might reconcile similar inconsistencies seen in other large-scale training methods.
  • It implies that most reliable inference patterns are set during pre-training, with SFT serving a limited cleanup role.
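The first extension above can be made concrete with a hypothetical stopping heuristic. Nothing here comes from the paper: the rule (keep the last checkpoint before newly emerged interactions outnumber removed ones), the function name, and the counts are all illustrative assumptions, with the counts imagined as the output of a checkpoint-by-checkpoint comparison of salient interactions.

```python
def interaction_stop_step(removed_counts, emerged_counts):
    """Hypothetical heuristic: keep the last checkpoint before newly emerged
    interactions start to outnumber removed ones.

    removed_counts[i] / emerged_counts[i]: number of salient interactions
    removed / newly emerged between checkpoint i and checkpoint i + 1.
    """
    for i, (removed, emerged) in enumerate(zip(removed_counts, emerged_counts)):
        if emerged > removed:
            return i  # checkpoint i precedes the first emergence-dominated interval
    return len(removed_counts)  # never crossed: keep the final checkpoint


removed = [40, 12, 5, 3]
emerged = [4, 6, 9, 15]
print(interaction_stop_step(removed, emerged))  # 2
```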

Load-bearing premise

Interactions between tokens provide a faithful way to measure the inference patterns learned by large language models.
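A minimal Python sketch of the faithfulness property this premise relies on, assuming the standard AND (Harsanyi) interaction I(S) = Σ_{T⊆S} (-1)^{|S|-|T|} v(x_T) from the interaction-explanation literature. `toy_v` is a hypothetical stand-in for an LLM's output score on a masked input; the helper names are illustrative, not the paper's code.

```python
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` (token indices) as a frozenset."""
    items = tuple(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def and_interactions(v, n):
    """AND (Harsanyi) interactions: I(S) = sum over T subset of S of (-1)^(|S|-|T|) v(T)."""
    return {
        S: sum((-1) ** (len(S) - len(T)) * v(T) for T in subsets(S))
        for S in subsets(range(n))
    }


def universal_matching_error(v, interactions, n):
    """Largest |v(T) - sum over S subset of T of I(S)| across all 2^n masked inputs."""
    return max(
        abs(v(T) - sum(interactions[S] for S in subsets(T)))
        for T in subsets(range(n))
    )


# Hypothetical stand-in for an LLM score on a masked 3-token input T:
# tokens 0 and 1 matter only together; token 2 adds a small independent effect.
def toy_v(T):
    return 2.0 * (0 in T and 1 in T) + 0.5 * (2 in T)


interactions = and_interactions(toy_v, n=3)
salient = {tuple(sorted(S)): val for S, val in interactions.items() if abs(val) > 1e-9}
print(salient)                                             # {(2,): 0.5, (0, 1): 2.0}
print(universal_matching_error(toy_v, interactions, n=3))  # 0.0
```

Only two of the eight possible interactions are non-zero, the kind of sparsity Figure 6 examines on real LLMs. The naive double loop touches 3^n terms, so exact enumeration is only practical for a handful of tokens.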

What would settle it

Count noise-like, reliable, and overfitted interactions at successive SFT checkpoints, and check whether performance improves only during the brief initial phase and then declines as overfitted interactions accumulate.
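A minimal sketch of the bookkeeping step, assuming each checkpoint's interactions are already computed as a mapping from token subsets to strengths I(S). The salience threshold `tau`, the function name, and the numbers are hypothetical. It only sorts salient interactions into the removed, newly emerged, and preserved groups that Figures 2 and 3 track; deciding which ones are noise-like, reliable, or overfitted needs a further test (for example, their effect on held-out performance), which this sketch does not perform.

```python
def compare_checkpoints(before, after, tau):
    """Split salient interactions into removed, newly emerged, and preserved.

    before/after: dict mapping an interaction (tuple of token indices) to its
    strength I(S) at an earlier and a later SFT checkpoint.
    tau: hypothetical salience threshold; |I(S)| > tau counts as salient.
    """
    salient_before = {S for S, val in before.items() if abs(val) > tau}
    salient_after = {S for S, val in after.items() if abs(val) > tau}
    return {
        "removed": salient_before - salient_after,
        "emerged": salient_after - salient_before,
        "preserved": salient_before & salient_after,
    }


# Hypothetical interaction strengths at two checkpoints.
step_0 = {(0, 1): 2.0, (2,): 0.6, (1, 3): 0.4}
step_1 = {(0, 1): 1.9, (2,): 0.05, (3,): 0.7}
groups = compare_checkpoints(step_0, step_1, tau=0.3)
print({k: sorted(v) for k, v in groups.items()})
# {'removed': [(1, 3), (2,)], 'emerged': [(3,)], 'preserved': [(0, 1)]}
```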

Figures

Figures reproduced from arXiv: 2605.17967 by Guoxi Zhang, Hua Cai, Junpeng Zhang, Lei Cheng, Qing Xu, Quanshi Zhang.

Figure 1: (Top) The complex inference patterns of an LLM can be faithfully represented by a few …
Figure 2: Evolution of the distribution of newly emerged interactions …
Figure 3: Evolution of the representation quality of newly emerged, removed, and preserved interactions …
Figure 4: Prediction utility and individual contributions of different types of interactions …
Figure 5: Empirical verification of universal matching on LLMs. Each row corresponds to a different …
Figure 6: Empirical verification of interaction sparsity on LLMs. We aggregate the interactions …
Figure 7: Additional results on the evolution of newly emerged, removed, and preserved interactions …
Figure 8: Additional results on the representation quality of newly emerged, removed, and preserved …
Figure 9: An example of AND-OR logical models constructed to faithfully explain the output scores …
Figure 9: Another example of AND-OR logical models explaining the DeepSeek model (top) and the …
Original abstract

This paper explores a scientific question in supervised fine-tuning (SFT): why SFT is broadly effective for small-scale deep neural networks, yet can produce inconsistent or even detrimental effects when applied to large language models (LLMs). Recent advances in interaction-based explanations suggest that interactions between words/tokens provide a faithful metric for quantifying the inference patterns encoded by LLMs. We find that the evolution of interactions during SFT can effectively explain the inconsistent effectiveness of SFT for LLMs. Specifically, we find that (1) SFT primarily removes noise-like interactions, while rarely acquiring reliable new interactions. (2) This denoising stage is extremely brief, after which continued fine-tuning tends to introduce overfitted interactions. We validate these findings across multiple LLMs and datasets. Our findings provide new insights into early stopping and offer practical guidance for LLM training.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript claims that the inconsistent effectiveness of supervised fine-tuning (SFT) on LLMs, relative to small networks, can be reconciled by tracking token interactions: SFT briefly removes noise-like interactions without acquiring reliable new ones, after which continued training introduces overfitted interactions. The pattern is validated across multiple LLMs and datasets and yields guidance on early stopping.

Significance. If the interaction metric is shown to faithfully track inference patterns and causally explain SFT outcomes, the work could reconcile contradictory SFT results and supply concrete training heuristics. The approach is novel in applying interaction dynamics to the SFT puzzle, but its significance is limited by the absence of direct links between observed interaction changes and downstream task performance.

major comments (2)
  1. [Abstract and §4 (Experiments)] The claim of validation 'across multiple LLMs and datasets' is stated without reporting controls, baseline comparisons, or the exact procedure for quantifying and classifying interactions as 'noise-like' versus 'overfitted'; this omission makes it impossible to assess whether the denoising-then-overfitting trajectory is robust or merely descriptive of the chosen metric.
  2. [§2 (Interaction Metric) and §3 (Evolution Analysis)] The central explanatory claim requires that changes in the interaction measure directly account for SFT effectiveness, yet no ablation, held-out prediction test, or alignment with known spurious/causal features is reported; without such evidence the narrative risks being a post-hoc description of metric dynamics rather than a causal account.
minor comments (2)
  1. [§2] Define 'noise-like' and 'overfitted' interactions with explicit mathematical criteria or thresholds rather than qualitative descriptions.
  2. [§4] Add a table or figure caption clarifying the precise LLMs, datasets, and interaction-extraction method used in the validation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which have helped us improve the clarity and rigor of our work. Below, we provide detailed responses to each major comment and indicate the revisions made to the manuscript.

Point-by-point responses
  1. Referee: [Abstract and §4 (Experiments)] The claim of validation 'across multiple LLMs and datasets' is stated without reporting controls, baseline comparisons, or the exact procedure for quantifying and classifying interactions as 'noise-like' versus 'overfitted'; this omission makes it impossible to assess whether the denoising-then-overfitting trajectory is robust or merely descriptive of the chosen metric.

    Authors: We agree that additional details on the experimental procedure are necessary to allow readers to assess robustness. In the revised manuscript, we have expanded §4 with a dedicated subsection describing the exact quantification of interactions (including the mathematical definition and computation steps), the classification criteria for noise-like interactions (those whose removal improves validation performance without harming training) versus overfitted ones (those that boost training but degrade held-out performance), and the specific thresholds applied. We have also added baseline comparisons using randomly permuted token interactions and controls varying random seeds and hyperparameter settings across the reported LLMs and datasets. These revisions should enable a clearer evaluation of whether the observed trajectory is robust. revision: yes

  2. Referee: [§2 (Interaction Metric) and §3 (Evolution Analysis)] The central explanatory claim requires that changes in the interaction measure directly account for SFT effectiveness, yet no ablation, held-out prediction test, or alignment with known spurious/causal features is reported; without such evidence the narrative risks being a post-hoc description of metric dynamics rather than a causal account.

    Authors: We acknowledge that stronger evidence linking interaction changes directly to SFT outcomes would better support the causal narrative. The original §3 presents consistent temporal alignments between interaction evolution and performance shifts, but we agree that ablations and held-out tests were not included. In the revision, we have added a held-out prediction experiment in §3 that uses early interaction changes to forecast later SFT effectiveness and compares predictions against observed results. We have also included a brief alignment analysis with known spurious features in one dataset. Full causal interventions remain challenging due to scale, so we have noted this limitation and suggested it as future work. This constitutes a partial but substantive improvement to the explanatory section. revision: partial
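The classification rule described in the first response can be sketched as a small function. The criteria paraphrase the simulated rebuttal's wording (noise-like: removing the interaction improves validation performance without harming training; overfitted: the interaction boosts training but degrades held-out performance). The function name, the sign convention, and the tolerance `tol` are hypothetical, since the rebuttal does not state its thresholds.

```python
def classify_by_ablation(d_train, d_val, tol=0.0):
    """Label one interaction from the change in performance when it is removed.

    d_train, d_val: performance after removal minus performance before removal
    (positive = removal helps), on training and held-out data respectively.
    tol: hypothetical tolerance for treating a change as zero.
    """
    if d_val > tol and d_train >= -tol:
        return "noise-like"   # removal helps held-out data without hurting training
    if d_val > tol and d_train < -tol:
        return "overfitted"   # the interaction helped training but hurt held-out data
    return "other"


print(classify_by_ablation(d_train=0.00, d_val=0.02))   # noise-like
print(classify_by_ablation(d_train=-0.03, d_val=0.01))  # overfitted
print(classify_by_ablation(d_train=0.01, d_val=-0.02))  # other
```

Because both labels require a held-out gain on removal, they differ only in the training effect, which is what makes the two categories disjoint under this reading.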

Circularity Check

0 steps flagged

No significant circularity; the derivation relies on an external interaction metric and empirical observations.

Full rationale

The paper treats interactions between tokens as a pre-existing explanatory tool drawn from recent advances in interaction-based explanations, then tracks their evolution empirically across SFT stages on multiple LLMs and datasets. No equation or claim reduces the observed denoising/overfitting pattern to a definition or fit that is constructed from the target SFT-effectiveness conclusion itself. The central narrative is presented as an interpretation of measured changes rather than a self-referential loop, and the validation steps are independent of the interpretive framing.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on treating token interactions as a faithful explanatory metric; no free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • domain assumption: Interactions between words/tokens provide a faithful metric for quantifying the inference patterns encoded by LLMs.
    Invoked in the abstract as the basis for using interaction evolution to explain SFT effectiveness.

pith-pipeline@v0.9.0 · 5686 in / 1186 out tokens · 36449 ms · 2026-05-20T10:00:03.024393+00:00 · methodology
