pith. machine review for the scientific record.

arxiv: 2604.17051 · v1 · submitted 2026-04-18 · 💻 cs.CL · cs.AI

Recognition: unknown

Efficient Task Adaptation in Large Language Models via Selective Parameter Optimization

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 06:18 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI
keywords: catastrophic forgetting · parameter optimization · selective fine-tuning · large language models · task adaptation · core parameters · LLM fine-tuning

The pith

Freezing core parameters during fine-tuning preserves general knowledge in LLMs while allowing task-specific adaptation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models lose general capabilities when fully fine-tuned on domain tasks because updates overwrite pre-trained knowledge. This paper introduces a method to evaluate parameter importance and separate core parameters essential for broad language skills from non-core ones tuned to specific domains. By keeping core parameters fixed and updating only the rest, the approach aims to maintain overall performance while gaining expertise in areas like science and medicine. Experiments with models such as GPT-J and LLaMA-3 on relevant tasks indicate this selective strategy reduces forgetting and improves adaptability.

Core claim

The paper claims that a parameter importance evaluation method can identify core parameters critical for general language ability. By fixing those parameters and fine-tuning only the non-core parameters sensitive to the target task, LLMs achieve better domain adaptation without catastrophic forgetting, as shown in tests on scientific, medical, and physical tasks using GPT-J and LLaMA-3.

What carries the argument

The parameter element importance evaluation method, which distinguishes parameters based on their contribution to general versus domain-specific tasks and enables selective freezing of the core set.
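
To make the mechanism concrete, here is a minimal PyTorch sketch of the freezing step. It assumes per-tensor importance scores have already been computed; the abstract specifies neither the metric, the granularity, nor the cutoff, so `importance_scores` and `core_fraction` below are illustrative stand-ins, not the paper's method.

```python
import torch

def freeze_core_parameters(model, importance_scores, core_fraction=0.5):
    """Fix the most 'general-purpose' parameters; leave the rest trainable.

    importance_scores: dict mapping parameter name -> scalar importance for
    general language ability (higher = more core). Both the scoring metric
    and core_fraction are assumptions, not values from the paper.
    """
    ranked = sorted(importance_scores, key=importance_scores.get, reverse=True)
    core_names = set(ranked[:int(len(ranked) * core_fraction)])

    for name, param in model.named_parameters():
        # Core parameters receive no gradient updates; non-core ones do.
        param.requires_grad = name not in core_names

    # The optimizer only sees the trainable (non-core) subset.
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=1e-5)
```

At tensor granularity this is coarse; the paper's phrase "parameter element importance" suggests an element-level split, which would instead zero out masked entries of each gradient after every backward pass.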

If this is right

  • General language understanding remains intact because core parameters are not updated.
  • Task performance on specific domains improves through targeted optimization of non-core parameters.
  • Overall model transferability increases compared to traditional full-parameter fine-tuning.
  • The method works across different model architectures like GPT-J and LLaMA-3.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Such selective optimization could lower the risk of models losing broad utility when specialized.
  • Future work might explore automated ways to refine the importance evaluation for different tasks.
  • This division highlights that not all parameters contribute equally to model capabilities.

Load-bearing premise

Model parameters can be meaningfully divided into core and non-core categories using an importance evaluation method such that freezing the core set preserves general knowledge without harming task learning.

What would settle it

The claim would be refuted if the model's accuracy on general language benchmarks dropped significantly after selective fine-tuning, or if domain task performance failed to match or exceed full fine-tuning results; the converse observations would support it.
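
Operationally, that is a before-and-after comparison on held-out benchmarks. A minimal sketch of the decision procedure, where `evaluate` is a hypothetical accuracy-returning harness and the tolerance margin is an assumption (the abstract names no benchmarks or thresholds):

```python
def forgetting_verdict(base, selective, full, general_bench, domain_bench,
                       evaluate, tolerance=0.01):
    """Check the paper's claim against the two failure modes above.

    evaluate(model, benchmark) -> accuracy in [0, 1]. All names and the
    tolerance margin are illustrative, not taken from the paper.
    """
    # Failure mode 1: general ability degrades after selective fine-tuning.
    general_drop = (evaluate(base, general_bench)
                    - evaluate(selective, general_bench))

    # Failure mode 2: selective tuning trails full fine-tuning on the domain.
    domain_gap = (evaluate(full, domain_bench)
                  - evaluate(selective, domain_bench))

    claim_holds = general_drop <= tolerance and domain_gap <= tolerance
    return {"general_drop": general_drop, "domain_gap": domain_gap,
            "claim_holds": claim_holds}
```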

Figures

Figures reproduced from arXiv: 2604.17051 by Jiangjiang Zhao, Weijie Wan.

Figure 1. Comparison of fine-tuning strategies for mitigating catastrophic forgetting.
Original abstract

Large Language Models (LLMs) have demonstrated excellent performance in general language understanding, generation and other tasks. However, when fine-tuning for specific domain tasks, the general knowledge accumulated in the pre-training phase is often partially overwritten or forgotten due to parameter updates, which severely limits the generalization ability and transferability of LLMs. Traditional fine-tuning strategies mostly train on the entire parameter space, ignoring the heterogeneity of model parameters, that is, some parameters are extremely important for general tasks, while other parameters are more sensitive to specific tasks. To alleviate the above problems, this paper innovatively proposes a parameter element importance evaluation method, which divides parameters into "core parameters" and "non-core parameters" by distinguishing the importance of parameters for general language ability tasks and specific domain tasks, and fixes the core parameters during fine-tuning, and only fine-tunes the non-core parameters. Extensive experiments on scientific, medical and physical tasks using GPT-J and LLaMA-3 show that our method can mitigate catastrophic forgetting while enhancing the adaptability of the model.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, simulated authors' rebuttal, circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The manuscript proposes a selective parameter optimization method for adapting LLMs to domain tasks. It introduces a parameter importance evaluation to partition model parameters into 'core' (critical for general language ability) and 'non-core' (task-sensitive) sets, then freezes the core parameters while fine-tuning only the non-core ones. The central claim is that this mitigates catastrophic forgetting of pre-trained knowledge while improving adaptability, as demonstrated in experiments on scientific, medical, and physical tasks with GPT-J and LLaMA-3.

Significance. If the importance evaluation reliably identifies a non-arbitrary split that preserves general capabilities better than alternatives, the approach could enable more efficient domain adaptation than full fine-tuning or standard parameter-efficient methods, reducing compute costs and improving retention of broad knowledge in LLMs.

major comments (3)
  1. [Abstract, §3 Method] The parameter importance evaluation method used to divide parameters into core and non-core sets is not described with sufficient detail (e.g., no specification of the metric, computation procedure, or thresholds), preventing assessment of whether the partition has a causal link to forgetting or task sensitivity.
  2. [§4 Experiments] No ablation is reported comparing the proposed importance-based freezing against random selection of an equivalent fraction of parameters to update. Without this control, the experiments cannot isolate whether observed gains in retention and task performance arise from the specific evaluation or simply from updating fewer parameters overall.
  3. [§4 Experiments] The abstract and experimental description supply no quantitative results, baselines (e.g., full fine-tuning, LoRA, or other selective methods), or error analysis, so it is impossible to evaluate the magnitude of forgetting mitigation or statistical reliability of the claims.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and commit to revisions that will strengthen the clarity, controls, and reporting in the manuscript.

Point-by-point responses
  1. Referee: [Abstract, §3 Method] The parameter importance evaluation method used to divide parameters into core and non-core sets is not described with sufficient detail (e.g., no specification of the metric, computation procedure, or thresholds), preventing assessment of whether the partition has a causal link to forgetting or task sensitivity.

    Authors: We agree that the current description lacks sufficient detail for full reproducibility and causal assessment. In the revised manuscript we will expand §3 with the precise importance metric (parameter sensitivity measured via gradient norms on held-out general-language data versus domain-task data), the full computation procedure (including data splits, scoring formula, and aggregation across layers), and the explicit threshold or top-k selection rule used to designate core versus non-core parameters. These additions will make the link to forgetting mitigation transparent; a minimal sketch of such a metric appears after these responses. (revision: yes)

  2. Referee: [§4 Experiments] No ablation is reported comparing the proposed importance-based freezing against random selection of an equivalent fraction of parameters to update. Without this control, the experiments cannot isolate whether observed gains in retention and task performance arise from the specific evaluation or simply from updating fewer parameters overall.

    Authors: We accept this criticism. The revised §4 will include a new ablation that freezes a randomly chosen set of parameters whose size matches the non-core set identified by our method. Results on the same scientific, medical, and physical tasks will be reported side-by-side with the importance-based runs, allowing readers to determine whether the observed retention and adaptation gains are attributable to the importance evaluation rather than to the mere reduction in updated parameters; the sketch after these responses includes this size-matched random control. (revision: yes)

  3. Referee: [§4 Experiments] The abstract and experimental description supply no quantitative results, baselines (e.g., full fine-tuning, LoRA, or other selective methods), or error analysis, so it is impossible to evaluate the magnitude of forgetting mitigation or statistical reliability of the claims.

    Authors: We acknowledge that the present version under-reports quantitative outcomes. The revised abstract will summarize key metrics (task accuracy gains and forgetting scores on general benchmarks). Section 4 will be expanded with full tables comparing our method against full fine-tuning, LoRA, and other selective baselines, together with mean performance, standard deviations across three random seeds, and statistical significance tests. This will allow direct assessment of the magnitude and reliability of the forgetting-mitigation effect. (revision: yes)
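
Both committed revisions are straightforward to prototype. Below is a minimal sketch of the gradient-norm importance metric the rebuttal describes, plus the size-matched random-freezing control the referee requested. The ratio rule, the quantile cutoff, and the `loss_fn`/`batches` interfaces are assumptions, since the manuscript's exact formula and thresholds remain unpublished.

```python
import torch

def gradient_norm_importance(model, loss_fn, batches):
    """Accumulate squared gradients per parameter element over a data sample."""
    scores = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for batch in batches:
        model.zero_grad()
        loss_fn(model, batch).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                scores[n] += p.grad.detach() ** 2
    return scores

def core_mask(general_scores, domain_scores, core_fraction=0.5):
    """Mark as core the elements most important for general ability
    relative to the domain task (the ratio rule is an assumption)."""
    ratio = {n: general_scores[n] / (domain_scores[n] + 1e-12)
             for n in general_scores}
    threshold = torch.quantile(
        torch.cat([r.flatten() for r in ratio.values()]), 1.0 - core_fraction)
    return {n: r >= threshold for n, r in ratio.items()}

def random_mask_like(mask):
    """Referee's control: freeze a random set of equal size per tensor."""
    out = {}
    for n, m in mask.items():
        flat = torch.zeros(m.numel(), dtype=torch.bool)
        flat[torch.randperm(m.numel())[:int(m.sum())]] = True
        out[n] = flat.view_as(m)
    return out
```

During fine-tuning, each mask would be applied by zeroing the masked (core) entries of every gradient after the backward pass; otherwise-identical runs with `core_mask` versus `random_mask_like` then isolate whether the importance evaluation, and not merely the reduced update budget, drives the retention gains.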

Circularity Check

0 steps flagged

No circularity: empirical method with independent experimental validation

full rationale

The paper introduces an importance evaluation method to partition parameters into core (general-language) and non-core (task-sensitive) sets, then freezes the core set during fine-tuning. This is framed as an innovative proposal whose effectiveness is demonstrated via experiments on GPT-J and LLaMA-3 across scientific, medical, and physical tasks. No equations, derivations, fitted parameters presented as predictions, or self-citation chains appear in the provided text. The central claim rests on empirical outcomes rather than any self-definitional reduction or ansatz smuggled via prior work. The method's contribution is isolated by the experiments themselves, satisfying the requirement for self-contained, non-circular support.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

The central claim depends on the postulated distinction between core and non-core parameters and the existence of an effective importance evaluation procedure, neither of which receives independent grounding or falsifiable criteria in the abstract.

invented entities (2)
  • core parameters (no independent evidence)
    purpose: Parameters identified as extremely important for general language ability tasks that are fixed during fine-tuning to prevent forgetting.
    Introduced as a new category to distinguish from parameters more sensitive to specific domain tasks.
  • non-core parameters (no independent evidence)
    purpose: Parameters more sensitive to specific domain tasks that are selectively fine-tuned.
    Defined as the complement to core parameters in the proposed division.

pith-pipeline@v0.9.0 · 5473 in / 1334 out tokens · 50763 ms · 2026-05-10T06:18:16.963220+00:00 · methodology
