pith. machine review for the scientific record.

arxiv: 2604.17051 · v1 · submitted 2026-04-18 · 💻 cs.CL · cs.AI

Recognition: unknown

Efficient Task Adaptation in Large Language Models via Selective Parameter Optimization

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 06:18 UTC · model grok-4.3

classification 💻 cs.CL · cs.AI
keywords: catastrophic forgetting · parameter optimization · selective fine-tuning · large language models · task adaptation · core parameters · LLM fine-tuning

The pith

Freezing core parameters during fine-tuning preserves general knowledge in LLMs while allowing task-specific adaptation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models lose general capabilities when fully fine-tuned on domain tasks because updates overwrite pre-trained knowledge. This paper introduces a method to evaluate parameter importance and separate core parameters essential for broad language skills from non-core ones tuned to specific domains. By keeping core parameters fixed and updating only the rest, the approach aims to maintain overall performance while gaining expertise in areas like science and medicine. Experiments with models such as GPT-J and LLaMA-3 on relevant tasks indicate this selective strategy reduces forgetting and improves adaptability.

Core claim

The paper claims that a parameter importance evaluation method can identify core parameters critical for general language ability. By fixing those parameters and fine-tuning only the non-core parameters sensitive to the target task, LLMs achieve better domain adaptation without catastrophic forgetting, as shown in tests on scientific, medical, and physical tasks using GPT-J and LLaMA-3.

What carries the argument

The parameter element importance evaluation method, which distinguishes parameters based on their contribution to general versus domain-specific tasks and enables selective freezing of the core set.
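
To make the mechanism concrete, here is a minimal PyTorch sketch of the freezing step. It assumes per-tensor importance scores have already been computed; the abstract specifies neither the metric, the granularity, nor the cutoff, so `importance_scores` and `core_fraction` below are illustrative stand-ins, not the paper's method.

```python
import torch

def freeze_core_parameters(model, importance_scores, core_fraction=0.5):
    """Fix the most 'general-purpose' parameters; leave the rest trainable.

    importance_scores: dict mapping parameter name -> scalar importance for
    general language ability (higher = more core). Both the scoring metric
    and core_fraction are assumptions, not values from the paper.
    """
    ranked = sorted(importance_scores, key=importance_scores.get, reverse=True)
    core_names = set(ranked[:int(len(ranked) * core_fraction)])

    for name, param in model.named_parameters():
        # Core parameters receive no gradient updates; non-core ones do.
        param.requires_grad = name not in core_names

    # The optimizer only sees the trainable (non-core) subset.
    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(trainable, lr=1e-5)
```

At tensor granularity this is coarse; the paper's phrase "parameter element importance" suggests an element-level split, which would instead zero out masked entries of each gradient after every backward pass.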

If this is right

  • General language understanding remains intact because core parameters are not updated.
  • Task performance on specific domains improves through targeted optimization of non-core parameters.
  • Overall model transferability increases compared to traditional full-parameter fine-tuning.
  • The method works across different model architectures like GPT-J and LLaMA-3.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Such selective optimization could lower the risk of models losing broad utility when specialized.
  • Future work might explore automated ways to refine the importance evaluation for different tasks.
  • This division highlights that not all parameters contribute equally to model capabilities.

Load-bearing premise

Model parameters can be meaningfully divided into core and non-core categories using an importance evaluation method such that freezing the core set preserves general knowledge without harming task learning.

What would settle it

The claim would be refuted if the model's accuracy on general language benchmarks dropped significantly after selective fine-tuning, or if domain task performance failed to match or exceed full fine-tuning results; the converse observations would support it.
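
Operationally, that is a before-and-after comparison on held-out benchmarks. A minimal sketch of the decision procedure, where `evaluate` is a hypothetical accuracy-returning harness and the tolerance margin is an assumption (the abstract names no benchmarks or thresholds):

```python
def forgetting_verdict(base, selective, full, general_bench, domain_bench,
                       evaluate, tolerance=0.01):
    """Check the paper's claim against the two failure modes above.

    evaluate(model, benchmark) -> accuracy in [0, 1]. All names and the
    tolerance margin are illustrative, not taken from the paper.
    """
    # Failure mode 1: general ability degrades after selective fine-tuning.
    general_drop = (evaluate(base, general_bench)
                    - evaluate(selective, general_bench))

    # Failure mode 2: selective tuning trails full fine-tuning on the domain.
    domain_gap = (evaluate(full, domain_bench)
                  - evaluate(selective, domain_bench))

    claim_holds = general_drop <= tolerance and domain_gap <= tolerance
    return {"general_drop": general_drop, "domain_gap": domain_gap,
            "claim_holds": claim_holds}
```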

Figures

Figures reproduced from arXiv: 2604.17051 by Jiangjiang Zhao, Weijie Wan.

Figure 1. Comparison of fine-tuning strategies for mitigating catastrophic forgetting.
Original abstract

Large Language Models (LLMs) have demonstrated excellent performance in general language understanding, generation and other tasks. However, when fine-tuning for specific domain tasks, the general knowledge accumulated in the pre-training phase is often partially overwritten or forgotten due to parameter updates, which severely limits the generalization ability and transferability of LLMs. Traditional fine-tuning strategies mostly train on the entire parameter space, ignoring the heterogeneity of model parameters, that is, some parameters are extremely important for general tasks, while other parameters are more sensitive to specific tasks. To alleviate the above problems, this paper innovatively proposes a parameter element importance evaluation method, which divides parameters into "core parameters" and "non-core parameters" by distinguishing the importance of parameters for general language ability tasks and specific domain tasks, and fixes the core parameters during fine-tuning, and only fine-tunes the non-core parameters. Extensive experiments on scientific, medical and physical tasks using GPT-J and LLaMA-3 show that our method can mitigate catastrophic forgetting while enhancing the adaptability of the model.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, simulated authors' rebuttal, circularity audit, and an axiom and free-parameter ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 0 minor

Summary. The manuscript proposes a selective parameter optimization method for adapting LLMs to domain tasks. It introduces a parameter importance evaluation to partition model parameters into 'core' (critical for general language ability) and 'non-core' (task-sensitive) sets, then freezes the core parameters while fine-tuning only the non-core ones. The central claim is that this mitigates catastrophic forgetting of pre-trained knowledge while improving adaptability, as demonstrated in experiments on scientific, medical, and physical tasks with GPT-J and LLaMA-3.

Significance. If the importance evaluation reliably identifies a non-arbitrary split that preserves general capabilities better than alternatives, the approach could enable more efficient domain adaptation than full fine-tuning or standard parameter-efficient methods, reducing compute costs and improving retention of broad knowledge in LLMs.

major comments (3)
  1. [Abstract, §3 Method] The parameter importance evaluation method used to divide parameters into core and non-core sets is not described with sufficient detail (e.g., no specification of the metric, computation procedure, or thresholds), preventing assessment of whether the partition has a causal link to forgetting or task sensitivity.
  2. [§4 Experiments] No ablation is reported comparing the proposed importance-based freezing against random selection of an equivalent fraction of parameters to update. Without this control, the experiments cannot isolate whether observed gains in retention and task performance arise from the specific evaluation or simply from updating fewer parameters overall.
  3. [§4 Experiments] The abstract and experimental description supply no quantitative results, baselines (e.g., full fine-tuning, LoRA, or other selective methods), or error analysis, so it is impossible to evaluate the magnitude of forgetting mitigation or statistical reliability of the claims.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and commit to revisions that will strengthen the clarity, controls, and reporting in the manuscript.

Point-by-point responses
  1. Referee: [Abstract, §3 Method] The parameter importance evaluation method used to divide parameters into core and non-core sets is not described with sufficient detail (e.g., no specification of the metric, computation procedure, or thresholds), preventing assessment of whether the partition has a causal link to forgetting or task sensitivity.

    Authors: We agree that the current description lacks sufficient detail for full reproducibility and causal assessment. In the revised manuscript we will expand §3 with the precise importance metric (parameter sensitivity measured via gradient norms on held-out general-language data versus domain-task data), the full computation procedure (including data splits, scoring formula, and aggregation across layers), and the explicit threshold or top-k selection rule used to designate core versus non-core parameters. These additions will make the link to forgetting mitigation transparent; a minimal sketch of such a metric appears after these responses. (revision: yes)

  2. Referee: [§4 Experiments] No ablation is reported comparing the proposed importance-based freezing against random selection of an equivalent fraction of parameters to update. Without this control, the experiments cannot isolate whether observed gains in retention and task performance arise from the specific evaluation or simply from updating fewer parameters overall.

    Authors: We accept this criticism. The revised §4 will include a new ablation that freezes a randomly chosen set of parameters whose size matches the non-core set identified by our method. Results on the same scientific, medical, and physical tasks will be reported side-by-side with the importance-based runs, allowing readers to determine whether the observed retention and adaptation gains are attributable to the importance evaluation rather than to the mere reduction in updated parameters; the sketch after these responses includes this size-matched random control. (revision: yes)

  3. Referee: [§4 Experiments] The abstract and experimental description supply no quantitative results, baselines (e.g., full fine-tuning, LoRA, or other selective methods), or error analysis, so it is impossible to evaluate the magnitude of forgetting mitigation or statistical reliability of the claims.

    Authors: We acknowledge that the present version under-reports quantitative outcomes. The revised abstract will summarize key metrics (task accuracy gains and forgetting scores on general benchmarks). Section 4 will be expanded with full tables comparing our method against full fine-tuning, LoRA, and other selective baselines, together with mean performance, standard deviations across three random seeds, and statistical significance tests. This will allow direct assessment of the magnitude and reliability of the forgetting-mitigation effect. (revision: yes)
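
Both committed revisions are straightforward to prototype. Below is a minimal sketch of the gradient-norm importance metric the rebuttal describes, plus the size-matched random-freezing control the referee requested. The ratio rule, the quantile cutoff, and the `loss_fn`/`batches` interfaces are assumptions, since the manuscript's exact formula and thresholds remain unpublished.

```python
import torch

def gradient_norm_importance(model, loss_fn, batches):
    """Accumulate squared gradients per parameter element over a data sample."""
    scores = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for batch in batches:
        model.zero_grad()
        loss_fn(model, batch).backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                scores[n] += p.grad.detach() ** 2
    return scores

def core_mask(general_scores, domain_scores, core_fraction=0.5):
    """Mark as core the elements most important for general ability
    relative to the domain task (the ratio rule is an assumption)."""
    ratio = {n: general_scores[n] / (domain_scores[n] + 1e-12)
             for n in general_scores}
    threshold = torch.quantile(
        torch.cat([r.flatten() for r in ratio.values()]), 1.0 - core_fraction)
    return {n: r >= threshold for n, r in ratio.items()}

def random_mask_like(mask):
    """Referee's control: freeze a random set of equal size per tensor."""
    out = {}
    for n, m in mask.items():
        flat = torch.zeros(m.numel(), dtype=torch.bool)
        flat[torch.randperm(m.numel())[:int(m.sum())]] = True
        out[n] = flat.view_as(m)
    return out
```

During fine-tuning, each mask would be applied by zeroing the masked (core) entries of every gradient after the backward pass; otherwise-identical runs with `core_mask` versus `random_mask_like` then isolate whether the importance evaluation, and not merely the reduced update budget, drives the retention gains.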

Circularity Check

0 steps flagged

No circularity: empirical method with independent experimental validation

full rationale

The paper introduces an importance evaluation method to partition parameters into core (general-language) and non-core (task-sensitive) sets, then freezes the core set during fine-tuning. This is framed as an innovative proposal whose effectiveness is demonstrated via experiments on GPT-J and LLaMA-3 across scientific, medical, and physical tasks. No equations, derivations, fitted parameters presented as predictions, or self-citation chains appear in the provided text. The central claim rests on empirical outcomes rather than any self-definitional reduction or ansatz smuggled via prior work. The method's contribution is isolated by the experiments themselves, satisfying the requirement for self-contained, non-circular support.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

The central claim depends on the postulated distinction between core and non-core parameters and the existence of an effective importance evaluation procedure, neither of which receives independent grounding or falsifiable criteria in the abstract.

invented entities (2)
  • core parameters (no independent evidence)
    purpose: Parameters identified as extremely important for general language ability tasks that are fixed during fine-tuning to prevent forgetting.
    Introduced as a new category to distinguish from parameters more sensitive to specific domain tasks.
  • non-core parameters (no independent evidence)
    purpose: Parameters more sensitive to specific domain tasks that are selectively fine-tuned.
    Defined as the complement to core parameters in the proposed division.

pith-pipeline@v0.9.0 · 5473 in / 1334 out tokens · 50763 ms · 2026-05-10T06:18:16.963220+00:00 · methodology
