Decomposing the Basic Abilities of Large Language Models: Mitigating Cross-Task Interference in Multi-Task Instruct-Tuning

Bing Wang; Changchun Li; Gang Niu; Jinjin Chi; Masashi Sugiyama; Ximing Li

arxiv: 2605.05676 · v1 · submitted 2026-05-07 · 💻 cs.CL · cs.AI

Pith reviewed 2026-05-08 11:09 UTC · model grok-4.3

keywords large language models · multi-task instruct-tuning · cross-task interference · LoRA experts · basic abilities decomposition · orthogonal parameters · spherical clustering

The pith

Large language models can be decomposed into orthogonal basic abilities that tasks combine linearly, reducing cross-task interference in multi-task training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that multi-task instruct-tuning of large language models suffers from cross-task interference even when using task-specific parameters, because many parameters remain shared across tasks. It shows that certain parameters are consistently co-activated and naturally organize into base groups that behave like orthogonal basic abilities. By decomposing model parameters into high-singular-value LoRA experts for these abilities and enforcing orthogonality during training via spherical clustering of rank-1 components, the proposed BADIT method reduces interference. A sympathetic reader would care because this yields better performance on multi-task benchmarks than prior isolation approaches without requiring fully separate parameters for every task.

Core claim

LLMs encode several orthogonal basic abilities, and any task can be represented as a linear combination of these abilities. BADIT decomposes LLM parameters into orthogonal high-singular-value LoRA experts representing basic abilities and dynamically enforces their orthogonality during training via spherical clustering of rank-1 components. This approach outperforms state-of-the-art methods on the SuperNI benchmark with six different LLMs by mitigating the degree of cross-task interference.
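The decomposition step in this claim can be sketched with a plain SVD. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `split_into_experts`, the contiguous assignment of singular triplets to experts, and the square-root split of singular values between the two factors are all hypothetical choices made here for clarity.

```python
import numpy as np

def split_into_experts(W, num_experts, rank):
    """Split W into a residual plus `num_experts` rank-`rank` LoRA-style experts.

    Assumption: experts are built from the top singular triplets, taken in
    contiguous blocks. The paper's actual grouping may differ.
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    top = num_experts * rank
    if top > len(S):
        raise ValueError("num_experts * rank exceeds the number of singular values")
    experts = []
    for k in range(num_experts):
        idx = slice(k * rank, (k + 1) * rank)
        sqrt_s = np.sqrt(S[idx])
        B = U[:, idx] * sqrt_s            # shape (out_dim, rank)
        A = sqrt_s[:, None] * Vt[idx, :]  # shape (rank, in_dim)
        experts.append((B, A))
    # The residual keeps the low-singular-value remainder of W.
    W_res = (U[:, top:] * S[top:]) @ Vt[top:, :]
    return W_res, experts

# Toy weight matrix standing in for one LLM projection layer.
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 12))
W_res, experts = split_into_experts(W, num_experts=3, rank=2)
W_rebuilt = W_res + sum(B @ A for B, A in experts)
```

Because the singular vectors are orthonormal, experts built this way start out mutually orthogonal and their sum with the residual reproduces `W` exactly; keeping them orthogonal during training is the separate problem the clustering step addresses.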

What carries the argument

BADIT decomposes LLM parameters into orthogonal high-singular-value LoRA experts representing basic abilities and dynamically enforces their orthogonality during training via spherical clustering of rank-1 components.
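Spherical clustering groups vectors by direction rather than magnitude. The sketch below is a generic spherical k-means, assuming each row of `X` stands in for a flattened rank-1 component (or its gradient); the farthest-first initialization and the toy data are choices made here for determinism and are not taken from the paper.

```python
import numpy as np

def spherical_kmeans(X, k, iters=20):
    """Cluster rows of X by cosine similarity; returns (labels, unit centroids)."""
    X = np.asarray(X, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # project onto the unit sphere
    # Farthest-first initialization: deterministic, spreads the starting centroids.
    centroids = [X[0]]
    for _ in range(1, k):
        best_sim = np.max(X @ np.array(centroids).T, axis=1)
        centroids.append(X[np.argmin(best_sim)])
    centroids = np.array(centroids)
    for _ in range(iters):
        labels = np.argmax(X @ centroids.T, axis=1)   # assign by highest cosine
        for j in range(k):
            total = X[labels == j].sum(axis=0)
            norm = np.linalg.norm(total)
            if norm > 0:                              # skip empty clusters
                centroids[j] = total / norm           # renormalize the mean direction
    labels = np.argmax(X @ centroids.T, axis=1)
    return labels, centroids

# Toy data: two direction groups with very different magnitudes.
rng = np.random.default_rng(0)
dim = 6
group_a = np.eye(dim)[0] + 0.05 * rng.standard_normal((10, dim))
group_b = np.eye(dim)[1] + 0.05 * rng.standard_normal((10, dim))
scales = rng.uniform(0.5, 5.0, size=(20, 1))
X = np.vstack([group_a, group_b]) * scales
labels, centroids = spherical_kmeans(X, k=2)
```

Normalizing before clustering is the design point: two components that point the same way land in the same group even if one has a much larger norm, which is what a direction-based notion of "ability" would require.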

Load-bearing premise

Parameters that are consistently co-activated across tasks naturally organize into orthogonal base groups representing basic abilities, and any task can be expressed as a linear combination of those abilities.
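One way to write this premise down is shown here. The notation is assumed for illustration and is not copied from the paper: $W_c$ denotes the frozen residual weights, $B_k A_k$ the $k$-th LoRA expert, and $\alpha_{t,k}$ a hypothetical mixing weight of expert $k$ for task $t$.

```latex
% Illustrative notation only; symbols are assumptions, not the paper's equations.
W_t \;=\; W_c + \sum_{k=1}^{K} \alpha_{t,k}\, B_k A_k,
\qquad
\langle B_i A_i,\; B_j A_j \rangle_F = 0 \quad \text{for } i \neq j .
```

The first expression is the linear-combination claim; the second is the orthogonality condition, stated with the Frobenius inner product.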

What would settle it

If the decomposed LoRA experts fail to stay orthogonal after training or if measured gradient conflicts between tasks do not decrease while performance fails to exceed baselines on SuperNI, the claim that the decomposition mitigates cross-task interference would be falsified.
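The first half of this test, whether experts remain orthogonal after training, can be checked directly on the expert matrices. The sketch below is one possible check, assuming each expert is available as a pair `(B, A)`; the helper names and the toy "drifted" expert are hypothetical.

```python
import numpy as np

def expert_overlap(experts):
    """K x K matrix of absolute cosine similarities between flattened B @ A updates."""
    flat = np.stack([(B @ A).ravel() for B, A in experts])
    flat = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    return np.abs(flat @ flat.T)

def max_off_diagonal(M):
    """Largest entry of M outside the diagonal (0 means perfectly orthogonal)."""
    off = M - np.diag(np.diag(M))
    return float(off.max())

# Toy experts: orthonormal column blocks from a QR factorization.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((8, 4)))
A = rng.standard_normal((2, 5))
orthogonal = [(Q[:, :2], A), (Q[:, 2:], A)]
# Hypothetical drift: the second expert leaks into the first expert's subspace.
drifted = [(Q[:, :2], A), (Q[:, 2:] + 0.5 * Q[:, :2], A)]

overlap_ok = max_off_diagonal(expert_overlap(orthogonal))
overlap_drift = max_off_diagonal(expert_overlap(drifted))
```

A value of `overlap_ok` near zero with a clearly larger `overlap_drift` is the pattern this check is meant to separate; what threshold counts as "failing to stay orthogonal" is left open here.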

Figures

Figures reproduced from arXiv: 2605.05676 by Bing Wang, Changchun Li, Gang Niu, Jinjin Chi, Masashi Sugiyama, Ximing Li.

**Figure 1.** Shared task-specific neurons and parameters across different tasks.

**Figure 2.** Activated experts in MoE-based LLMs by different tasks.

**Figure 3.** Comparison with prior multi-task instruct-tuning methods. We decouple key LLM parameters into basic abilities (i.e., LoRA experts) and dynamically group their rank-1 components, enforcing orthogonality to represent distinct basic abilities.

**Figure 4.** Intra- and inter-expert gradient angles across epochs.

**Figure 5.** Shared task-specific neurons of MLP gate layers across different tasks.

**Figure 6.** Shared task-specific parameters of MLP gate layers across different tasks.

**Figure 7.** Shared task-specific parameters of query matrices in self-attention layers across different tasks.

**Figure 8.** Shared task-specific parameters of key matrices in self-attention layers across different tasks.

**Figure 9.** Shared task-specific parameters of value matrices in self-attention layers across different tasks.

**Figure 10.** The number of parameters that activate each parameter column with positive or negative gradients across all tasks, respectively.

**Figure 11.** Intra- and inter-expert gradient angles of BADIT on 6 LLMs across epochs.

**Figure 12.** Sensitivity analysis of the parameters K and r.

**Figure 13.** Visualizing basic abilities.

Original abstract

Recently, the prominent performance of large language models (LLMs) has been largely driven by multi-task instruct-tuning. Unfortunately, this training paradigm suffers from a key issue, named cross-task interference, due to conflicting gradients over shared parameters among different tasks. Some previous methods mitigate this issue by isolating task-specific parameters, e.g., task-specific neuron selection and mixture-of-experts. In this paper, we empirically reveal that the cross-task interference still exists for the existing solutions because of many parameters also shared by different tasks, and accordingly, we propose a novel solution, namely Basic Abilities Decomposition for multi-task Instruct-Tuning (BADIT). Specifically, we empirically find that certain parameters are consistently co-activated, and that co-activated parameters naturally organize into base groups. This motivates us to analogize that LLMs encode several orthogonal basic abilities, and that any task can be represented as a linear combination of these abilities. Accordingly, we propose BADIT that decomposes LLM parameters into orthogonal high-singular-value LoRA experts representing basic abilities, and dynamically enforces their orthogonality during training via spherical clustering of rank-1 components. We conduct extensive experiments on the SuperNI benchmark with 6 LLMs, and empirical results demonstrate that BADIT can outperform SOTA methods and mitigate the degree of cross-task interference.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

BADIT enforces orthogonality on high-singular-value LoRA experts via spherical clustering to cut cross-task interference, but the gains may come from regularization rather than any discovered basic-abilities structure.

The main thing to know is that this paper takes LoRA adapters, isolates their high-singular-value rank-1 pieces, and adds a spherical clustering step during training to keep those pieces orthogonal. They frame the result as a decomposition into basic abilities that any task combines linearly, and they test it on SuperNI across six LLMs, claiming it beats prior task-specific and MoE-style baselines while lowering interference.

Referee Report

3 major / 2 minor

Summary. The paper claims that multi-task instruct-tuning of LLMs suffers from cross-task interference due to shared parameters, and proposes BADIT to mitigate it by empirically identifying consistently co-activated parameters that organize into base groups. This motivates an analogy to orthogonal basic abilities, with any task as a linear combination thereof. BADIT decomposes parameters into orthogonal high-singular-value LoRA experts via spherical clustering of rank-1 components during training, and reports outperformance over SOTA methods on the SuperNI benchmark across 6 LLMs while reducing interference.

Significance. If the results hold and the orthogonality enforcement demonstrably isolates basic abilities rather than acting as generic regularization, the work could provide a mechanistic explanation for interference and a practical decomposition technique for multi-task tuning. The empirical co-activation observation and use of LoRA experts are concrete strengths, but the load-bearing assumption that co-activation patterns inherently yield orthogonal linear decompositions requires stronger isolation to elevate the contribution beyond existing MoE-style adapters.

major comments (3)

[Abstract, §3] Abstract and §3 (method): The central claim that 'co-activated parameters naturally organize into base groups' and motivate orthogonal basic abilities is presented as an empirical discovery, yet BADIT procedurally enforces orthogonality via spherical clustering on high-singular-value LoRA rank-1 components. Without an ablation that compares the full pipeline against a non-orthogonal clustering baseline or standard LoRA-MoE (to isolate whether interference reduction stems from the basic-abilities structure versus added capacity or the clustering loss), the results on SuperNI cannot confirm the linear-combination view or rule out alternative explanations.
[§4] §4 (experiments): The reported outperformance and interference mitigation on SuperNI lack details on error bars, statistical significance across the 6 LLMs, or held-out task-composition tests that would verify whether tasks are indeed linear combinations of the learned orthogonal experts. If the number of basic abilities is a free hyperparameter (as implied by the clustering step), the method's advantage over prior task-specific neuron or MoE approaches may not generalize beyond the tuned setting.
[§2, §3.2] §2 (related work) and §3.2: The distinction from prior isolation methods (task-specific neurons, MoE) is that BADIT targets shared parameters via co-activation clustering, but the paper does not quantify residual interference after decomposition (e.g., via gradient conflict metrics before/after) or show that the high-singular-value experts are uniquely interpretable as 'basic abilities' rather than just low-rank adapters.

minor comments (2)

[§3.3] Notation for the spherical clustering objective and the dynamic orthogonality enforcement should be formalized with an equation, as the procedural description leaves the exact loss term and rank-1 component extraction ambiguous.
[§4, Tables] Figure captions and tables reporting SuperNI results should include the exact number of basic abilities used per model and any sensitivity analysis, to aid reproducibility.
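For reference, the textbook spherical k-means objective that the first minor comment asks to see formalized can be written as follows. This is the standard formulation, offered as an assumption about what the paper's clustering step resembles; it is not the paper's own loss, and the symbols $x_i$ (a rank-1 component or its gradient), $c(i)$ (its cluster assignment), and $\mu_j$ (a unit centroid) are introduced here.

```latex
% Standard spherical k-means; not taken from the paper.
\max_{c,\;\{\mu_j\}} \; \sum_{i=1}^{n} \hat{x}_i^{\top} \mu_{c(i)}
\quad \text{s.t.} \quad \|\mu_j\|_2 = 1,\; j = 1,\dots,K,
\qquad \hat{x}_i = \frac{x_i}{\|x_i\|_2},
\qquad
\mu_j \leftarrow
\frac{\sum_{i:\,c(i)=j} \hat{x}_i}{\bigl\|\sum_{i:\,c(i)=j} \hat{x}_i\bigr\|_2}.
```

The last expression is the usual centroid update: the mean direction of the assigned points, renormalized to unit length.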

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We have revised the manuscript to address the concerns about empirical validation, statistical rigor, and clearer distinctions from prior work, while maintaining the core contributions of the co-activation observation and orthogonal decomposition.

Referee: [Abstract, §3] Abstract and §3 (method): The central claim that 'co-activated parameters naturally organize into base groups' and motivate orthogonal basic abilities is presented as an empirical discovery, yet BADIT procedurally enforces orthogonality via spherical clustering on high-singular-value LoRA rank-1 components. Without an ablation that compares the full pipeline against a non-orthogonal clustering baseline or standard LoRA-MoE (to isolate whether interference reduction stems from the basic-abilities structure versus added capacity or the clustering loss), the results on SuperNI cannot confirm the linear-combination view or rule out alternative explanations.

Authors: We agree that the presentation of the co-activation finding as directly motivating the orthogonal model requires stronger isolation of effects. The co-activation patterns are an empirical observation from our analysis of shared parameters, but orthogonality is enforced to operationalize the linear-combination hypothesis. In the revised version, we have added an ablation in §4 comparing the full BADIT pipeline to (i) a non-orthogonal k-means clustering baseline on the same rank-1 components and (ii) a standard LoRA-MoE without spherical clustering. These show that the orthogonality constraint yields further reductions in gradient conflicts and higher SuperNI scores than either baseline, indicating benefits beyond capacity or generic clustering. We have also revised the abstract and §3 to clarify this distinction between the empirical observation and the modeling choice. revision: yes
Referee: [§4] §4 (experiments): The reported outperformance and interference mitigation on SuperNI lack details on error bars, statistical significance across the 6 LLMs, or held-out task-composition tests that would verify whether tasks are indeed linear combinations of the learned orthogonal experts. If the number of basic abilities is a free hyperparameter (as implied by the clustering step), the method's advantage over prior task-specific neuron or MoE approaches may not generalize beyond the tuned setting.

Authors: We accept that the experimental section would benefit from greater statistical detail and generalization checks. The revised §4 now reports error bars as standard deviation over three random seeds for all main results on the six LLMs, along with paired t-test p-values confirming statistical significance of the improvements. We have also added held-out task-composition experiments: we hold out certain ability combinations during clustering, then evaluate on newly composed tasks and show that performance is consistent with linear recombination of the experts. The number of basic abilities is not a free hyperparameter but is determined by the spherical clustering on observed co-activation patterns; we include a sensitivity analysis demonstrating robustness across 8–20 clusters and fair comparisons to prior methods under matched settings. revision: yes
Referee: [§2, §3.2] §2 (related work) and §3.2: The distinction from prior isolation methods (task-specific neurons, MoE) is that BADIT targets shared parameters via co-activation clustering, but the paper does not quantify residual interference after decomposition (e.g., via gradient conflict metrics before/after) or show that the high-singular-value experts are uniquely interpretable as 'basic abilities' rather than just low-rank adapters.

Authors: We have expanded §2 to emphasize that BADIT operates on consistently shared parameters identified via cross-task co-activation, in contrast to per-task neuron selection or task-specific expert routing. In the updated §3.2 and new results in §4, we now report quantitative residual interference via average cosine similarity of task gradients before versus after decomposition, showing a clear reduction. For interpretability, we added activation heatmaps and qualitative case studies illustrating that the high-singular-value experts align with distinct basic abilities (e.g., one expert activates strongly on arithmetic tasks while another on knowledge-retrieval tasks). While we cannot claim absolute uniqueness of the decomposition, the empirical patterns support the basic-abilities analogy beyond generic low-rank adapters. revision: partial
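The residual-interference metric mentioned in the last response, average cosine similarity of task gradients, is simple to compute. The sketch below is one plausible reading of that metric, with a conflict rate (share of task pairs whose gradients point in opposing directions) added as a hypothetical companion statistic; the toy gradients are placeholders, not results from the paper.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two flattened gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def gradient_conflict(task_grads):
    """Return (mean pairwise cosine, fraction of pairs with negative cosine)."""
    sims = [cosine(g, h) for g, h in combinations(task_grads, 2)]
    mean_cos = sum(sims) / len(sims)
    conflict_rate = sum(s < 0 for s in sims) / len(sims)
    return mean_cos, conflict_rate

# Placeholder gradients: three tasks on shared parameters, two of them opposed.
shared = [[1.0, 0.0], [-1.0, 0.2], [0.0, 1.0]]
# Placeholder gradients: the same three tasks, each confined to its own direction.
isolated = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

mean_shared, rate_shared = gradient_conflict(shared)
mean_isolated, rate_isolated = gradient_conflict(isolated)
```

A negative mean cosine signals that tasks pull shared parameters in opposing directions, while values near zero are what fully orthogonal updates would produce. Reporting the conflict rate alongside the mean guards against a few strongly aligned pairs masking many conflicting ones.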

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

The paper motivates BADIT from an empirical observation that co-activated parameters organize into base groups, analogizing this to orthogonal basic abilities with tasks as linear combinations. It then defines a procedural method using high-singular-value LoRA experts and spherical clustering to enforce orthogonality during training. This is not self-definitional or a fitted prediction by construction; the central claims are evaluated via experiments on SuperNI with multiple LLMs rather than reducing tautologically to inputs or self-citations. No load-bearing self-citations, uniqueness theorems, or ansatzes from prior author work appear in the text. The approach remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the unproven interpretation that co-activated parameters form orthogonal basic abilities; this is postulated rather than derived from first principles or external benchmarks.

free parameters (1)

number of basic abilities / LoRA experts
The decomposition requires choosing or fitting the number of orthogonal components; this choice directly affects the representation of tasks as linear combinations.

axioms (1)

domain assumption: Co-activated parameters organize into orthogonal base groups that represent independent basic abilities
Invoked to justify the decomposition step; appears in the motivation paragraph of the abstract.

invented entities (1)

basic abilities (no independent evidence)
purpose: To serve as orthogonal building blocks that tasks combine linearly
New conceptual entity introduced to explain the co-activation observation; no independent falsifiable handle provided.


work page
[60]

Pattern Recognition , volume =

Cheng Chen and Lianli Gao and Pengpeng Zeng and Mingsheng Cao and Heng Tao Shen , title =. Pattern Recognition , volume =

work page
[61]

Vechev and Kristina Toutanova , title =

Anton Alexandrov and Veselin Raychev and Mark Mueller and Ce Zhang and Martin T. Vechev and Kristina Toutanova , title =. Findings of the Association for Computational Linguistics:

work page
[62]

Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics , pages =

Hanqing Wang and Yixia Li and Shuo Wang and Guanhua Chen and Yun Chen , title =. Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics , pages =

work page
[63]

International Conference on Machine Learning , year =

Jiawei Zhao and Zhenyu Zhang and Beidi Chen and Zhangyang Wang and Anima Anandkumar and Yuandong Tian , title =. International Conference on Machine Learning , year =

work page
[64]

Gomez and Lukasz Kaiser and Illia Polosukhin , title =

Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin , title =. Advances in Neural Information Processing Systems , pages =

work page
[65]

Advances in Neural Information Processing Systems , year =

Sen Lin and Li Yang and Deliang Fan and Junshan Zhang , title =. Advances in Neural Information Processing Systems , year =

work page
[66]

International Conference on Learning Representations , year =

Sen Lin and Li Yang and Deliang Fan and Junshan Zhang , title =. International Conference on Learning Representations , year =

work page
[67]

Conference on Empirical Methods in Natural Language Processing , pages =

Yizhong Wang and Swaroop Mishra and Pegah Alipoormolabashi and Yeganeh Kordi and Amirreza Mirzaei and Atharva Naik and Arjun Ashok and Arut Selvan Dhanasekaran and Anjana Arunkumar and David Stap and others , title =. Conference on Empirical Methods in Natural Language Processing , pages =

work page
[68]

Liu and Ana Marasovic and Noah A

Pradeep Dasigi and Nelson F. Liu and Ana Marasovic and Noah A. Smith and Matt Gardner , title =. Conference on Empirical Methods in Natural Language Processing , pages =

work page
[69]

International Conference on Learning Representations , year =

Anastasia Razdaibiedina and Yuning Mao and Rui Hou and Madian Khabsa and Mike Lewis and Amjad Almahairi , title =. International Conference on Learning Representations , year =

work page
[70]

Conference on Empirical Methods in Natural Language Processing , pages =

Ran Song and Shizhu He and Shuting Jiang and Yantuan Xian and Shengxiang Gao and Kang Liu and Zhengtao Yu , title =. Conference on Empirical Methods in Natural Language Processing , pages =

work page
[71]

International Conference on Computational Linguistics , pages =

Yongqi Leng and Deyi Xiong , title =. International Conference on Computational Linguistics , pages =

work page
[72]

CoRR , volume =

Xinyu Tang and Zhihao Lv and Xiaoxue Cheng and Junyi Li and Wayne Xin Zhao and Zujie Wen and Zhiqiang Zhang and Jun Zhou , title =. CoRR , volume =

work page
[73]

Shixiang Tang and Dapeng Chen and Jinguo Zhu and Shijie Yu and Wanli Ouyang , title =

work page
[74]

CoRR , volume =

Shen Yuan and Yin Zheng and Taifeng Wang and Binbin Liu and Hongteng Xu , title =. CoRR , volume =

work page
[75]

Annual Meeting of the Association for Computational Linguistics , pages =

Damai Dai and Li Dong and Yaru Hao and Zhifang Sui and Baobao Chang and Furu Wei , title =. Annual Meeting of the Association for Computational Linguistics , pages =

work page
[76]

Conference on Empirical Methods in Natural Language Processing , pages =

Mor Geva and Roei Schuster and Jonathan Berant and Omer Levy , title =. Conference on Empirical Methods in Natural Language Processing , pages =

work page
[77]

CoRR , volume =

An Yang and Anfeng Li and Baosong Yang and Beichen Zhang and Binyuan Hui and Bo Zheng and Bowen Yu and Chang Gao and Chengen Huang and Chenxu Lv and others , title =. CoRR , volume =

work page
[78]

OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models , journal =

Kerim B. OLoRA: Orthonormal Low-Rank Adaptation of Large Language Models , journal =

work page
[79]

Advances in Neural Information Processing Systems , year =

Michael Matena and Colin Raffel , title =. Advances in Neural Information Processing Systems , year =

work page
[80]

Woodland , title =

Xiaodong Wu and Wenyi Yu and Chao Zhang and Philip C. Woodland , title =. Advances in Neural Information Processing Systems , year =

work page
