SAMoRA: Semantic-Aware Mixture of LoRA Experts for Task-Adaptive Learning
Pith reviewed 2026-05-10 02:23 UTC · model grok-4.3
The pith
SAMoRA routes inputs to specialized LoRA experts via semantic matching and scales their contributions by task complexity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SAMoRA introduces three components: a Semantic-Aware Router that explicitly matches textual semantics to the most suitable LoRA experts for precise routing; a Task-Adaptive Scaling mechanism that dynamically adjusts expert contributions to task-specific requirements; and a regularization objective that jointly promotes expert specialization and effective scaling. Together, these enable better task-adaptive learning in large language models.
What carries the argument
The Semantic-Aware Router, which aligns input semantics to experts for routing decisions, together with the Task-Adaptive Scaling mechanism that regulates each expert's weight based on task needs.
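The review does not reproduce the router's exact parameterization; a minimal sketch of semantic routing over LoRA experts, assuming cosine-similarity matching between a pooled input embedding and learned per-expert key vectors (`expert_keys` and the softmax-over-top-k gating are illustrative assumptions, not the paper's stated formulation), might look like:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k = 16, 4, 2        # hidden size, expert count, experts kept per input

# Hypothetical learned "semantic key" per expert.
expert_keys = rng.normal(size=(n_experts, d))

def route(x_pooled, keys, k):
    """Score experts by cosine similarity between a pooled input embedding
    and each expert key, keep the top-k, and softmax the kept scores."""
    sims = keys @ x_pooled / (np.linalg.norm(keys, axis=1) * np.linalg.norm(x_pooled))
    top = np.argsort(sims)[-k:]                  # indices of best-matching experts
    logits = sims[top] - sims[top].max()         # stabilized softmax
    gates = np.exp(logits)
    return top, gates / gates.sum()

x = rng.normal(size=d)                           # pooled sentence embedding
idx, gates = route(x, expert_keys, top_k)
```

The gates then weight the selected experts' low-rank updates; the point of the design is that the scores come from explicit semantic matching rather than an opaque learned gate alone.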
If this is right
- Expert modules become more specialized because routing now follows explicit semantic cues rather than uniform strategies.
- Task performance improves when scaling factors adapt to varying task complexity instead of using fixed fusion weights.
- Parameter efficiency is maintained while generalization to new tasks increases due to the joint regularization on specialization and scaling.
- Routing decisions become interpretable through the semantic alignment step, reducing the black-box nature of prior MoE-LoRA approaches.
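The contrast drawn above between fixed fusion weights and complexity-dependent scaling can be sketched under the assumption that a small scaling head maps a task feature vector to one contribution scale per expert (the head `adaptive_scales` and its sigmoid form are hypothetical, chosen only to make the contrast concrete):

```python
import numpy as np

rng = np.random.default_rng(1)
n_experts, d = 3, 8
expert_outs = rng.normal(size=(n_experts, d))    # per-expert LoRA updates for one input

# Fixed fusion: every task averages experts with the same uniform weights.
fixed = expert_outs.mean(axis=0)

def adaptive_scales(task_feat, W):
    """Hypothetical scaling head: sigmoid of a linear map of a task
    feature vector gives one contribution scale per expert."""
    return 1.0 / (1.0 + np.exp(-(W @ task_feat)))

W = rng.normal(size=(n_experts, d))              # head weights (illustrative)
task_feat = rng.normal(size=d)                   # e.g. a task or instruction embedding
scales = adaptive_scales(task_feat, W)
adaptive = (scales[:, None] * expert_outs).sum(axis=0) / scales.sum()
```

Under fixed fusion, `fixed` is identical regardless of the task; under the adaptive variant, a harder task can push its scales higher and receive a stronger aggregate update.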
Where Pith is reading between the lines
- The same semantic router could be tested on instruction-tuning datasets where task boundaries are implicit rather than labeled.
- If the router generalizes across model sizes, it might reduce the need to train separate expert pools for each downstream domain.
- Combining the scaling mechanism with other adaptation techniques such as prompt tuning could be explored to further lower memory use.
Load-bearing premise
The semantic-aware router can reliably map textual inputs to the correct experts without routing mistakes or large added computation costs.
What would settle it
Running SAMoRA on the same multi-task benchmarks and observing either no accuracy gain over standard MoE-LoRA baselines or a measurable increase in routing errors would falsify the central claims.
Original abstract
The combination of Mixture-of-Experts (MoE) and Low-Rank Adaptation (LoRA) has shown significant potential for enhancing the multi-task learning capabilities of Large Language Models. However, existing methods face two primary challenges: (1)Imprecise Routing in the current MoE-LoRA method fails to explicitly match input semantics with expert capabilities, leading to weak expert specialization. (2)Uniform weight fusion strategies struggle to provide adaptive update strengths, overlooking the varying complexity of different tasks. To address these limitations, we propose SAMoRA (Semantic-Aware Mixture of LoRA Experts), a novel parameter-efficient fine-tuning framework tailored for task-adaptive learning. Specifically, A Semantic-Aware Router is proposed to explicitly align textual semantics with the most suitable experts for precise routing. A Task-Adaptive Scaling mechanism is designed to regulate expert contributions based on specific task requirements dynamically. In addition, a novel regularization objective is proposed to jointly promote expert specialization and effective scaling. Extensive experiments on multiple multi-task benchmarks demonstrate that SAMoRA significantly outperforms the state-of-the-art methods and holds excellent task generalization capabilities. Code is available at https://github.com/boyan-code/SAMoRA
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces SAMoRA, a parameter-efficient fine-tuning framework for large language models that extends Mixture-of-Experts LoRA. It proposes a Semantic-Aware Router to align input textual semantics with suitable LoRA experts, a Task-Adaptive Scaling mechanism to dynamically adjust expert contributions according to task requirements, and a joint regularization objective to encourage expert specialization and effective scaling. The central claim is that extensive experiments on multiple multi-task benchmarks demonstrate significant outperformance over state-of-the-art methods together with strong task generalization.
Significance. If the reported empirical gains hold under rigorous evaluation, the work would advance parameter-efficient multi-task adaptation of LLMs by addressing imprecise routing and non-adaptive fusion in prior MoE-LoRA approaches. The public code release supports reproducibility and is a clear strength.
minor comments (3)
- Abstract: missing space after the numeral in “(1)Imprecise Routing” and “(2)Uniform weight fusion”.
- The experimental claims of “significantly outperforms” and “excellent task generalization” would benefit from explicit reference to the specific tables or figures that report the quantitative deltas, standard deviations, and statistical tests.
- Notation for the router and scaling functions could be introduced with a single consolidated equation block rather than scattered inline definitions.
Simulated Author's Rebuttal
We thank the referee for their positive assessment of SAMoRA and for recommending minor revision. We appreciate the recognition that our semantic-aware routing, task-adaptive scaling, and joint regularization address key limitations in prior MoE-LoRA methods, and we value the note on reproducibility via the public code release.
Circularity Check
No significant circularity
full rationale
The paper introduces SAMoRA as a new PEFT framework with three explicitly defined components: a Semantic-Aware Router for matching semantics to experts, a Task-Adaptive Scaling mechanism, and a joint regularization objective. These are architectural and training choices, not derived quantities. The central claim of superior performance rests on empirical results across multi-task benchmarks rather than any equation or prediction that reduces by construction to fitted parameters, self-citations, or renamed inputs. No load-bearing derivation step collapses to a tautology or self-referential fit.
Axiom & Free-Parameter Ledger
free parameters (2)
- Number of LoRA experts
- Regularization coefficients
axioms (2)
- domain assumption: Textual semantics can be extracted by a lightweight router network and used to select experts accurately.
- domain assumption: Task complexity varies in a way that can be captured by a dynamic scaling factor.
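The regularization coefficients listed as free parameters can be made concrete with a sketch. The paper's exact objective is not reproduced in this review, so the terms below (an entropy-based specialization term plus a scale-anchoring term, with hypothetical coefficients `lam_spec` and `lam_scale`) are one plausible instantiation of "jointly promote expert specialization and effective scaling":

```python
import numpy as np

def specialization_loss(gate_probs):
    """Mean entropy of per-input gate distributions; pushing this down
    makes routing peaked, i.e. experts specialize."""
    p = np.clip(gate_probs, 1e-9, 1.0)
    return float(-(p * np.log(p)).sum(axis=-1).mean())

def scaling_loss(scales):
    """Penalize task-adaptive scales that drift far from 1, so scaling
    stays effective without collapsing or exploding."""
    return float(((scales - 1.0) ** 2).mean())

def joint_regularizer(gate_probs, scales, lam_spec=0.1, lam_scale=0.01):
    # lam_spec and lam_scale stand in for the two regularization
    # coefficients named as free parameters in the ledger.
    return lam_spec * specialization_loss(gate_probs) + lam_scale * scaling_loss(scales)
```

On uniform gates the specialization term is maximal and decreases as routing sharpens toward one-hot, which is exactly the behavior the ledger's first axiom presupposes the router can deliver.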